Markus Winand's Blog

Clustering Factor: Row Migration’s Victim

In Performance on 2010-03-09 at 10:15

This article describes the effects of a high row migration rate on the clustering factor and the optimizer’s ability to select the best execution plan.

In my previous article—Row Migration and Row Movement—I have demonstrated that the “insert empty, update everything” anti-pattern can lead to 100% row migration. This article continues the research on row migration and unveils surprising effects on the clustering factor. To be precise, the clustering factor can become completely bogus in presence of a very high row migration rate. Once the clustering factor is “wrong”, it’s just a finger exercise to construct an optimizer trap and proof that row migration can affect the query plan.

To start off, I create a table similar to the one in the previous article. However, I add one more column and populate it with random values from 0 to 9. I will need this column to build my optimizer trap later on.

CREATE TABLE row_mig1 (
  a CHAR(2000),
  b CHAR(2000),
  c CHAR(2000),
  x NUMBER      NOT NULL,
  filter NUMBER NOT NULL,
  CONSTRAINT row_mig1_pk PRIMARY KEY (x)
) ENABLE ROW MOVEMENT;

BEGIN
   FOR i IN 1..100000 LOOP
      INSERT INTO row_mig1 
                    (x, filter) 
             VALUES (i, trunc(dbms_random.value(0, 10)));

      UPDATE row_mig1 
         SET a ='a', b = 'b', c = 'c'
       WHERE x=i;
   END LOOP;
END;
/
COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG1', CASCADE=>TRUE);

Up till now everything is very similar to the last article, except that I used DBMS_STATS instead of ANALYZE TABLE. Although we know exactly that almost every row was migrated, DBMS_STATS doesn’t care about that:

SQL> SELECT num_rows, chain_cnt
       FROM user_tables
      WHERE table_name='ROW_MIG1';

  NUM_ROWS  CHAIN_CNT
---------- ----------
    100000          0

SQL> 

However, let’s have a look at the index:

SQL> SELECT num_rows, leaf_blocks, clustering_factor
       FROM user_indexes
      WHERE index_name = 'ROW_MIG1_PK';

  NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------- ----------- -----------------
    100000         187               918

SQL> 

Well, that’s an excellent clustering factor. A low clustering factor indicates that the index is in the same sequence as the table. That’s true in that case because the index column corresponds to the order in which the rows were inserted to the table. However, 918, how can this be? The clustering factor is supposed to be between the number of table blocks—which indicates a good clustering factor—and the number of index rows—which is the worst case. So, let’s look at the table size:

SQL> SELECT t.blocks             table_blocks
          , i.clustering_factor  index_clustering_factor
          , i.num_rows           index_rows
       FROM user_indexes i
       JOIN user_tables t USING (table_name)
      WHERE index_name = 'ROW_MIG1_PK';

TABLE_BLOCKS INDEX_CLUSTERING_FACTOR INDEX_ROWS
------------ ----------------------- ----------
      100877                     918     100000

SQL> 

According to that, the lower bound for the clustering factor is higher than the upper bound. Hmm, let’s investigate the clustering factor manually and verify the distribution of the rows across the table blocks:

SQL> SELECT * FROM (
       SELECT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) block_number
            , COUNT(*) rows_in_block
         FROM row_mig1
        GROUP BY DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid)
     ) WHERE ROWNUM <=10;


BLOCK_NUMBER ROWS_IN_BLOCK
------------ -------------
         523           109
         524           113
         525           109
         526           109
         527           110
         536           109
         537           110
         538           109
         539           109
         540           110

10 rows selected.

SQL> 

There are 109 different ROWIDs that refer to the block number 523. But we know that each record has about 6k. It is impossible to fit 109 rows into a single block of 8k. However, the “insert empty, update everything” anti-pattern makes it possible. The game goes like this:

  1. The very first row is inserted into a new block.

  2. The first row is updated, and fits into the same block. The free space in that particular block is still more than a kilobyte.

  3. The second row is inserted into the very same block, because there is enough free space available in that block.

  4. The update of the second row triggers the migration of that row to a different block.

    The row migration changes neither the ROWID nor the index entry. That means that the forwarding address—that is, somehow, the new ROWID—is stored in the original block so that the index can find the row.

  5. The third row is inserted into the very first table block, again.

    Because the second row was moved into a different block, the very first block has still free space. There is only the first row—as a whole—and one forwarding address stored in that block. So, the insert of the third row can take place there.

  6. And so on. Until the forwarding addresses fill the block—up to PCTFREE.

That’s a good one, hmm?

We can even verify that:

SQL> SELECT * FROM (
       SELECT MIN(X)                               min_x
            , MAX(x)                               max_x
            , MAX(x) - MIN(x) + 1                  diff_x
            , DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) block_number
            , COUNT(*)            rows_in_block
         FROM row_mig1
        GROUP BY DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid)
        ORDER BY min_x
     ) WHERE ROWNUM <=10;

     MIN_X      MAX_X     DIFF_X BLOCK_NUMBER ROWS_IN_BLOCK
---------- ---------- ---------- ------------ -------------
         1        113        113          524           113
       114        222        109          523           109
       223        332        110          537           110
       333        441        109          538           109
       442        550        109          539           109
       551        660        110          540           110
       661        769        109          541           109
       770        878        109          542           109
       879        987        109          543           109
       988       1096        109          525           109

10 rows selected.

SQL>

You see that the first row was inserted into block number 524. All subsequent rows up to the 113th were put into the same block. When that block was finally filled up—with one row and 112 forwarding addresses—the game starts over in the next block. All the INSERT statements took place in just 918 distinct blocks. Because the ROWID is assigned during the INSERT, the subsequent migration of the row due to the UPDATE is not reflected in the ROWID.

Neither DBMS_STATS nor ANALYSE TABLE look into the table to check if the row is really there or if the particular block it is just an accumulation of forwarding addresses. From their perspective, all the rows are in the same block—this is how the clustering factor is calculated. Although correctly calculated—technically—and up to date, the clustering factor of this index does not reflect the real situation.

The “correct” value—in that sense that it reflects the data distribution correctly—for the clustering factor would be 100.000; that is, the number of rows. If the clustering factor equals the number of rows in the index—which is the worst possible case—it means that there are no two adjacent index entries that refer to the same table block. This is actually the case because no 8k block can contain two complete 6k rows.

The Clustering Factor is Wrong, So What?

So, what’s the problem if the clustering factor is wrong?

The problem is that the Cost Based Optimizer (CBO) uses the clustering factor in its cost calculation for an INDEX RANGE SCAN (see Fallacies of the Cost Based Optimizer [pdf]). That means, the cost of the INDEX RANGE SCAN will be too low, because the clustering factor is way too low. Luckily there is an effect that makes all that less problematic; all indexes on that table are affected.

Let’s make a second index to verify that:

SQL> CREATE INDEX row_mig1_idx ON row_mig1(filter);

Index created.

SQL> BEGIN
       DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG1', 
                                     CASCADE => TRUE);
     END;
     /

PL/SQL procedure successfully completed.

SQL> SELECT i.index_name         index_name
          , t.blocks             table_blocks
          , i.clustering_factor  index_clustering_factor
          , i.num_rows           index_rows
       FROM user_indexes i
       JOIN user_tables t USING (table_name)
      WHERE table_name = 'ROW_MIG1';

INDEX_NAME   TABLE_BLOCKS INDEX_CLUSTERING_FACTOR INDEX_ROWS
------------ ------------ ----------------------- ----------
ROW_MIG1_PK        100877                     918     100000
ROW_MIG1_IDX       100877                    9180     100000

SQL> 

The clustering factor of the new index is bigger than the one of the original index, but still way off its real value. The “correct” clustering factor for the new index would be 100.000 because there are no two adjacent index entries that refer to the table block. Well, they actually do, but there is nothing more than the forwarding address in those blocks.

Ten Times as High

The clustering factor is ten times as high because the filter column has ten distinct values. That means that the adjacent index entries will refer to ten blocks at least, because those entries were not inserted in the same order as they are stored in the index. On the other hand, the index on the filter column is not only sorted by the filter value, but also by the ROWID—as every nonunique index in Oracle—so that the clustering factor is kept at a minimum. Finally, the clustering factor is ten times as high because it refers to ten times as many table blocks, but does not grow above that because the index order keeps the clustering factor at a minimum.

Although the clustering factor does not correctly reflect the efforts to fetch the table rows, it correctly reflects the relation between the two indexes. The primary key index has a lower clustering factor because it has the same sequence as the table itself. On the other side, the index on the filter column has a different order, thus the value is higher. The side note explains why it is Ten Times as High.

If all indexes are affected, there is hardly any real problem I can see for a single table access. Even queries that can be executed with two different indexes, the CBO will most likely not do wrong because both clustering factors are misleading.

The Join Trap

If it’s not possible to confuse the optimizer with a single table, let’s use more of them. So, I try to build a trap where the optimizer’s decision of the join order is influenced by the phony clustering factor so that the optimizer takes the less efficient execution plan.

For that purpose, I build a second table very similar to the first one. There are only two differences:

  1. I don’t follow the “insert empty, update everything” anti-pattern—there will be no row migration.
  2. The selectivity of the filter columns is slightly increased (about 6% instead of 10%).

Here is the overall script:

CREATE TABLE row_mig2 (
  a CHAR(2000),
  b CHAR(2000),
  c CHAR(2000),
  x NUMBER      NOT NULL,
  filter NUMBER NOT NULL,
  CONSTRAINT row_mig2_pk PRIMARY KEY (x)
) ENABLE ROW MOVEMENT;

INSERT INTO row_mig2 (x, filter, a, b, c) 
     SELECT level, trunc(dbms_random.value(0,17)), 'a', 'b', 'c' 
       FROM dual CONNECT BY level <= 100000;
COMMIT;

CREATE INDEX row_mig2_idx ON row_mig2(filter);

EXEC DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG2', CASCADE=>TRUE);

Now let’s check the statistics:

SQL> SELECT i.index_name         index_name
          , t.blocks             table_blocks
          , i.clustering_factor  index_clustering_factor
          , i.num_rows           index_rows
       FROM user_indexes i
       JOIN user_tables t USING (table_name)
      WHERE table_name = 'ROW_MIG2';

INDEX_NAME   TABLE_BLOCKS INDEX_CLUSTERING_FACTOR INDEX_ROWS
------------ ------------ ----------------------- ----------
ROW_MIG2_PK        100749                  100000     100000
ROW_MIG2_IDX       100749                  100000     100000

SQL> 

Please note that the clustering factor is at the upper bound of the expected range; that means, it indicates that there are no two adjacent index entries referring to the same table block. That’s somehow logical, if we consider that every row is in its own table block.

After that preparation, I can present my “trapQL”:

SELECT * 
  FROM row_mig1 d1 
  JOIN row_mig2 d2 ON (d1.x = d2.x) 
 WHERE d1.filter = 0 
   AND d2.filter = 0;

It is a rather trivial join on the primary keys. Additionally each table is filtered on the filter column for one particular value. The trap works because a NESTED LOOPS join is possible in both ways. Either by filtering the first table by an INDEX RANGE SCAN on filter and then fetch the corresponding entry from the second by a primary key lookup, or vice versa. However, because we—as well as the optimizer—know that the second table is more selective than the first one, the more efficient way to execute that query is to first perform the INDEX RANGE SCAN on the second table and then join in the first one. In that way, the number of primary key lookups is reduced and the overall performance will be better. Considering that the first table suffers from heavy row migration, that effect becomes even more relevant. However, it is of course the purpose of the discussion to proof that the optimizer is doing “wrong”:

SET AUTOTRACE TRACEONLY;
SET TIMING ON;

SELECT /* original clustering factor */ * 
  FROM row_mig1 d1 
  JOIN row_mig2 d2 ON (d1.x = d2.x) 
 WHERE d1.filter = 0 
   AND d2.filter = 0;

SET TIMING OFF
SET AUTOTRACE OFF

And the result is:

577 rows selected.

Elapsed: 00:01:08.35

Execution Plan
----------------------------------------------------------
Plan hash value: 4162018446

---------------------------------------------------------------------
| Id | Operation                     | Name         | Rows  | Cost  |
---------------------------------------------------------------------
|  0 | SELECT STATEMENT              |              |  5882 | 10941 |
|  1 |  NESTED LOOPS                 |              |       |       |
|  2 |   NESTED LOOPS                |              |  5882 | 10941 |
|  3 |    TABLE ACCESS BY INDEX ROWID| ROW_MIG1     | 10000 |   938 |
|* 4 |     INDEX RANGE SCAN          | ROW_MIG1_IDX | 10000 |    20 |
|* 5 |    INDEX UNIQUE SCAN          | ROW_MIG2_PK  |     1 |     0 |
|* 6 |   TABLE ACCESS BY INDEX ROWID | ROW_MIG2     |     1 |     1 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("D1"."FILTER"=0)
   5 - access("D1"."X"="D2"."X")
   6 - filter("D2"."FILTER"=0)

Statistics
----------------------------------------------------------
        913  recursive calls
          0  db block gets
      35584  consistent gets
      21027  physical reads
          0  redo size
      39012  bytes sent via SQL*Net to client
        837  bytes received via SQL*Net from client
         40  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
        577  rows processed

The optimizer has chosen to perform the INDEX RANGE SCAN on ROW_MIG1_IDX first. The optimizer is well aware of the fact that the INDEX RANGE SCAN will return about 10000 rows; still it was preferred over the alternative execution plan.

So let’s check what happens if we tell the optimizer the truth about that table’s indexes?

BEGIN
  DBMS_STATS.SET_INDEX_STATS(null, 'ROW_MIG1_PK', clstfct=>100000);
  DBMS_STATS.SET_INDEX_STATS(null, 'ROW_MIG1_IDX',clstfct=>100000);
END;
/

SET AUTOTRACE TRACEONLY;
SET TIMING ON;

SELECT /* updated clustering factor */ * 
  FROM row_mig1 d1 
  JOIN row_mig2 d2 ON (d1.x = d2.x) 
 WHERE d1.filter = 0 
   AND d2.filter = 0;

SET TIMING OFF
SET AUTOTRACE OFF

The only change is that the statistics have been manually updated to the “more correct” clustering factor of 100.000. Unfortunately neither DBMS_STATS nor ANALYZE TABLE can be used for that purpose, so I did it manually. Please note that the table itself was not changed; most of the rows are still migrated.

And the result is:

577 rows selected.

Elapsed: 00:00:59.91

Execution Plan
----------------------------------------------------------
Plan hash value: 3004301745

---------------------------------------------------------------------
| Id | Operation                     | Name         | Rows  | Cost  |
---------------------------------------------------------------------
|  0 | SELECT STATEMENT              |              |  5882 | 11780 |
|  1 |  NESTED LOOPS                 |              |       |       |
|  2 |   NESTED LOOPS                |              |  5882 | 11780 |
|  3 |    TABLE ACCESS BY INDEX ROWID| ROW_MIG2     |  5882 |  5896 |
|* 4 |     INDEX RANGE SCAN          | ROW_MIG2_IDX |  5882 |    12 |
|* 5 |    INDEX UNIQUE SCAN          | ROW_MIG1_PK  |     1 |     0 |
|* 6 |   TABLE ACCESS BY INDEX ROWID | ROW_MIG1     |     1 |     1 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("D2"."FILTER"=0)
   5 - access("D1"."X"="D2"."X")
   6 - filter("D1"."FILTER"=0)

Statistics
----------------------------------------------------------
        913  recursive calls
          0  db block gets
      20695  consistent gets
      12817  physical reads
          0  redo size
      39012  bytes sent via SQL*Net to client
        837  bytes received via SQL*Net from client
         40  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
        577  rows processed

The more representative clustering factor makes the optimizer take the expected plan. The filtering takes place on the more selective table first, which matches about 5900 rows. The other table is joined in later. The execution is about 13% faster, the logical and physical gets dropped by about 40%. That makes quite a difference.

The cost of the TABLE ACCESS BY INDEX ROWID that follows the INDEX RANGE SCAN reflects the clustering factor’s impact. The second query plan has a cost of about 5900, that actually means that each fetched row will need a block read. The original execution plan had a cost value of 938 for that step, so that the overall cost value was lower.

After that I must remind the reader that the rows in table ROW_MIG1 are still migrated. The performance difference is not cause by the row migration per se, but by the misleading statistics that result out of the row migration.

Correcting the Row Migration

To complete the exercise, I will correct the row migration, run the statement again, and compare the performance improvement:

SQL> ALTER TABLE row_mig1 MOVE;

Table altered.

SQL> ALTER INDEX row_mig1_pk REBUILD;

Index altered.

SQL> ALTER INDEX row_mig1_idx REBUILD;

Index altered.

SQL> BEGIN 
       DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG1',CASCADE=>TRUE);
     END;
     /

PL/SQL procedure successfully completed.

SQL> SELECT i.index_name         index_name
          , t.blocks             table_blocks
          , i.clustering_factor  index_clustering_factor
          , i.num_rows           index_rows
       FROM user_indexes i
       JOIN user_tables t USING (table_name)
      WHERE table_name = 'ROW_MIG1';

INDEX_NAME     TABLE_BLOCKS INDEX_CLUSTERING_FACTOR INDEX_ROWS
-------------- ------------ ----------------------- ----------
ROW_MIG1_PK          100506                  100000     100000
ROW_MIG1_IDX         100506                  100000     100000

SQL> 

Then execute the statement again:

SET AUTOTRACE TRACEONLY;
SET TIMING ON;

SELECT /* no row migration */ * 
  FROM row_mig1 d1 
  JOIN row_mig2 d2 ON (d1.x = d2.x) 
 WHERE d1.filter = 0 
   AND d2.filter = 0;

SET TIMING OFF
SET AUTOTRACE OFF

Because the statistics are almost identical, the plan doesn’t change, nor does the cost. What does change is the execution time as well as the number of logical gets:

577 rows selected.

Elapsed: 00:00:30.90

Execution Plan
----------------------------------------------------------
Plan hash value: 3004301745

---------------------------------------------------------------------
| Id | Operation                     | Name         | Rows  | Cost  |
---------------------------------------------------------------------
|  0 | SELECT STATEMENT              |              |  5882 | 11780 |
|  1 |  NESTED LOOPS                 |              |       |       |
|  2 |   NESTED LOOPS                |              |  5882 | 11780 |
|  3 |    TABLE ACCESS BY INDEX ROWID| ROW_MIG2     |  5882 |  5896 |
|* 4 |     INDEX RANGE SCAN          | ROW_MIG2_IDX |  5882 |    12 |
|* 5 |    INDEX UNIQUE SCAN          | ROW_MIG1_PK  |     1 |     0 |
|* 6 |   TABLE ACCESS BY INDEX ROWID | ROW_MIG1     |     1 |     1 |
---------------------------------------------------------------------            

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("D2"."FILTER"=0)
   5 - access("D1"."X"="D2"."X")
   6 - filter("D1"."FILTER"=0)


Statistics
----------------------------------------------------------
        925  recursive calls
          0  db block gets
      14271  consistent gets
      11952  physical reads
          0  redo size
      39012  bytes sent via SQL*Net to client
        837  bytes received via SQL*Net from client
         40  SQL*Net roundtrips to/from client
         13  sorts (memory)
          0  sorts (disk)
        577  rows processed

Summary

The article investigates the effects of row migration on the clustering factor and the optimizer. A “proof of concept” SQL demonstrates that row migration can affect the optimizer. The lessons from this article are:

  • The “insert empty, update everything” pattern can lead to a very high row migration rate.
  • DBMS_STATS doesn’t populate the CHAIN_CNT.
  • A high row migration rate can cut the clustering factor, even below its theoretic minimum.
  • A wrong clustering factor can affect the optimizer and result in a suboptimal plan.

Update 2010-04-30: As commented by Jonathan Lewis, I also checked the “trapQL” against tables that were analyzed with ANALYZEsurprising results.

Disclaimer

Please note that this trap was intentionally built, just to prove that row migration can potentially influence the optimizer. There are better ways to tune that particular SQL, foremost a better indexing approach.

I have put many adjectives in quotation marks because they are not in line with the technical definition of the respective noun.

The “insert empty, update everything“ anti-pattern was used to create a very high row migration rate. Although that anti-pattern can lead to 100% migration rate, it does not always.

I have successfully verified my results on Oracle 10gR1, 11gR1 and 11gR2.

Thanks

Thanks to the guys at 25th-floor who verified my results on Oracle 11gR1.


Do not use offset for pagination. Learn why.

SQL Performance Tuning Book for developers
  1. Very definite and informative post

    Thank you very much Markus

  2. […] 9-How does row migration effect execution plans via bogus clustering factor? Markus Winand-Clustering Factor: Row Migration’s Victim […]

  3. Nice Article.

    A point you may be interested in is that the optimizer will use the chain_cnt statistic in its calculations when working out the execution plan – even though dbms_stats doesn’t set it. (See http://jonathanlewis.wordpress.com/2009/04/30/analyze-this/ )

    You might like to test your trapQL after using ANALYZE to collect the stats.