This article describes the effects of a high row migration rate on the clustering factor and the optimizer’s ability to select the best execution plan.
In my previous article—Row Migration and Row Movement—I have demonstrated that the “insert empty, update everything” anti-pattern can lead to 100% row migration. This article continues the research on row migration and unveils surprising effects on the clustering factor. To be precise, the clustering factor can become completely bogus in presence of a very high row migration rate. Once the clustering factor is “wrong”, it’s just a finger exercise to construct an optimizer trap and proof that row migration can affect the query plan.
To start off, I create a table similar to the one in the previous article. However, I add one more column and populate it with random values from 0 to 9. I will need this column to build my optimizer trap later on.
CREATE TABLE row_mig1 ( a CHAR(2000), b CHAR(2000), c CHAR(2000), x NUMBER NOT NULL, filter NUMBER NOT NULL, CONSTRAINT row_mig1_pk PRIMARY KEY (x) ) ENABLE ROW MOVEMENT; BEGIN FOR i IN 1..100000 LOOP INSERT INTO row_mig1 (x, filter) VALUES (i, trunc(dbms_random.value(0, 10))); UPDATE row_mig1 SET a ='a', b = 'b', c = 'c' WHERE x=i; END LOOP; END; / COMMIT; EXEC DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG1', CASCADE=>TRUE);
Up till now everything is very similar to the last article, except that I used DBMS_STATS
instead of ANALYZE TABLE
. Although we know exactly that almost every row was migrated, DBMS_STATS
doesn’t care about that:
SQL> SELECT num_rows, chain_cnt FROM user_tables WHERE table_name='ROW_MIG1'; NUM_ROWS CHAIN_CNT ---------- ---------- 100000 0 SQL>
However, let’s have a look at the index:
SQL> SELECT num_rows, leaf_blocks, clustering_factor FROM user_indexes WHERE index_name = 'ROW_MIG1_PK'; NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ---------- ----------- ----------------- 100000 187 918 SQL>
Well, that’s an excellent clustering factor. A low clustering factor indicates that the index is in the same sequence as the table. That’s true in that case because the index column corresponds to the order in which the rows were inserted to the table. However, 918, how can this be? The clustering factor is supposed to be between the number of table blocks—which indicates a good clustering factor—and the number of index rows—which is the worst case. So, let’s look at the table size:
SQL> SELECT t.blocks table_blocks , i.clustering_factor index_clustering_factor , i.num_rows index_rows FROM user_indexes i JOIN user_tables t USING (table_name) WHERE index_name = 'ROW_MIG1_PK'; TABLE_BLOCKS INDEX_CLUSTERING_FACTOR INDEX_ROWS ------------ ----------------------- ---------- 100877 918 100000 SQL>
According to that, the lower bound for the clustering factor is higher than the upper bound. Hmm, let’s investigate the clustering factor manually and verify the distribution of the rows across the table blocks:
SQL> SELECT * FROM ( SELECT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) block_number , COUNT(*) rows_in_block FROM row_mig1 GROUP BY DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) ) WHERE ROWNUM <=10; BLOCK_NUMBER ROWS_IN_BLOCK ------------ ------------- 523 109 524 113 525 109 526 109 527 110 536 109 537 110 538 109 539 109 540 110 10 rows selected. SQL>
There are 109 different ROWIDs
that refer to the block number 523. But we know that each record has about 6k. It is impossible to fit 109 rows into a single block of 8k. However, the “insert empty, update everything” anti-pattern makes it possible. The game goes like this:
-
The very first row is inserted into a new block.
-
The first row is updated, and fits into the same block. The free space in that particular block is still more than a kilobyte.
-
The second row is inserted into the very same block, because there is enough free space available in that block.
-
The update of the second row triggers the migration of that row to a different block.
The row migration changes neither the
ROWID
nor the index entry. That means that the forwarding address—that is, somehow, the newROWID
—is stored in the original block so that the index can find the row. -
The third row is inserted into the very first table block, again.
Because the second row was moved into a different block, the very first block has still free space. There is only the first row—as a whole—and one forwarding address stored in that block. So, the insert of the third row can take place there.
- And so on. Until the forwarding addresses fill the block—up to
PCTFREE
.
That’s a good one, hmm?
We can even verify that:
SQL> SELECT * FROM ( SELECT MIN(X) min_x , MAX(x) max_x , MAX(x) - MIN(x) + 1 diff_x , DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) block_number , COUNT(*) rows_in_block FROM row_mig1 GROUP BY DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) ORDER BY min_x ) WHERE ROWNUM <=10; MIN_X MAX_X DIFF_X BLOCK_NUMBER ROWS_IN_BLOCK ---------- ---------- ---------- ------------ ------------- 1 113 113 524 113 114 222 109 523 109 223 332 110 537 110 333 441 109 538 109 442 550 109 539 109 551 660 110 540 110 661 769 109 541 109 770 878 109 542 109 879 987 109 543 109 988 1096 109 525 109 10 rows selected. SQL>
You see that the first row was inserted into block number 524. All subsequent rows up to the 113th were put into the same block. When that block was finally filled up—with one row and 112 forwarding addresses—the game starts over in the next block. All the INSERT
statements took place in just 918 distinct blocks. Because the ROWID
is assigned during the INSERT
, the subsequent migration of the row due to the UPDATE
is not reflected in the ROWID
.
Neither DBMS_STATS
nor ANALYSE TABLE
look into the table to check if the row is really there or if the particular block it is just an accumulation of forwarding addresses. From their perspective, all the rows are in the same block—this is how the clustering factor is calculated. Although correctly calculated—technically—and up to date, the clustering factor of this index does not reflect the real situation.
The “correct” value—in that sense that it reflects the data distribution correctly—for the clustering factor would be 100.000; that is, the number of rows. If the clustering factor equals the number of rows in the index—which is the worst possible case—it means that there are no two adjacent index entries that refer to the same table block. This is actually the case because no 8k block can contain two complete 6k rows.
The Clustering Factor is Wrong, So What?
So, what’s the problem if the clustering factor is wrong?
The problem is that the Cost Based Optimizer (CBO) uses the clustering factor in its cost calculation for an INDEX RANGE SCAN
(see Fallacies of the Cost Based Optimizer [pdf]). That means, the cost of the INDEX RANGE SCAN
will be too low, because the clustering factor is way too low. Luckily there is an effect that makes all that less problematic; all indexes on that table are affected.
Let’s make a second index to verify that:
SQL> CREATE INDEX row_mig1_idx ON row_mig1(filter); Index created. SQL> BEGIN DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG1', CASCADE => TRUE); END; / PL/SQL procedure successfully completed. SQL> SELECT i.index_name index_name , t.blocks table_blocks , i.clustering_factor index_clustering_factor , i.num_rows index_rows FROM user_indexes i JOIN user_tables t USING (table_name) WHERE table_name = 'ROW_MIG1'; INDEX_NAME TABLE_BLOCKS INDEX_CLUSTERING_FACTOR INDEX_ROWS ------------ ------------ ----------------------- ---------- ROW_MIG1_PK 100877 918 100000 ROW_MIG1_IDX 100877 9180 100000 SQL>
The clustering factor of the new index is bigger than the one of the original index, but still way off its real value. The “correct” clustering factor for the new index would be 100.000 because there are no two adjacent index entries that refer to the table block. Well, they actually do, but there is nothing more than the forwarding address in those blocks.
Ten Times as High
The clustering factor is ten times as high because the filter column has ten distinct values. That means that the adjacent index entries will refer to ten blocks at least, because those entries were not inserted in the same order as they are stored in the index. On the other hand, the index on the filter
column is not only sorted by the filter
value, but also by the ROWID
—as every nonunique index in Oracle—so that the clustering factor is kept at a minimum. Finally, the clustering factor is ten times as high because it refers to ten times as many table blocks, but does not grow above that because the index order keeps the clustering factor at a minimum.
Although the clustering factor does not correctly reflect the efforts to fetch the table rows, it correctly reflects the relation between the two indexes. The primary key index has a lower clustering factor because it has the same sequence as the table itself. On the other side, the index on the filter
column has a different order, thus the value is higher. The side note explains why it is Ten Times as High.
If all indexes are affected, there is hardly any real problem I can see for a single table access. Even queries that can be executed with two different indexes, the CBO will most likely not do wrong because both clustering factors are misleading.
The Join Trap
If it’s not possible to confuse the optimizer with a single table, let’s use more of them. So, I try to build a trap where the optimizer’s decision of the join order is influenced by the phony clustering factor so that the optimizer takes the less efficient execution plan.
For that purpose, I build a second table very similar to the first one. There are only two differences:
- I don’t follow the “insert empty, update everything” anti-pattern—there will be no row migration.
- The selectivity of the
filter
columns is slightly increased (about 6% instead of 10%).
Here is the overall script:
CREATE TABLE row_mig2 ( a CHAR(2000), b CHAR(2000), c CHAR(2000), x NUMBER NOT NULL, filter NUMBER NOT NULL, CONSTRAINT row_mig2_pk PRIMARY KEY (x) ) ENABLE ROW MOVEMENT; INSERT INTO row_mig2 (x, filter, a, b, c) SELECT level, trunc(dbms_random.value(0,17)), 'a', 'b', 'c' FROM dual CONNECT BY level <= 100000; COMMIT; CREATE INDEX row_mig2_idx ON row_mig2(filter); EXEC DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG2', CASCADE=>TRUE);
Now let’s check the statistics:
SQL> SELECT i.index_name index_name , t.blocks table_blocks , i.clustering_factor index_clustering_factor , i.num_rows index_rows FROM user_indexes i JOIN user_tables t USING (table_name) WHERE table_name = 'ROW_MIG2'; INDEX_NAME TABLE_BLOCKS INDEX_CLUSTERING_FACTOR INDEX_ROWS ------------ ------------ ----------------------- ---------- ROW_MIG2_PK 100749 100000 100000 ROW_MIG2_IDX 100749 100000 100000 SQL>
Please note that the clustering factor is at the upper bound of the expected range; that means, it indicates that there are no two adjacent index entries referring to the same table block. That’s somehow logical, if we consider that every row is in its own table block.
After that preparation, I can present my “trapQL”:
SELECT * FROM row_mig1 d1 JOIN row_mig2 d2 ON (d1.x = d2.x) WHERE d1.filter = 0 AND d2.filter = 0;
It is a rather trivial join on the primary keys. Additionally each table is filtered on the filter
column for one particular value. The trap works because a NESTED LOOPS
join is possible in both ways. Either by filtering the first table by an INDEX RANGE SCAN
on filter
and then fetch the corresponding entry from the second by a primary key lookup, or vice versa. However, because we—as well as the optimizer—know that the second table is more selective than the first one, the more efficient way to execute that query is to first perform the INDEX RANGE SCAN
on the second table and then join in the first one. In that way, the number of primary key lookups is reduced and the overall performance will be better. Considering that the first table suffers from heavy row migration, that effect becomes even more relevant. However, it is of course the purpose of the discussion to proof that the optimizer is doing “wrong”:
SET AUTOTRACE TRACEONLY; SET TIMING ON; SELECT /* original clustering factor */ * FROM row_mig1 d1 JOIN row_mig2 d2 ON (d1.x = d2.x) WHERE d1.filter = 0 AND d2.filter = 0; SET TIMING OFF SET AUTOTRACE OFF
And the result is:
577 rows selected. Elapsed: 00:01:08.35 Execution Plan ---------------------------------------------------------- Plan hash value: 4162018446 --------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost | --------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5882 | 10941 | | 1 | NESTED LOOPS | | | | | 2 | NESTED LOOPS | | 5882 | 10941 | | 3 | TABLE ACCESS BY INDEX ROWID| ROW_MIG1 | 10000 | 938 | |* 4 | INDEX RANGE SCAN | ROW_MIG1_IDX | 10000 | 20 | |* 5 | INDEX UNIQUE SCAN | ROW_MIG2_PK | 1 | 0 | |* 6 | TABLE ACCESS BY INDEX ROWID | ROW_MIG2 | 1 | 1 | --------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("D1"."FILTER"=0) 5 - access("D1"."X"="D2"."X") 6 - filter("D2"."FILTER"=0) Statistics ---------------------------------------------------------- 913 recursive calls 0 db block gets 35584 consistent gets 21027 physical reads 0 redo size 39012 bytes sent via SQL*Net to client 837 bytes received via SQL*Net from client 40 SQL*Net roundtrips to/from client 12 sorts (memory) 0 sorts (disk) 577 rows processed
The optimizer has chosen to perform the INDEX RANGE SCAN
on ROW_MIG1_IDX
first. The optimizer is well aware of the fact that the INDEX RANGE SCAN
will return about 10000 rows; still it was preferred over the alternative execution plan.
So let’s check what happens if we tell the optimizer the truth about that table’s indexes?
BEGIN DBMS_STATS.SET_INDEX_STATS(null, 'ROW_MIG1_PK', clstfct=>100000); DBMS_STATS.SET_INDEX_STATS(null, 'ROW_MIG1_IDX',clstfct=>100000); END; / SET AUTOTRACE TRACEONLY; SET TIMING ON; SELECT /* updated clustering factor */ * FROM row_mig1 d1 JOIN row_mig2 d2 ON (d1.x = d2.x) WHERE d1.filter = 0 AND d2.filter = 0; SET TIMING OFF SET AUTOTRACE OFF
The only change is that the statistics have been manually updated to the “more correct” clustering factor of 100.000. Unfortunately neither DBMS_STATS
nor ANALYZE TABLE
can be used for that purpose, so I did it manually. Please note that the table itself was not changed; most of the rows are still migrated.
And the result is:
577 rows selected. Elapsed: 00:00:59.91 Execution Plan ---------------------------------------------------------- Plan hash value: 3004301745 --------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost | --------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5882 | 11780 | | 1 | NESTED LOOPS | | | | | 2 | NESTED LOOPS | | 5882 | 11780 | | 3 | TABLE ACCESS BY INDEX ROWID| ROW_MIG2 | 5882 | 5896 | |* 4 | INDEX RANGE SCAN | ROW_MIG2_IDX | 5882 | 12 | |* 5 | INDEX UNIQUE SCAN | ROW_MIG1_PK | 1 | 0 | |* 6 | TABLE ACCESS BY INDEX ROWID | ROW_MIG1 | 1 | 1 | --------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("D2"."FILTER"=0) 5 - access("D1"."X"="D2"."X") 6 - filter("D1"."FILTER"=0) Statistics ---------------------------------------------------------- 913 recursive calls 0 db block gets 20695 consistent gets 12817 physical reads 0 redo size 39012 bytes sent via SQL*Net to client 837 bytes received via SQL*Net from client 40 SQL*Net roundtrips to/from client 12 sorts (memory) 0 sorts (disk) 577 rows processed
The more representative clustering factor makes the optimizer take the expected plan. The filtering takes place on the more selective table first, which matches about 5900 rows. The other table is joined in later. The execution is about 13% faster, the logical and physical gets dropped by about 40%. That makes quite a difference.
The cost of the TABLE ACCESS BY INDEX ROWID
that follows the INDEX RANGE SCAN
reflects the clustering factor’s impact. The second query plan has a cost of about 5900, that actually means that each fetched row will need a block read. The original execution plan had a cost value of 938 for that step, so that the overall cost value was lower.
After that I must remind the reader that the rows in table ROW_MIG1
are still migrated. The performance difference is not cause by the row migration per se, but by the misleading statistics that result out of the row migration.
Correcting the Row Migration
To complete the exercise, I will correct the row migration, run the statement again, and compare the performance improvement:
SQL> ALTER TABLE row_mig1 MOVE; Table altered. SQL> ALTER INDEX row_mig1_pk REBUILD; Index altered. SQL> ALTER INDEX row_mig1_idx REBUILD; Index altered. SQL> BEGIN DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG1',CASCADE=>TRUE); END; / PL/SQL procedure successfully completed. SQL> SELECT i.index_name index_name , t.blocks table_blocks , i.clustering_factor index_clustering_factor , i.num_rows index_rows FROM user_indexes i JOIN user_tables t USING (table_name) WHERE table_name = 'ROW_MIG1'; INDEX_NAME TABLE_BLOCKS INDEX_CLUSTERING_FACTOR INDEX_ROWS -------------- ------------ ----------------------- ---------- ROW_MIG1_PK 100506 100000 100000 ROW_MIG1_IDX 100506 100000 100000 SQL>
Then execute the statement again:
SET AUTOTRACE TRACEONLY; SET TIMING ON; SELECT /* no row migration */ * FROM row_mig1 d1 JOIN row_mig2 d2 ON (d1.x = d2.x) WHERE d1.filter = 0 AND d2.filter = 0; SET TIMING OFF SET AUTOTRACE OFF
Because the statistics are almost identical, the plan doesn’t change, nor does the cost. What does change is the execution time as well as the number of logical gets:
577 rows selected. Elapsed: 00:00:30.90 Execution Plan ---------------------------------------------------------- Plan hash value: 3004301745 --------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost | --------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5882 | 11780 | | 1 | NESTED LOOPS | | | | | 2 | NESTED LOOPS | | 5882 | 11780 | | 3 | TABLE ACCESS BY INDEX ROWID| ROW_MIG2 | 5882 | 5896 | |* 4 | INDEX RANGE SCAN | ROW_MIG2_IDX | 5882 | 12 | |* 5 | INDEX UNIQUE SCAN | ROW_MIG1_PK | 1 | 0 | |* 6 | TABLE ACCESS BY INDEX ROWID | ROW_MIG1 | 1 | 1 | --------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("D2"."FILTER"=0) 5 - access("D1"."X"="D2"."X") 6 - filter("D1"."FILTER"=0) Statistics ---------------------------------------------------------- 925 recursive calls 0 db block gets 14271 consistent gets 11952 physical reads 0 redo size 39012 bytes sent via SQL*Net to client 837 bytes received via SQL*Net from client 40 SQL*Net roundtrips to/from client 13 sorts (memory) 0 sorts (disk) 577 rows processed
Summary
The article investigates the effects of row migration on the clustering factor and the optimizer. A “proof of concept” SQL demonstrates that row migration can affect the optimizer. The lessons from this article are:
- The “insert empty, update everything” pattern can lead to a very high row migration rate.
-
DBMS_STATS
doesn’t populate theCHAIN_CNT.
- A high row migration rate can cut the clustering factor, even below its theoretic minimum.
- A wrong clustering factor can affect the optimizer and result in a suboptimal plan.
Update 2010-04-30: As commented by Jonathan Lewis, I also checked the “trapQL” against tables that were analyzed with ANALYZE
—surprising results.
Disclaimer
Please note that this trap was intentionally built, just to prove that row migration can potentially influence the optimizer. There are better ways to tune that particular SQL, foremost a better indexing approach.
I have put many adjectives in quotation marks because they are not in line with the technical definition of the respective noun.
The “insert empty, update everything“ anti-pattern was used to create a very high row migration rate. Although that anti-pattern can lead to 100% migration rate, it does not always.
I have successfully verified my results on Oracle 10gR1, 11gR1 and 11gR2.
Thanks
Thanks to the guys at 25th-floor who verified my results on Oracle 11gR1.
Very definite and informative post
Thank you very much Markus
[…] 9-How does row migration effect execution plans via bogus clustering factor? Markus Winand-Clustering Factor: Row Migration’s Victim […]
Nice Article.
A point you may be interested in is that the optimizer will use the chain_cnt statistic in its calculations when working out the execution plan – even though dbms_stats doesn’t set it. (See http://jonathanlewis.wordpress.com/2009/04/30/analyze-this/ )
You might like to test your trapQL after using ANALYZE to collect the stats.