As Jonathan Lewis commented on my article Clustering Factor: Row Migrtion’s Victim, there is even more to say about the difference between the good, old, and deprecated ANALYZE
statement and the DBMS_STATS
package. Jonathan mentioned that the CBO is using the CHAIN_CNT
value in the statistics, if present, and suggested to try my “trapQL” after analyzing the base table in the old fashion.
For easy C&P I the complete new script to try:
CREATE TABLE row_mig1 ( a CHAR(2000), b CHAR(2000), c CHAR(2000), x NUMBER NOT NULL, filter NUMBER NOT NULL, CONSTRAINT row_mig1_pk PRIMARY KEY (x) ) ENABLE ROW MOVEMENT; BEGIN FOR i IN 1..100000 LOOP INSERT INTO row_mig1 (x, filter) VALUES (i, trunc(dbms_random.value(0, 10))); UPDATE row_mig1 SET a ='a', b = 'b', c = 'c' WHERE x=i; END LOOP; END; / COMMIT; CREATE INDEX row_mig1_idx ON row_mig1(filter); ANALYZE TABLE row_mig1 COMPUTE STATISTICS; ANALYZE INDEX row_mig1_pk COMPUTE STATISTICS; ANALYZE INDEX row_mig1_idx COMPUTE STATISTICS; SELECT i.index_name idx_name , t.blocks table_blocks , i.clustering_factor idx_clust_factor , i.num_rows idx_rows , t.chain_cnt table_chain_cnt FROM user_indexes i JOIN user_tables t USING (table_name) WHERE table_name = 'ROW_MIG1';
This is the very same table as in my previous article. I just replaced the DBMS_STATS
with ANALYZE
. As expected, the CHAIN_CNT
is set correctly:
IDX_NAME TABLE_BLOCKS IDX_CLUST_FACTOR IDX_ROWS TABLE_CHAIN_CNT -------------- ------------ ---------------- -------- --------------- ROW_MIG1_PK 100103 919 100000 99999 ROW_MIG1_IDX 100103 9187 100000 99999
So far, so good. Now let’s create the second table—without row migration:
CREATE TABLE row_mig2 ( a CHAR(2000), b CHAR(2000), c CHAR(2000), x NUMBER NOT NULL, filter NUMBER NOT NULL, CONSTRAINT row_mig2_pk PRIMARY KEY (x) ) ENABLE ROW MOVEMENT; INSERT INTO row_mig2 (x, filter, a, b, c) SELECT level, trunc(dbms_random.value(0,17)), 'a', 'b', 'c' FROM dual CONNECT BY level <= 100000; COMMIT; CREATE INDEX row_mig2_idx ON row_mig2(filter); ANALYZE TABLE row_mig2 COMPUTE STATISTICS; ANALYZE INDEX row_mig2_pk COMPUTE STATISTICS; ANALYZE INDEX row_mig2_idx COMPUTE STATISTICS; SELECT i.index_name idx_name , t.blocks table_blocks , i.clustering_factor idx_clust_factor , i.num_rows idx_rows , t.chain_cnt table_chain_cnt FROM user_indexes i JOIN user_tables t USING (table_name) WHERE table_name = 'ROW_MIG2';
Just to verify that there is no chaining when a plain insert approach is followed:
IDX_NAME TABLE_BLOCKS IDX_CLUST_FACTOR IDX_ROWS TABLE_CHAIN_CNT -------------- ------------ ---------------- -------- --------------- ROW_MIG2_IDX 100877 100000 100000 0 ROW_MIG2_PK 100877 100000 100000 0
So, let’s see how my “trapQL” works:
SET AUTOTRACE TRACEONLY; SET TIMING ON; SELECT /* analyze that! */ * FROM row_mig1 d1 JOIN row_mig2 d2 ON (d1.x = d2.x) WHERE d1.filter = 0 AND d2.filter = 0; SET TIMING OFF SET AUTOTRACE OFF
And the result is:
602 rows selected. Elapsed: 00:01:00.73 Execution Plan ---------------------------------------------------------- Plan hash value: 3163423506 ---------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost | ---------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5882 | 17663 | | 1 | NESTED LOOPS | | | | | 2 | NESTED LOOPS | | 5882 | 17663 | | 3 | TABLE ACCESS BY INDEX ROWID| ROW_MIG2 | 5882 | 5896 | |* 4 | INDEX RANGE SCAN | ROW_MIG2_IDX | 5882 | 12 | |* 5 | INDEX UNIQUE SCAN | ROW_MIG1_PK | 1 | 0 | |* 6 | TABLE ACCESS BY INDEX ROWID | ROW_MIG1 | 1 | 2 | ---------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("D2"."FILTER"=0) 5 - access("D1"."X"="D2"."X") 6 - filter("D1"."FILTER"=0) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 20929 consistent gets 12994 physical reads 0 redo size 39658 bytes sent via SQL*Net to client 859 bytes received via SQL*Net from client 42 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 602 rows processed
Surprising, at least to me. The optimizer considers the CHAIN_CNT
and produces the best possible execution plan. The cost for the TABLE ACCESS BY INDEX ROW ID
has grown to two. That’s a very concise reflection of the additional block that needs to be fetched.
To verify that the difference is the CHAIN_CNT
let’s verify the plan after DBMS_STATS
collection:
SQL> BEGIN DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG1', CASCADE => TRUE); DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG12', CASCADE => TRUE); END; / PL/SQL procedure successfully completed. SQL> SELECT i.index_name idx_name , t.blocks table_blocks , i.clustering_factor idx_clust_factor , i.num_rows idx_rows , t.chain_cnt table_chain_cnt FROM user_indexes i JOIN user_tables t USING (table_name) WHERE table_name IN ('ROW_MIG1', 'ROW_MIG2') ORDER BY i.index_name; IDX_NAME TABLE_BLOCKS IDX_CLUST_FACTOR IDX_ROWS TABLE_CHAIN_CNT -------------- ------------ ---------------- -------- --------------- ROW_MIG1_IDX 100622 9180 100000 99999 ROW_MIG1_PK 100622 918 100000 99999 ROW_MIG2_IDX 100239 100000 100000 0 ROW_MIG2_PK 100239 100000 100000 0
Well, as a matter of fact, DBMS_STATS
just doesn’t care about the CHAIN_CNT
. So, the old values remain there. On the other hand, there are some minor differences in the other stats like the table blocks and the index clustering factor of ROW_MIG1_IDX
. However, the execution plan doesn’t change:
602 rows selected. Elapsed: 00:01:03.71 Execution Plan ---------------------------------------------------------- Plan hash value: 3163423506 ---------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost | ---------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6084 | 18270 | | 1 | NESTED LOOPS | | | | | 2 | NESTED LOOPS | | 6084 | 18270 | | 3 | TABLE ACCESS BY INDEX ROWID| ROW_MIG2 | 6084 | 6098 | |* 4 | INDEX RANGE SCAN | ROW_MIG2_IDX | 6084 | 12 | |* 5 | INDEX UNIQUE SCAN | ROW_MIG1_PK | 1 | 0 | |* 6 | TABLE ACCESS BY INDEX ROWID | ROW_MIG1 | 1 | 2 | ---------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("D2"."FILTER"=0) 5 - access("D1"."X"="D2"."X") 6 - filter("D1"."FILTER"=0) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 21131 consistent gets 12911 physical reads 0 redo size 39658 bytes sent via SQL*Net to client 859 bytes received via SQL*Net from client 42 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 602 rows processed
Well, it seems that the CHAIN_CNT
is really making all the difference since it is the only value that was not calculated by DMBS_STATS
…or is it? The ultimate proof would be to update the CHAIN_CNT
manually to zero in the same way as I manually update the CLUSTERING_FACTOR
using DBMS_STATS
in my previous post. However, I already mentioned DBMS_STATS
doesn’t care about the CHAIN_CNT
, consequently there is no parameter in SET_TABLE_STATS to manually update it.
For the sake of science, I have done what nobody should ever do. And I will definitely not post what I’ve done. However, I did it, and I can tell you; the CHAIN_CNT
is making all the difference.
Summary
The story from the previous posts is a little bit depressing. Although ANALYZE
is deprecated, it provides more information to the CBO than DBMS_STATS
. Using the recommended DBMS_STATS
alone opens the problem that I have demonstrated in my previous post. The depressing factor is that this is actually a regression. A useful functionality of the CBO was “removed” by the recommendation to use DBMS_STATS
.
Once more I must emphasize that all of that trouble was caused by the excessive use of the “insert empty, update everything” anti-pattern. Obviously everybody should try very hard not to follow that pattern. That solves all the problems.
Thanks
Thanks to Jonathan Lewis for pointing me onto the question how my “trapQL” works with ANALYZE
.
Special thanks to Gerhard Kircher.