Markus Winand's Blog

Analyze That

In Performance on 2010-04-30 at 17:03

As Jonathan Lewis commented on my article Clustering Factor: Row Migrtion’s Victim, there is even more to say about the difference between the good, old, and deprecated ANALYZE statement and the DBMS_STATS package. Jonathan mentioned that the CBO is using the CHAIN_CNT value in the statistics, if present, and suggested to try my “trapQL” after analyzing the base table in the old fashion.

For easy C&P I the complete new script to try:

CREATE TABLE row_mig1 (
  a CHAR(2000),
  b CHAR(2000),
  c CHAR(2000),
  x NUMBER      NOT NULL,
  filter NUMBER NOT NULL,
  CONSTRAINT row_mig1_pk PRIMARY KEY (x)
) ENABLE ROW MOVEMENT;

BEGIN
   FOR i IN 1..100000 LOOP
      INSERT INTO row_mig1
                    (x, filter)
             VALUES (i, trunc(dbms_random.value(0, 10)));

      UPDATE row_mig1
         SET a ='a', b = 'b', c = 'c'
       WHERE x=i;
   END LOOP;
END;
/
COMMIT;

CREATE INDEX row_mig1_idx ON row_mig1(filter);

ANALYZE TABLE row_mig1 COMPUTE STATISTICS;
ANALYZE INDEX row_mig1_pk COMPUTE STATISTICS;
ANALYZE INDEX row_mig1_idx COMPUTE STATISTICS;

SELECT i.index_name         idx_name
     , t.blocks             table_blocks
     , i.clustering_factor  idx_clust_factor
     , i.num_rows           idx_rows
     , t.chain_cnt          table_chain_cnt
  FROM user_indexes i
  JOIN user_tables t USING (table_name)
 WHERE table_name = 'ROW_MIG1';

This is the very same table as in my previous article. I just replaced the DBMS_STATS with ANALYZE. As expected, the CHAIN_CNT is set correctly:

IDX_NAME       TABLE_BLOCKS IDX_CLUST_FACTOR IDX_ROWS TABLE_CHAIN_CNT
-------------- ------------ ---------------- -------- ---------------
ROW_MIG1_PK          100103              919   100000           99999
ROW_MIG1_IDX         100103             9187   100000           99999

So far, so good. Now let’s create the second table—without row migration:

CREATE TABLE row_mig2 (
  a CHAR(2000),
  b CHAR(2000),
  c CHAR(2000),
  x NUMBER      NOT NULL,
  filter NUMBER NOT NULL,
  CONSTRAINT row_mig2_pk PRIMARY KEY (x)
) ENABLE ROW MOVEMENT;

INSERT INTO row_mig2 (x, filter, a, b, c)
     SELECT level, trunc(dbms_random.value(0,17)), 'a', 'b', 'c'
       FROM dual CONNECT BY level <= 100000;
COMMIT;

CREATE INDEX row_mig2_idx ON row_mig2(filter);

ANALYZE TABLE row_mig2 COMPUTE STATISTICS;
ANALYZE INDEX row_mig2_pk COMPUTE STATISTICS;
ANALYZE INDEX row_mig2_idx COMPUTE STATISTICS;

SELECT i.index_name         idx_name
     , t.blocks             table_blocks
     , i.clustering_factor  idx_clust_factor
     , i.num_rows           idx_rows
     , t.chain_cnt          table_chain_cnt
  FROM user_indexes i
  JOIN user_tables t USING (table_name)
 WHERE table_name = 'ROW_MIG2';

Just to verify that there is no chaining when a plain insert approach is followed:

IDX_NAME       TABLE_BLOCKS IDX_CLUST_FACTOR IDX_ROWS TABLE_CHAIN_CNT
-------------- ------------ ---------------- -------- ---------------
ROW_MIG2_IDX         100877           100000   100000               0
ROW_MIG2_PK          100877           100000   100000               0

So, let’s see how my “trapQL” works:

SET AUTOTRACE TRACEONLY;
SET TIMING ON;

SELECT /* analyze that! */ *
  FROM row_mig1 d1
  JOIN row_mig2 d2 ON (d1.x = d2.x)
 WHERE d1.filter = 0
   AND d2.filter = 0;

SET TIMING OFF
SET AUTOTRACE OFF

And the result is:

602 rows selected.

Elapsed: 00:01:00.73

Execution Plan
----------------------------------------------------------
Plan hash value: 3163423506

----------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  |  Cost |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |  5882 | 17663 |
|   1 |  NESTED LOOPS                 |              |       |       |
|   2 |   NESTED LOOPS                |              |  5882 | 17663 |
|   3 |    TABLE ACCESS BY INDEX ROWID| ROW_MIG2     |  5882 |  5896 |
|*  4 |     INDEX RANGE SCAN          | ROW_MIG2_IDX |  5882 |    12 |
|*  5 |    INDEX UNIQUE SCAN          | ROW_MIG1_PK  |     1 |     0 |
|*  6 |   TABLE ACCESS BY INDEX ROWID | ROW_MIG1     |     1 |     2 |
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("D2"."FILTER"=0)
   5 - access("D1"."X"="D2"."X")
   6 - filter("D1"."FILTER"=0)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      20929  consistent gets
      12994  physical reads
          0  redo size
      39658  bytes sent via SQL*Net to client
        859  bytes received via SQL*Net from client
         42  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        602  rows processed

Surprising, at least to me. The optimizer considers the CHAIN_CNT and produces the best possible execution plan. The cost for the TABLE ACCESS BY INDEX ROW ID has grown to two. That’s a very concise reflection of the additional block that needs to be fetched.

To verify that the difference is the CHAIN_CNT let’s verify the plan after DBMS_STATS collection:

SQL> BEGIN
       DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG1',
                                     CASCADE => TRUE);
       DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG12',
                                     CASCADE => TRUE);
     END;
     /


PL/SQL procedure successfully completed.

SQL> SELECT i.index_name         idx_name
          , t.blocks             table_blocks
          , i.clustering_factor  idx_clust_factor
          , i.num_rows           idx_rows
          , t.chain_cnt          table_chain_cnt
       FROM user_indexes i
       JOIN user_tables t USING (table_name)
      WHERE table_name IN ('ROW_MIG1', 'ROW_MIG2')
      ORDER BY i.index_name;

IDX_NAME       TABLE_BLOCKS IDX_CLUST_FACTOR IDX_ROWS TABLE_CHAIN_CNT
-------------- ------------ ---------------- -------- ---------------
ROW_MIG1_IDX         100622             9180   100000           99999
ROW_MIG1_PK          100622              918   100000           99999
ROW_MIG2_IDX         100239           100000   100000               0
ROW_MIG2_PK          100239           100000   100000               0

Well, as a matter of fact, DBMS_STATS just doesn’t care about the CHAIN_CNT. So, the old values remain there. On the other hand, there are some minor differences in the other stats like the table blocks and the index clustering factor of ROW_MIG1_IDX. However, the execution plan doesn’t change:

602 rows selected.

Elapsed: 00:01:03.71

Execution Plan
----------------------------------------------------------
Plan hash value: 3163423506

----------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  |  Cost |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |  6084 | 18270 |
|   1 |  NESTED LOOPS                 |              |       |       |
|   2 |   NESTED LOOPS                |              |  6084 | 18270 |
|   3 |    TABLE ACCESS BY INDEX ROWID| ROW_MIG2     |  6084 |  6098 |
|*  4 |     INDEX RANGE SCAN          | ROW_MIG2_IDX |  6084 |    12 |
|*  5 |    INDEX UNIQUE SCAN          | ROW_MIG1_PK  |     1 |     0 |
|*  6 |   TABLE ACCESS BY INDEX ROWID | ROW_MIG1     |     1 |     2 |
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("D2"."FILTER"=0)
   5 - access("D1"."X"="D2"."X")
   6 - filter("D1"."FILTER"=0)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      21131  consistent gets
      12911  physical reads
          0  redo size
      39658  bytes sent via SQL*Net to client
        859  bytes received via SQL*Net from client
         42  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        602  rows processed

Well, it seems that the CHAIN_CNT is really making all the difference since it is the only value that was not calculated by DMBS_STATS…or is it? The ultimate proof would be to update the CHAIN_CNT manually to zero in the same way as I manually update the CLUSTERING_FACTOR using DBMS_STATS in my previous post. However, I already mentioned DBMS_STATS doesn’t care about the CHAIN_CNT, consequently there is no parameter in SET_TABLE_STATS to manually update it.

For the sake of science, I have done what nobody should ever do. And I will definitely not post what I’ve done. However, I did it, and I can tell you; the CHAIN_CNT is making all the difference.

Summary

The story from the previous posts is a little bit depressing. Although ANALYZE is deprecated, it provides more information to the CBO than DBMS_STATS. Using the recommended DBMS_STATS alone opens the problem that I have demonstrated in my previous post. The depressing factor is that this is actually a regression. A useful functionality of the CBO was “removed” by the recommendation to use DBMS_STATS.

Once more I must emphasize that all of that trouble was caused by the excessive use of the “insert empty, update everything” anti-pattern. Obviously everybody should try very hard not to follow that pattern. That solves all the problems.

Thanks

Thanks to Jonathan Lewis for pointing me onto the question how my “trapQL” works with ANALYZE.

Special thanks to Gerhard Kircher.


Do not use offset for pagination. Learn why.

SQL Performance Tuning Book for developers