One of the more advanced tricks I like to exploit is the analytic Top-N query. Although I have been using it for quite a while, I recently discovered a “limitation” that I was not aware of. Actually, to be honest, it’s not a limitation; it is a missing optimization in a rarely used feature that can easily be worked around. I must admit that I am asking for quite a lot in this case.
The article starts with a general introduction to Top-N queries, applies that technique to analytic queries, and explains the case where I miss an optimization. But is it really worth all that effort? The article concludes with my answer to that question.
Please find the CREATE and INSERT statements at the end of the article.
Top-N Queries
Top-N queries are queries for the first N rows according to a specific sort order; e.g., the first three rows like this:
select *
  from (select start_key, group_key, junk
          from demo
         where start_key = 'St'
         order by group_key)
 where rownum <= 3;
That’s well known and very straightforward. However, the interesting part is performance, as usual. A naïve implementation executes the inner SQL first, that is, it fetches and sorts all the matching records, before limiting the result set to the first three rows. In the absence of a useful index, that is exactly what happens:
START_KEY  GROUP_KEY JUNK
---------- --------- ----------
St                 1 junk
St                 3 junk
St                10 junk

3 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 142682949

--------------------------------------------------------------
| Id  | Operation                | Name | Rows | Bytes | Cost |
--------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    3 |  1032 | 8240 |
|*  1 |  COUNT STOPKEY           |      |      |       |      |
|   2 |   VIEW                   |      |  370 |  124K | 8240 |
|*  3 |    SORT ORDER BY STOPKEY |      |  370 | 76960 | 8240 |
|*  4 |     TABLE ACCESS FULL    | DEMO |  370 | 76960 | 8239 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=3)
   3 - filter(ROWNUM<=3)
   4 - filter("START_KEY"='St')

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      30370  consistent gets
      30365  physical reads
          0  redo size
        998  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          3  rows processed
A full table scan is performed (more on that in a few seconds) to retrieve all the rows that match the where clause; about 370 according to the optimizer’s estimate. The next step sorts the entire result set. Finally, the limit is applied (the COUNT STOPKEY step) and the number of rows is reduced to three.
The performance problem of this query is obviously the full table scan. Let’s create an index to make it go away:
create index demo_idx on demo (start_key);
exec dbms_stats.gather_index_stats(null, 'DEMO_IDX');
That’s much better:
START_KEY  GROUP_KEY JUNK
---------- --------- ----------
St                 1 junk
St                 3 junk
St                10 junk

3 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1129354520

------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows | Bytes | Cost |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |    3 |  1032 |  372 |
|*  1 |  COUNT STOPKEY                 |          |      |       |      |
|   2 |   VIEW                         |          |  370 |  124K |  372 |
|*  3 |    SORT ORDER BY STOPKEY       |          |  370 | 76960 |  372 |
|   4 |     TABLE ACCESS BY INDEX ROWID| DEMO     |  370 | 76960 |  371 |
|*  5 |      INDEX RANGE SCAN          | DEMO_IDX |  370 |       |    3 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=3)
   3 - filter(ROWNUM<=3)
   5 - access("START_KEY"='St')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        360  consistent gets
        201  physical reads
          0  redo size
        998  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          3  rows processed
You can see that the full table scan was replaced by an index lookup and the corresponding table access. The other steps remain unchanged.
However, this is still a bad execution plan because all matching records are fetched and sorted just to throw most of them away. The following index allows a much better execution plan:
drop index demo_idx;
create index demo_idx on demo (start_key, group_key);
exec dbms_stats.gather_index_stats(null, 'DEMO_IDX');
The new execution plan looks like this:
ID         START_KEY  GROUP_KEY
---------- ---------- ---------
    936196 St                 1
    232303 St                 3
    759212 St                10

3 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1891928015

-----------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows | Bytes | Cost |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |    3 |   465 |    7 |
|*  1 |  COUNT STOPKEY                |          |      |       |      |
|   2 |   VIEW                        |          |    3 |   465 |    7 |
|   3 |    TABLE ACCESS BY INDEX ROWID| DEMO     |    3 |    36 |    7 |
|*  4 |     INDEX RANGE SCAN          | DEMO_IDX |  370 |       |    3 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=3)
   4 - access("START_KEY"='St')

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        609  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          3  rows processed
Well, that is efficient. The sort operation has vanished entirely because the index definition supports the ORDER BY clause. Even more powerful: the STOPKEY takes effect down to the index range scan. You can see the reduced number of table accesses in the plan. Although not visible in the execution plan, the index range scan is also aborted after fetching the first three records.
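If you want to verify that early abort yourself, row-source statistics make it visible. A minimal sketch: the gather_plan_statistics hint collects actual row counts, and dbms_xplan.display_cursor shows them next to the estimates:

select /*+ gather_plan_statistics */ *
  from (select start_key, group_key, junk
          from demo
         where start_key = 'St'
         order by group_key)
 where rownum <= 3;

-- display actual row counts (A-Rows) for the statement just executed
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

In the A-Rows column of that output, the INDEX RANGE SCAN should report only a handful of rows instead of all matching ones.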
Well, that optimization has been in the Oracle database for quite a while, at least since 8i I guess. After that preparation, I can demonstrate what 10gR2 has to offer on top of it.
Analytic Top-N Queries
It is actually the very same story with a small extension: I don’t want to retrieve the first N rows, but all the rows where the group_key value is at its minimum for the respective start_key. A straightforward solution is this:
select id, start_key, group_key
  from demo
 where start_key = 'St'
   and group_key = (select min(group_key)
                      from demo
                     where start_key = 'St');
That statement is perfectly legal, even performance-wise:
ID         START_KEY  GROUP_KEY
---------- ---------- ---------
    936196 St                 1

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1142136980

------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows | Bytes | Cost |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |    1 |    12 |    8 |
|   1 |  TABLE ACCESS BY INDEX ROWID   | DEMO     |    1 |    12 |    5 |
|*  2 |   INDEX RANGE SCAN             | DEMO_IDX |    1 |       |    3 |
|   3 |    SORT AGGREGATE              |          |    1 |     7 |      |
|   4 |     FIRST ROW                  |          |    1 |     7 |    3 |
|*  5 |      INDEX RANGE SCAN (MIN/MAX)| DEMO_IDX |    1 |     7 |    3 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("START_KEY"='St' AND "GROUP_KEY"= (SELECT
              MIN("GROUP_KEY") FROM "DEMO" "DEMO" WHERE "START_KEY"='St'))
   5 - access("START_KEY"='St')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
        550  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
Analytic functions
Analytic functions can perform calculations on the basis of multiple rows. In contrast to aggregate functions, however, they work without GROUP BY. A very typical use for analytic functions is a running balance; that is, the sum of all the rows preceding the current row.
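As a quick illustration of the running-balance idea, here is a sketch against a hypothetical transactions table with txn_date and amount columns:

-- hypothetical table; the window sums all amounts up to the current row
select txn_date,
       amount,
       sum(amount) over (order by txn_date
                         rows between unbounded preceding
                                  and current row) running_balance
  from transactions
 order by txn_date;

Each row shows the sum of all amounts up to and including that row, without any GROUP BY.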
The function used in the example (dense_rank) returns the rank of the current row according to the supplied OVER (ORDER BY) clause, which is, in turn, not to be confused with a regular ORDER BY.
orafaq.com has a nice intro to Oracle analytic functions.
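To see how dense_rank differs from its siblings, you can run the three ranking functions side by side; a sketch on the demo table from this article:

-- dense_rank assigns equal ranks to ties and leaves no gaps;
-- compare with rank() and row_number()
select group_key,
       dense_rank() over (order by group_key) drnk,
       rank()       over (order by group_key) rnk,
       row_number() over (order by group_key) rn
  from demo
 where start_key = 'St';

All rows sharing the lowest group_key get dense_rank 1; that is why a rnk <= 1 filter can return more than one row.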
The first step is to fetch the smallest group_key. Because of the min/max optimization in combination with a well-suited index, the database doesn’t need to sort the data; it just picks the first record from the index, which must be the smallest anyway. The second step is to perform a regular index lookup for the start_key and the group_key that was just retrieved from the sub-query.
Another possible implementation for that is to use an analytic function:
select *
  from (select id, start_key, group_key,
               dense_rank() OVER (order by group_key) rnk
          from demo
         where start_key = 'St')
 where rnk <= 1;
Do you recognize the pattern? It is very similar to the traditional Top-N query described at the beginning of this article. Instead of limiting on the rownum pseudocolumn, we use an analytic function. The execution plan reveals the performance characteristics of that statement:
ID         START_KEY  GROUP_KEY        RNK
---------- ---------- --------- ----------
    936196 St                 1          1

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3221234897

-----------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows | Bytes | Cost |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |  370 | 62160 |  374 |
|*  1 |  VIEW                         |          |  370 | 62160 |  374 |
|*  2 |   WINDOW NOSORT STOPKEY       |          |  370 |  4440 |  374 |
|   3 |    TABLE ACCESS BY INDEX ROWID| DEMO     |  370 |  4440 |  373 |
|*  4 |     INDEX RANGE SCAN          | DEMO_IDX |  370 |       |    3 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RNK"<=1)
   2 - filter(DENSE_RANK() OVER ( ORDER BY "GROUP_KEY")<=1)
   4 - access("START_KEY"='St')

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
        610  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
What was the COUNT STOPKEY operation for the traditional Top-N query has become the WINDOW NOSORT STOPKEY operation for the analytic function. However, the expected number of rows is not known to the optimizer; any number of rows could have the lowest group_key value. Still, the index range scan is aborted once the required rows have been fetched. On the one hand, the consistent gets are even better than with the sub-query statement. On the other hand, the cost value is higher. Whenever you use analytic functions, run a benchmark to know the actual performance.
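A simple way to do that is SQL*Plus autotrace, which produced the statistics shown throughout this article. A minimal benchmark sketch comparing both variants:

set autotrace traceonly statistics
set timing on

-- sub-query variant
select id, start_key, group_key
  from demo
 where start_key = 'St'
   and group_key = (select min(group_key)
                      from demo
                     where start_key = 'St');

-- analytic variant
select *
  from (select id, start_key, group_key,
               dense_rank() over (order by group_key) rnk
          from demo
         where start_key = 'St')
 where rnk <= 1;

set autotrace off

Comparing the consistent gets and the elapsed time of both runs tells you more than the optimizer’s cost values do.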
Let’s spend some thoughts on this optimization. The database knows that the index order corresponds to the OVER (ORDER BY) clause and avoids the sort operation. Even more impressive is that it can abort the range scan as soon as the first value that doesn’t match the rnk <= 1 expression is fetched. That is only possible because the dense_rank() function cannot decrease if the rows are fetched in the order of the OVER (ORDER BY) clause. That’s impressive, isn’t it?
Mass Top-N Queries
The next step towards the issue that made me write this article is a mass Top-N query. With the previous statement as a basis, it is actually quite simple: just remove the inner where clause to get the result for all start_key values, and add a partition clause to make sure the rank is built individually for each start_key:
select *
  from (select start_key, group_key, junk,
               dense_rank() OVER (partition by start_key
                                  order by group_key) rnk
          from demo)
 where rnk <= 1;
Declaring the partition is required to make sure those start_keys that don’t have a group_key of one will still show up, with their lowest group_key value.
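To see why, compare with the variant without the partition clause; a sketch where the rank spans the whole table:

-- without partition by, the rank is global
select *
  from (select start_key, group_key, junk,
               dense_rank() over (order by group_key) rnk
          from demo)
 where rnk <= 1;

Here rnk <= 1 keeps only the rows with the globally smallest group_key, so most start_key values would disappear from the result.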
With the mass Top-N query, we have reached the end of the optimizer’s smartness, as of release 11gR2. At first sight, the plan is not surprising:
3260 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1766530486

---------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost |
---------------------------------------------------------------
|   0 | SELECT STATEMENT         |      | 1000K |  340M | 8239 |
|*  1 |  VIEW                    |      | 1000K |  340M | 8239 |
|*  2 |   WINDOW SORT PUSHED RANK|      | 1000K |  198M | 8239 |
|   3 |    TABLE ACCESS FULL     | DEMO | 1000K |  198M | 8239 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RNK"<=1)
   2 - filter(DENSE_RANK() OVER ( PARTITION BY "START_KEY" ORDER BY
              "GROUP_KEY")<=1)

Statistics
----------------------------------------------------------
         22  recursive calls
         20  db block gets
      30370  consistent gets
      33163  physical reads
          0  redo size
      59215  bytes sent via SQL*Net to client
       2806  bytes received via SQL*Net from client
        219  SQL*Net roundtrips to/from client
          0  sorts (memory)
          1  sorts (disk)
       3260  rows processed
It’s a full table scan. However, a “mass” query performs a full table scan for good reason, so that did not catch my attention. What did catch my attention is the following:
select *
  from (select *
          from (select start_key, group_key, junk,
                       dense_rank() OVER (partition by start_key
                                          order by group_key) rnk
                  from demo)
         where rnk <= 1)
 where start_key = 'St';
It is actually the individual Top-N query again. This time it is built on the basis of the mass Top-N query, which was set up as a view. That way, a single database view can serve any mass query as well as individual Top-N queries; that’s a maintainability benefit. If the advanced magic that aborts the index range scan still worked, it would be extremely efficient as well. The execution plan proves the opposite:
START_KEY  GROUP_KEY JUNK              RNK
---------- --------- ---------- ----------
St                 1 junk                1

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1309566133

-----------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows | Bytes | Cost |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |  370 |  128K |  373 |
|*  1 |  VIEW                         |          |  370 |  128K |  373 |
|*  2 |   WINDOW NOSORT               |          |  370 | 76960 |  373 |
|   3 |    TABLE ACCESS BY INDEX ROWID| DEMO     |  370 | 76960 |  373 |
|*  4 |     INDEX RANGE SCAN          | DEMO_IDX |  370 |       |    3 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RNK"<=1)
   2 - filter(DENSE_RANK() OVER ( PARTITION BY "START_KEY" ORDER BY
              "GROUP_KEY")<=1)
   4 - access("START_KEY"='St')

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        363  consistent gets
          0  physical reads
          0  redo size
        808  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
Although no sort is required, the STOPKEY has disappeared from the WINDOW NOSORT operation. That means the full index range scan is performed, for all 359 rows where start_key='St'. On top of that, the number of consistent gets is rather high. A closer look at the execution plan reveals that the entire row is fetched from the table before the filter on the analytic expression is applied. The junk column that is fetched from the table is not required for the evaluation of this predicate; it would be possible to fetch that column only for the rows that pass the filter (a possible workaround along those lines is sketched below).
That “premature table access” is the reason why the full table scan is more efficient for the mass query than an index full scan. Have a look at the (hinted) index full scan execution plan for the mass query:
3260 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1402975529

-------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          | 1000K |  340M | 1002K |
|*  1 |  VIEW                         |          | 1000K |  340M | 1002K |
|*  2 |   WINDOW NOSORT               |          | 1000K |  198M | 1002K |
|   3 |    TABLE ACCESS BY INDEX ROWID| DEMO     | 1000K |  198M | 1002K |
|   4 |     INDEX FULL SCAN           | DEMO_IDX | 1000K |       |  2504 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RNK"<=1)
   2 - filter(DENSE_RANK() OVER ( PARTITION BY "START_KEY" ORDER BY
              "GROUP_KEY")<=1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
    1002692  consistent gets
     817172  physical reads
          0  redo size
      59215  bytes sent via SQL*Net to client
       2806  bytes received via SQL*Net from client
        219  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       3260  rows processed
The expensive step in this execution plan is the table access. If the table access were moved up to take place after the window filter, the cost for this step would be about 3260 (one for each fetched row). Together with the cost of 2504 for the index full scan, the total cost for the plan would probably stay below 7000; that is, lower than the cost of the full table scan plan (8239).
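Until the optimizer performs that reordering itself, a conceivable workaround (a sketch only; whether it pays off depends on the release, so benchmark it) is to rank on the indexed columns alone and to fetch the wide junk column via rowid just for the surviving rows:

-- rank using only index columns plus rowid, then join back to the table
select d.start_key, d.group_key, d.junk, r.rnk
  from (select rowid as rid,
               dense_rank() over (partition by start_key
                                  order by group_key) rnk
          from demo) r
  join demo d on d.rowid = r.rid
 where r.rnk <= 1;

The inline view can, in principle, be satisfied from the index alone; the table needs to be visited only for the rows that pass the rnk filter.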
Conclusion
Just to re-emphasize the motivation behind the view that can serve both needs: it is about multidimensional optimization, and that has nothing to do with OLAP!
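To make that concrete, such a dual-purpose view could look like this (a sketch; the view name demo_top_n is made up):

create or replace view demo_top_n as
select start_key, group_key, junk,
       dense_rank() over (partition by start_key
                          order by group_key) rnk
  from demo;

-- mass Top-N query:
select * from demo_top_n where rnk <= 1;

-- individual Top-N query, same view:
select * from demo_top_n where rnk <= 1 and start_key = 'St';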
Typically, performance optimization takes only one dimension into account: performance. So far so good, but what about long-term maintenance? Very often, performance optimization reduces the maintainability of the software. That’s no coincidence; maintainability is often the only degree of freedom left during optimization. Unfortunately, reduced maintainability is very hard to notice. If it is noticed at all, it is probably years later.
I have been in both worlds for some years—operations and development—and try to optimize for all dimensions whenever possible because all of them are important for the business.
Create and Insert Statements
To try it yourself:
create table demo (
   id        number        not null,
   start_key varchar2(255) not null,
   group_key number        not null,
   junk      char(200),
   primary key (id)
);

insert into demo (
   select level,
          dbms_random.string('A', 2) start_key,
          trunc(dbms_random.value(0, 1000)),
          'junk'
     from dual
  connect by level <= 1000000
);

commit;

exec DBMS_STATS.GATHER_TABLE_STATS(null, 'DEMO');
My tests were conducted on 11gR2.