Intro
Recently, on Oracle 11.2.0.3, I saw a join of three tables, using three times TABLE ACCESS FULL and HASH JOIN, without an obvious reason. During separating and understanding its components, the following scenario turned out to be the culprit.
select * from CARDTEST where OBJECT_ID<DATA_OBJECT_ID;
The optimizer is unable to calculate the cardinality of the “OBJECT_ID<DATA_OBJECT_ID“ predicate, and thus, it ends up with underestimating the cardinality, and using a mass access path. (The “gather_plan_statistics” hint was added to see details as real buffer gets and does not change the behavior.)
Fighting a loosing battle(?)
So how does this come? Oracle statistics (even histograms) only store information about the distribution of values in a column. The optimizer can’t combine two column statistics in a way, that would allow to quantify “one is smaller than the other” over the whole table. The CBO can’t combine column statistics anyway, that’s why we have to use extended statistics. But in this case, we simply can’t create an extended statistic representing an inequality predicate.
Because after creating extended stats:
select dbms_stats.create_extended_stats( null, 'CARDTEST', '(OBJECT_ID, DATA_OBJECT_ID)') from dual;
and collecting stats for the new virtual column:
exec dbms_stats.gather_table_stats( ownname=>'KLM', tabname=>'CARDTEST', method_opt=>'FOR ALL HIFDDEN COLUMNS SIZE SKEWONLY', estimate_percent=>10, cascade=>true, no_invalidate=>false);
and executing it three times (to give self-learning features like adaptive cursor sharing and cardinality feedback a chance) the optimizer still comes to the same conclusion (see execution plan above).
Alternative
What we need, is a way to calculate a combined cardinality of the values in the two columns. For all that’s known, we can’t support “where OBJECT_ID<DATA_OBJECT_ID” from a schema approach. So let’s touch the SQL itself:
select * from CARDTEST where (OBJECT_ID-DATA_OBJECT_ID)<0;
From the logical point of view, this is the same as above. But there’s a big internal difference: Unlike an inequality predicate, we can support a subtraction by a function-based index. Similar to combined statistic, a function-based index creates a virtual column with the function’s result.
And this is our approach: Creating a cardinality estimate for (OBJECT_ID-DATA_OBJECT_ID) and compare the result with zero in the WHERE condition, allows the CBO to successfully guess the number of matching rows. All we have to do – beyond changing the SQL, which might be hard enough – is:
create index I_FB_CT_OID_DOID on CARDTEST (OBJECT_ID-DATA_OBJECT_ID);
and add statistics for the virtual column:
exec dbms_stats.gather_table_stats( ownname=>'KLM', tabname=>'CARDTEST', method_opt=>'FOR ALL HIFDDEN COLUMNS SIZE SKEWONLY', estimate_percent=>10, cascade=>true, no_invalidate=>false);
Now the execution plan looks way friendlier:
And as we can see, “SYS_NC00016$“, the name of the virtual column added by our function-based index, was used to solve the inequality.
Conclusion
When we compare both plans, we see a TABLE ACCESS FULL with 3293 vs. a TABLE ACCESS BY INDEX ROWID with 261 buffer gets.
The A-Times are the same, because my test system had a huge buffer cache and cached storage, so we had no or very quick disk IO’s not affecting the response time here.
But the way to a solution as explained above has a big disadvantage: If we can’t change the SQL, we are doomed. I was not able to make a CARDINALITY hint (or an equal profile) work here.
Further implications
Miscalculating a cardinality here leads to a bad access path – bad enough. But even worse, I want to come back to my initial sentence. If this very predicate (OBJECT_ID<DATA_OBJECT_ID) defines the number of returned rows in the first of a couple of joins over multiple tables, you may easily end up with a plan with multiple full table scans, including multiple hash joins.
Reproducing the examples
The example above is real-life, but simplified. It should be easy to reproduce, since my test case is based on the Oracle dictionary view ALL_OBJECTS. This is what you can do yourself on 11.2.0.3.x.
-- create objects drop table CARDTEST; create table CARDTEST as select * from all_objects;
create index I_CT_OID on CARDTEST (OBJECT_ID);
create index I_CT_DOID on CARDTEST (DATA_OBJECT_ID);
create index I_CT_OID_DOID on CARDTEST (OBJECT_ID, DATA_OBJECT_ID);
create index I_FB_CT_OID_DOID on CARDTEST (OBJECT_ID-DATA_OBJECT_ID);
-- create extended stats (if you want) select dbms_stats.create_extended_stats( null, 'CARDTEST', '(OBJECT_ID, DATA_OBJECT_ID)') from dual;
-- remove extended stats (if you want) exec dbms_stats.drop_extended_stats( null, 'CARDTEST', '(OBJECT_ID, DATA_OBJECT_ID)');
-- collect optimizer statistics (including hidden columns) exec dbms_stats.gather_table_stats( ownname=>'KLM', tabname=>'CARDTEST', method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY', estimate_percent=>10, cascade=>true, no_invalidate=>false);
-- test with inequality select /*+gather_plan_statistics*/ * from CARDTEST where OBJECT_ID<DATA_OBJECT_ID; -- produce execution plan select * from table(dbms_xplan.display_cursor(null,null,'COST IOSTATS LAST'));
-- test with subtraction select /*+gather_plan_statistics*/ * from CARDTEST where (OBJECT_ID-DATA_OBJECT_ID)<0; -- produce execution plan select * from table(dbms_xplan.display_cursor(null,null,'COST IOSTATS LAST'));