Oracle SQL: Optimizing a WHERE predicate inequality with a Function Based Index

Intro

Recently, on Oracle 11.2.0.3, I saw a join of three tables, using three times TABLE ACCESS FULL and HASH JOIN, without an obvious reason. During separating and understanding its components, the following scenario turned out to be the culprit.

select *
 from CARDTEST
 where OBJECT_ID<DATA_OBJECT_ID;

The optimizer is unable to calculate the cardinality of the “OBJECT_ID<DATA_OBJECT_ID“ predicate, and thus, it ends up with underestimating the cardinality, and using a mass access path. (The “gather_plan_statistics” hint was added to see details as real buffer gets and does not change the behavior.)

Execution Plan 1

Execution Plan 1

Fighting a loosing battle(?)

So how does this come? Oracle statistics (even histograms) only store information about the distribution of values in a column. The optimizer can’t combine two column statistics in a way, that would allow to quantify “one is smaller than the other” over the whole table. The CBO can’t combine column statistics anyway, that’s why we have to use extended statistics. But in this case, we simply can’t create an extended statistic representing an inequality predicate.

Because after creating extended stats:

select dbms_stats.create_extended_stats(
  null, 
  'CARDTEST', 
  '(OBJECT_ID, DATA_OBJECT_ID)') 
from dual;

and collecting stats for the new virtual column:

exec dbms_stats.gather_table_stats(
 ownname=>'KLM',
 tabname=>'CARDTEST',
 method_opt=>'FOR ALL HIFDDEN COLUMNS SIZE SKEWONLY',
 estimate_percent=>10,
 cascade=>true,
 no_invalidate=>false);

and executing it three times (to give self-learning features like adaptive cursor sharing and cardinality feedback a chance) the optimizer still comes to the same conclusion (see execution plan above).

Alternative

What we need, is a way to calculate a combined cardinality of the values in the two columns. For all that’s known, we can’t support “where OBJECT_ID<DATA_OBJECT_ID” from a schema approach. So let’s touch the SQL itself:

select *
from CARDTEST
where (OBJECT_ID-DATA_OBJECT_ID)<0;

From the logical point of view, this is the same as above. But there’s a big internal difference: Unlike an inequality predicate, we can support a subtraction by a function-based index. Similar to  combined statistic, a function-based index creates a virtual column with the function’s result.

And this is our approach: Creating a cardinality estimate for (OBJECT_ID-DATA_OBJECT_ID) and compare the result with zero in the WHERE condition, allows the CBO to successfully guess the number of matching rows. All we have to do – beyond changing the SQL, which might be hard enough – is:

create index I_FB_CT_OID_DOID 
  on CARDTEST 
  (OBJECT_ID-DATA_OBJECT_ID);

and add statistics for the virtual column:

exec dbms_stats.gather_table_stats(
  ownname=>'KLM',
  tabname=>'CARDTEST',
  method_opt=>'FOR ALL HIFDDEN COLUMNS SIZE SKEWONLY',
  estimate_percent=>10,
  cascade=>true,
  no_invalidate=>false);

Now the execution plan looks way friendlier:

Execution Plan 2

Execution Plan 2

And as we can see, “SYS_NC00016$“, the name of the virtual column added by our function-based index, was used to solve the inequality.

Conclusion

When we compare both plans, we see a TABLE ACCESS FULL with 3293 vs. a TABLE ACCESS BY INDEX ROWID with 261 buffer gets.

The A-Times are the same, because my test system had a huge buffer cache and cached storage, so we had no or very quick disk IO’s not affecting the response time here.

But the way to a solution as explained above has a big disadvantage: If we can’t change the SQL, we are doomed. I was not able to make a CARDINALITY hint (or an equal profile) work here.

Further implications

Miscalculating a cardinality here leads to a bad access path – bad enough. But even worse, I want to come back to my initial sentence. If this very predicate (OBJECT_ID<DATA_OBJECT_ID) defines the number of returned rows in the first of a couple of joins over multiple tables, you may easily end up with a plan with multiple full table scans, including multiple hash joins.

Reproducing the examples

The example above is real-life, but simplified. It should be easy to reproduce, since my test case is based on the Oracle dictionary view ALL_OBJECTS. This is what you can do yourself on 11.2.0.3.x.

-- create objects
drop table CARDTEST;

create table CARDTEST as
  select * from all_objects;
create index I_CT_OID
  on CARDTEST
  (OBJECT_ID);
create index I_CT_DOID
  on CARDTEST
  (DATA_OBJECT_ID);
create index I_CT_OID_DOID
  on CARDTEST
  (OBJECT_ID, DATA_OBJECT_ID);
create index I_FB_CT_OID_DOID
  on CARDTEST
  (OBJECT_ID-DATA_OBJECT_ID);
-- create extended stats (if you want)
select
  dbms_stats.create_extended_stats(
  null,
  'CARDTEST',
  '(OBJECT_ID, DATA_OBJECT_ID)')
from dual;
-- remove extended stats (if you want)
exec dbms_stats.drop_extended_stats(
  null,
  'CARDTEST',
  '(OBJECT_ID, DATA_OBJECT_ID)');
-- collect optimizer statistics (including hidden columns)
exec dbms_stats.gather_table_stats(
  ownname=>'KLM',
  tabname=>'CARDTEST',
  method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY',
  estimate_percent=>10,
  cascade=>true,
  no_invalidate=>false);
-- test with inequality
select /*+gather_plan_statistics*/ *
  from CARDTEST
  where OBJECT_ID<DATA_OBJECT_ID;
-- produce execution plan
select *
  from table(dbms_xplan.display_cursor(null,null,'COST IOSTATS LAST'));
-- test with subtraction
select /*+gather_plan_statistics*/ *
  from CARDTEST
  where (OBJECT_ID-DATA_OBJECT_ID)<0;
-- produce execution plan
select *
  from table(dbms_xplan.display_cursor(null,null,'COST IOSTATS LAST'));



You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

4 Responses to “Oracle SQL: Optimizing a WHERE predicate inequality with a Function Based Index”

  1. usn Says:

    FYI, reproduction also works in a 12.1.0.1 PDB.

    Regards
    Martin

  2. Mark W. Farnham Says:

    Since creating the function index
    create index I_FB_CT_OID_DOID
    on CARDTEST
    (OBJECT_ID-DATA_OBJECT_ID);

    relies on a known data distribution to be chosen by the CBO (and would be inefficient to be chosen if you reduce the inequality),
    perhaps the alternative index
    create index I_FB_CT_OID_DOID_NULL
    on CARDTEST
    ((case when OBJECT_ID=DATA_OBJECT_ID then null end));

    will be a useful addition to your test set. Both the resultant size of the created index and the number of buffer gets required by the solution of the query will be of interest, I suspect.

    Nice post, Martin.

  3. Mark W. Farnham Says:

    Rats. Where I actually typed “reduced” I meant “reduced.”

    (There is also an extra pair of () on the index definition but they don’t really hurt anything.)

  4. usn Says:

    @Mark: Thanks for your response. But could you explain “and would be inefficient to be chosen if you reduce the inequality” please? I can’t follow the thought yet.
    Thank you
    Martin

Leave a Reply