Friday, March 29, 2013

More Advice! Is this Good OR Bad?


With the arrival of IBM i TR6 we find the DB2 for i Query Optimizer enhanced to provide more index advice. In a previous post I discussed the philosophy behind generating index advice during the SQL query optimization process.  I also called out the limitations in the advice provided. One limitation of note involves no advice given for different columns ORed together, such as:

SELECT    *
FROM       NFL_TEAMS
WHERE    COLOR = 'PURPLE' OR EATS_PEOPLE = 'YES';

If you've attended our DB2 for i SQL Performance class, you know that multiple indexes over the same table can be used to assist the database engine with identifying rows of interest. This technique is known as "index ORing". But without advice from optimization, you are on your own to determine the appropriate indexes to create.

Keep in mind that when ORing the same column together, the optimizer has always been able to produce advice for queries such as:

SELECT    *
FROM        NFL_TEAMS
WHERE     COLOR = 'BLACK' OR COLOR = 'BLUE'


With the latest database group PTFs installed, the query optimizer will now advise indexes for up to five ORed predicates involving different columns. Additional details can be found here.

If your query has more than five columns ORed together, there will be no index advice produced (had to draw the line somewhere). Besides, ORing five different columns together is likely to result in most of rows in the table being selected anyway - in which case a full table scan is the best option, who knows...

There are additional limitations when mixing ANDed and ORed predicates in the query as well.


Some Examples

 

The following query now produces index advice (two separate indexes). Indexes that can be used to understand the data, as well as used for implementation:

select   *
from    orders
where  orderpriority = '5-LOW'
OR      shipmode = 'AIR';




Notice that the next query produces index advice consisting of one index (this is business as usual):

select    *
from     orders
where   orderpriority = '5-LOW'
AND    shipmode = 'AIR';






When more than five different columns are ORed together, no advice is produced:

select    count(*)
from     orders
where   orderpriority = '5-LOW'
OR       shipmode = 'AIR'
OR       quantity = 1
OR       year = 2013
OR       shippriority = 0
OR       orderkey = 1
OR       custkey = 10;






As mentioned previously, when mixing ANDed and ORed predicates in the query there are still limitations in the advice that can be generated. This is due in part to the number of key combinations that could be produced. In the following query, the optimizer only gives advice for the same column(s) appearing on both sides of the OR condition (i.e. YEAR). It is not going to advise two separate indexes.

select    *
from    orders
where  (year = 2013 and shipmode = 'AIR')
OR      (year = 2013 and orderpriority = '5-LOW');





I'll let you figure out why the optimizer only advises one index in the next case, and why it contains columns (YEAR, ORDERPRIORITY, SHIPMODE):

SELECT *
FROM   orders
WHERE  (year = 2013
        AND month = 1
        AND ((orderpriority = '5-LOW'
              AND shipmode = 'AIR')
             OR (orderpriority = '1-URGENT'
                 AND shipmode = 'TRUCK')))
       OR (year = 2012
           AND quarter = 4
           AND ((orderpriority = '5-LOW'
                 AND shipmode = 'AIR')
                OR (orderpriority = '1-URGENT'
                    AND shipmode = 'TRUCK')));






Join In


The new index advice for ORs also works with JOIN conditions (and of course, when mixing ANDs with ORs, the same limitations apply):

SELECT    o.*
FROM       orders AS o
INNER JOIN
                  dates AS d
ON           (o.shipdate = d.datekey OR o.receiptdate = d.datekey)
WHERE    d.year = 2013
AND         d.month = 1;





Words of Wisdom


Please reread my post: Index Advice - Take it or Leave it.  Read it again, and take it to heart. Analyzing SQL queries and tuning them via indexing techniques is both science and art. For example, should you create a radix index, or should you create an encoded vector index? I don't know, it depends. Obviously, seeking out the dependencies is important.

Knowledge, understanding and experience are required to really make sense of optimization and SQL query behavior. Getting educated and trained is the best and most important first step.

And rest assured that DB2 for i is working hard to optimize your queries to allow the best possible performance with your data, in your environment. This includes taking its own advice and creating the indexes for you...

No comments:

Post a Comment