Monday, May 14, 2012

Take it or Leave it

When it comes to index advice, this is both the question, and the answer.  Let me explain...

Since V5R4, DB2 for i has been able to provide index advice for each and every query that is issued. The capability to offer this advice is autonomic - meaning you do nothing to switch it on or request it, it just happens. Nice.

The index advice is placed in a public table: QSYS2.SYSIXADV
The index advice is always available 24/7.
The detailed or condensed information can be accessed via System i Navigator, or via your own custom query.





Unfortunately there continues to be confusion and consternation surrounding the index recommendations provided by the query optimizer. In my opinion, this is due in part to:

a) lack of understanding

b) shirking of responsibility

c) blind faith

d) all of the above


While DB2 for i is built to work with little or no administration, our beloved database management system does require some proper relational design and engineering to be effective and efficient. This includes you getting involved with the indexing of tables in support of queries that will be issued against these tables.

To fully appreciate and comprehend an effective indexing strategy, one must fully appreciate and comprehend the DB2 for i query optimizer and database engine. Obtaining good education and training, supported by practice, is key to your success.

DB2 for i does a lot of clever and cool things re-actively to help your SQL requests behave and run better. As your query is running, DB2 for i has the ability to observe, learn and adapt. This includes taking it's own index advice and building temporary data structures. But the system is limited in what it can do pro-actively. That is to say, there is only so much the optimizer can do to understand the data and the course of the query before it's executed. This is where you come in.

When finally discovering the index advisor, some folks seem disappointed. I continue to hear a lot complaints and questions such as:

"I already have an index like this, why was it advised again?"

(Unless the existing index attributes matches exactly, a new index can be advised)

"I created the index and it didn't get used, why not?"

(The index was used for statistics, but the selectivity of the query resulted in the index not being used for implementation)

"I was told by IBM to create an index, but that index was not advised, why not?"

(The index advisor has some limitations, thus some indexing opportunties are not recognized)

So, how does it work?

Simply put, at run time the query optimizer is looking at the syntax of the query and any other relevant information available concerning the target tables and the data within. Using this information, the optimizer decides if an index would be beneficial. This is a prediction! Think crystal ball.

Indexes are used by DB2 for i to understand the data and used by the database engine for implementation. In other words, indexes can be a source of statistics and to facilitate reference for accessing, joining, grouping, ordering data.

By advising an index for a particular query, DB2 is assuming the index will be helpful to better understand the data, and that the index might be advantageous when used for implementation. Obviously, until the index is in place, the optimizer really doesn't know.

The key (pun intended) to understanding an adequate indexing strategy is to understand your data and the SQL request. Specifically: column cardinality and query selectivity.

Cardinality represents the number of distinct values in a set. For column COLOR, cardinality defines the number of different colors in the table.

Selectivity represents the focus or target of the query. Given a local selection predicate COLOR = 'PURPLE', how many rows in the table are identified by this condition.

Indexes help the optimizer understand both cardinality and selectivity. Furthermore, if the column cardinality is high (i.e. a relatively large number of different values) and the query selectivity is high (i.e. a relatively small number of rows of interest), an index will be very beneficial for accessing the data fast and efficiently.

I know it sounds silly to say, but... without such an index in place, the query optimizer has no opportunity to make use of said index. In other words, the index must exist BEFORE the query is issued for the benefit to be realized. Now you can imagine where index advice comes into play. The optimizer is assisting; nudging you towards a better environment for performance and scalability.

For the SQL query engine, DB2 for i provides very good index advice. The key columns can be based on all parts of the query... local selection, join, group by, order by. The most important in my opinion is the local selection predicates and the join predicates. Applying the local selection and joins usually involve I/O. Doing unnecessary reading and processing of data makes for tardy results.

In general, the sweet spot for index advice is based on columns specified with equal operators that are ANDed together; with local selection predicates first, followed by join predicates. This arrangement allows the index probe operation to target multiple (key) columns, offering the best and most efficient access to the key and the row. No matter the join position of any table. In other words, it minimizes or eliminates unnecessary I/O - which by the way, is what query optimization is all about!

Where's my index?

There are some limitations to the index advice that is provided. This I'm sure adds to the confusion. While indexes can be created for the following situations, there is no advice provided for:

Different columns ORed together  (i.e. COL1 = 'A' OR COL2 = 'B')
Derived columns  (i.e. UPPER(NAME) = 'WHO DAT')
Index only access  (i.e. all columns needed by the query are represented in the index)
EVI INCLUDE  (i.e. aggregates included in the EVI data structure)

For queries using these types of predicates, the database engineer will need to apply their own science and art in pursuit of the most appropriate index.

Here are some very simple examples to illustrate the point:

select *
from a_schema.a_table
where order_number = 123;

index advised
create index ... on a_schema.a_table (order_number);

select some_column
from a_schema.a_table
where color = 'RED'
and size in ('MEDIUM', 'LARGE');

index advised
create index... on a_schema.a_table (color, size);

select a.some_column
from a_schema.a_table a
inner join a_schema.b_table b
on a.jcol1 = b.jcol1
where a.order_number = 789;

index advised
create index... on a_schema.a_table (order_number, jcol1);
create index... on a_schema.b_table (jcol1);

select some_column
from a_schema.a_table
where col1 = 'ABC' or col2 = 'XYZ';

no index advised

create index... on a_schema.a_table (col1);
create index... on a_schema.a_table (col2);

select some_column
from a_schema.a_table
where upper(name) = 'CAIN';

no index advised
create index... on a_schema.a_table (upper(name));

select year, month, sum(revenue)
from a_schema.a_table
group by year, month
order by year, month;

no index advised
create encoded vector index... on a_schema.a_table (year, month) include(sum(revenue);

Take it or leave it

Should you take all the index advice provided by the optimizer? Probably not. While the optimizer will continue to recommend an index every time the query is executed, just ignore the advice. On the other hand, if you are not satisfied with the query performance, create the most often recommended indexes and analyze your results. Acting on the index advice is subjective. Your response time and performance goals are relevant.

Should you analyze your queries and their respective response times to identify the longest running and/or least efficient operations? Indeed!

Should you eliminate unnecessary table scans and temporary index builds? Most definitely!

The DB2 for i index advisor can help guide the way.

Pro-actively you can ensure that every join column in your data model is covered by an index. It is a good practice to index columns associated with commonly used, highly selective local predicates. These are critical success factors.

If you are considering the use of EVIs, get educated, and be sure you understand when and where they are most effective. Creating encoded vector indexes represents an advanced form of indexing.

For more information about DB2 for i indexing strategies, check out our white paper here.

No comments:

Post a Comment