Wednesday, August 8, 2012

Quantitative vs Qualitative Information

Are we there yet?

If I ask you to tell me the fastest and most efficient transportation method for traveling between point A and point B, say Rochester Minnesota to Chicago Illinois, what would you say?

More importantly, what would you base your answer on?

Assume the 5 modes of transportation I have available to me are:

  1. feet
  2. bicycle
  3. automobile
  4. train
  5. plane

Thinking through some of the information you will require to determine the best method, the relative cost and speed of each mode are key factors. The distance between point A and point B is very important as well.  No one would choose a train or plane to travel a few blocks to the local supermarket, yet walking, biking or driving are all viable and reasonable modes of transportation for this short distance.

Given you know the costs and top speed of the various transportation modes above, what you don't know is the distance between Rochester Minnesota and Chicago Illinois. Now, as quickly as you can, tell me the fastest method to travel from point A to point B.

Go Ahead - Take a Guess

Since you have never traveled between Rochester and Chicago, and no one has provided you with any information on distance and terrain, you'll make an educated guess (or make up a number). Maybe you'll take a look at a map. Hmm, Rochester and Chicago don't look very far apart. Hmm, it could be a nice ride - cheap too.

So you tell me to bicycle...    ...the 350 miles (563 kilometers) from Rochester to Chicago.

What went wrong?

And more importantly, how do you fix it?  ('cause I'm not happy, my butt legs hurt).

Give Me Better Information!

If you had a better estimate of the distance, or the actual distance from a previous trip, your choice would be different would it not? The plane, train or automobile make the list of realistic options; biking and walking do not.

To carry the scenario a bit further (pun intended) let's assume you choose the train.  And there are no tracks between Rochester and Chicago.  To make use of the train we would first have to lay the track.  Now it seems "taking the train method" is no longer very fast or efficient, regardless of how fast the train can travel along.  But if the tracks are already in place, we have the ability to use them immediately - no waiting.  And because someone actually had to construct the railroad from Rochester to Chicago, the track distance is codified and remembered.

What's the Point?

When you issue an SQL query, the DB2 for i Optimizer must make a decision on how best to access the data and process the data. Think of this as a trip, and we need a plan. A plan that includes the best route and the best mode of transportation. Just like the trip from Rochester to Chicago, the relative distance matters. The Optimizer relies on meta-data and statistics to make good decisions. The availability of information and the quality of that information will go a long way in determining the fastest possible plan for your query, with your data, on your system.

In keeping with the philosophy of easy to own, easy to use, DB2 for i provides many self tuning and self managing features. One such feature is in this area of database column statistics.

Since V5R2 and the introduction of the new SQL Query Engine, DB2 for i has automatically collected, managed and used column statistics to provide the Optimizer with information. Column (value) cardinality, distribution and most frequent occurrences can be gleaned from the statistics stored in the database table object. The automatic collection of column statistics is enabled (i.e. turned on) by default. The actual collection of information is performed by a very low priority job running in the background. The process is designed to not interfere with the normal work occurring in the system, but rather take advantage of idle resources to learn more about your data. Collections are controlled by the system value QDBFSTCCOL.

Turning off the automatic column statistics collection mechanism is NOT recommended, as this will remove valuable and insightful information from the query Optimizer's line of sight, and possibly  make you pedal the 350 miles to Chicago.

John Henry says "Indexing Strategy First and Foremost"

The number one SQL query tuning tip is to define and implement a proper indexing strategy. This is roughly equivalent to laying down the railroad tracks BEFORE you need to take the train. Indexes have always provided the Optimizer with statistics on key column contents. Thus having the appropriate indexes in place is a side benefit - even if the index is not used for implementation of the query.

More in depth and detailed information on the science and art of DB2 for i query optimization is covered in our workshops.

If you need immediate assistance with SQL query performance or anything related to IBM i information management, contact me.

Travel safe. Enjoy the trip.

No comments:

Post a Comment