Monday, August 17, 2015

The Importance of Understanding Your Data

I originally published this article in the SAP for IBM i newsletter produced by Kolby Hoelzle, Practice Leader - SAP on IBM i Center of Excellence.

If you want to know more about the advantages of running SAP in IBM i, please reach out to Kolby at: hoelzle @


The Importance of Understanding Your Data 

Your repositories of data are growing. The transactions and events that produce that data are increasing. The requirement to extract more value from the data is not going away. Thus, the importance of understanding your data and how it is being used is paramount.

Understanding the data and its life cycle is traditionally the responsibility of the database engineering team. Unfortunately, IBM i installations usually do not have a designated or dedicated DB2 for i database engineer. This puts the organization at a disadvantage. The good news is that this situation can be remedied relatively easily. For more information about the DB2 for i Database Engineer position, look here.

The process of learning more about data and the SQL statements accessing that data are accomplished through profiling. Proactive data profiling is better than reactive data profiling. Data profiling involves capturing and analyzing the metadata associated with the DB2 for i database objects as well as the SQL statements issued against those objects. Metadata is defined as “data about data”.

Some of the more interesting and useful database object attributes (i.e. metadata) to understand include table size, number rows (active and deleted), index size, column value cardinality and distribution. Profiling and analyzing this information includes identifying the top n largest tables and largest indexes, as well as their respective growth rates. While not well known, there are limits to the size of DB2 table. If the limit is reached, no more data can be put into the table, and needless to say, your business transactions stop. Knowing the limits, and more importantly understanding the rate at which you are approaching the limits, is how you avoid catastrophe.

A good query execution plan is dependent upon the query optimizer understanding the data in the tables that are referenced. Local selection method, join technique and grouping scheme are the best they can be when column value cardinality and distribution are well understood. For the database engineer to understand and accept the query optimizer’s decisions, he or she must also understand the data. For example, does a particular column value represent a small number of rows in the data set, or a large number of rows in the data set?

Comparing metadata points and object attributes over time provides trending information. This type of information allows for some very interesting and meaningful analysis; and if done proactively, can assist the business and technical leaders with planning and forecasting. To accomplish the trending analysis, the information captured from the DB2 for i database objects and SQL statements must be stored in a structured and organized way. In other words, the various metadata points must be available over time to facilitate comparisons, calculate deltas and connect the dots to produce a graph or trend line. This means capturing the metadata on a scheduled and frequent basis and keeping the data over time for reporting.

Given that the target database housing data from business transactions resides in the live “production” system, the techniques and methods used to capture and store the DB2 for i metadata need to be fast, efficient and concurrent. Harvesting data about data cannot interfere with the critical operations. On the other hand, if you do not understand the data that represents your business, what’s the point of capture and analysis?

A suitable architecture for capturing, storing, analyzing and reporting on the metadata looks a lot like business intelligence architecture. As such, the data model, ETL and analysis mechanisms must be well thought out, designed and implemented properly. This includes using a separate (and likely smaller) system to hold the data and provide a platform for query, analysis and reporting engines.

The results of basic analysis and trending calculations are reviewed, augmented and used by the DB2 for i Database Engineering team. The goal is to assist business leaders in maintaining a viable and valuable data centric environment. In essence, this answers the question: who should understand the data? Furthermore, it is the responsibility of the database engineers to formulate a plan of action to resolve any issues that appear on the horizon BEFORE they become critical. 

While all of the capturing, analyzing and reporting on DB2 for i metadata seems intriguing and valuable, there is still the work involved to get the proper process and procedures in place. It just so happens that the IBM DB2 for i Center Excellence team has a solution available that can be used by the database engineers to facilitate the understanding of data.

For more information on the DB2 for i metadata reporting solution and/or the importance of the DB2 for i Database Engineer, please reach out to me.


No comments:

Post a Comment