Thursday, April 15, 2021

Latest IBM i TR adds Query Supervisor to Db2 & Your Team!

The latest IBM i Technology Refresh was announced this week, which means there are new Db2 for i features and functions available to you. This latest announcement includes Db2 Mirror for i enhancements, SQL advancements, and new & updated IBM i services – however, the Query Supervisor is by far the most interesting new bell and whistle.

I like to remind IBM i clients that when they start using SQL they’re actually extending the size of their development team. Your team grows in size because the Db2 for i SQL engine features a Query Optimizer whose job it is to determine the fastest way to access and retrieve the data. Your development team is able to focus on the coding of business logic and leave the performance logic to the Query Optimizer. This latest announcement adds another member to your team with the introduction of the Query Supervisor. 

The new Query Supervisor allows you to take query governance and monitoring to another level. If you’ve been around the platform for a while, you might be thinking that Db2 for i already supports query governance and you would be right.  The Db2 for i support for a Query Governor goes way back to V3R1.

The Query Governor is known as a predictive query governor because it allows action to be taken before a query starts running. The governor allows you to prevent a query from wasting system resources when the query’s runtime implementation is predicted to exceed the time and/or temporary storage thresholds that you’ve defined for your server. The predictive aspect means that the Db2 for i Query Optimizer has to estimate the amount of time and resources needed to run the requested query.

As you might imagine, generating an accurate estimate is a tall task. One of the biggest hurdles is that the optimizer has no idea how busy the system will be when the query starts running. A CPU-intensive workload could start running milliseconds after the current query starts, and there’s no easy way to predict this. Thus, the optimizer’s estimates are not going to be one hundred percent accurate. They are, however, relatively accurate, meaning that long-running queries will have larger estimates than short-running queries.

This “relative” accuracy of the optimizer’s estimates made it difficult to use the predictive query governor function system-wide. If you set a query time limit of 5 minutes on the system, a query that runs closer to two minutes might be prevented from running because the optimizer’s estimate was 6 minutes. A query that was estimated to run in 3 minutes might actually run for 7 minutes due to the system being busier than predicted. As a result of these challenges, clients have asked IBM for additional query workload controls and IBM has delivered on that request with the new Query Supervisor feature.

The Query Supervisor provides a solution for taking action on an active query whose actual resource usage exceeds a user-defined threshold. When the Db2 engine detects that your threshold has been exceeded, it will call a user-defined exit program to take action. That action might be cancelling the query or just logging it and notifying an administrator about the offending query; the possibilities are endless. If the exit program chooses to cancel the query, then one could argue that system resources were wasted by a query that didn’t complete. However, the upside is that you were able to limit the amount of time that the query caused overhead on your system.  Like the Query Governor, the Query Supervisor supports SQL and non-SQL queries (OPNQRYF, etc.).

The new Query Supervisor support allows you to set the following threshold types with a simple call to an SQL procedure - QSYS2.ADD_QUERY_THRESHOLD.

  • ELAPSED TIME
  • TEMPORARY STORAGE
  • CPU TIME
  • TOTAL IO COUNT

The threshold value is dependent on the type of threshold. The threshold value for Elapsed Time and CPU Time is specified in seconds, while the value specified for a Temporary Storage threshold is given in megabytes. The Total IO Count threshold value is just a count because it specifies a limit on the number of I/O operations that a query can perform.
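
For instance, a temporary storage threshold could be defined with a call like this minimal sketch – the threshold name and the 20 GB value are purely illustrative:

CALL QSYS2.ADD_QUERY_THRESHOLD(
     THRESHOLD_NAME  => 'Temp storage cap',
     THRESHOLD_TYPE  => 'TEMPORARY STORAGE',
     THRESHOLD_VALUE => 20480,      -- megabytes (20 GB)
     SUBSYSTEMS      => '*ALL');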

With the Query Supervisor, you can also specify filters to narrow the focus of the supervisor’s threshold monitoring. The threshold filters that can be applied are subsystem names, job names, and user profile names to include or exclude. The values for each of these filters can be up to 100 names and can include generic names (e.g., 'RPTUSER*').

Here’s an example of a threshold definition to prevent QZDASOINIT jobs from having queries that run longer than 5 minutes, unless the queries were submitted by a manager. Notice in this example that you’re able to specify multiple threshold filters on the definition.

CALL QSYS2.ADD_QUERY_THRESHOLD(
     THRESHOLD_NAME  => 'QZDASOINIT Time Limit',
     THRESHOLD_TYPE  => 'ELAPSED TIME',
     THRESHOLD_VALUE => 300,
     JOB_NAMES       => 'QZDASOINIT',
     EXCLUDE_USERS   => 'MGRID*',
     SUBSYSTEMS      => '*ALL')

Thresholds can be easily removed by specifying the THRESHOLD_NAME on the QSYS2.REMOVE_QUERY_THRESHOLD procedure.
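
For example, the threshold created above could be removed with a call like this (a minimal sketch using that same threshold name):

CALL QSYS2.REMOVE_QUERY_THRESHOLD(
     THRESHOLD_NAME => 'QZDASOINIT Time Limit');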

The user-defined exit program needs to be registered to the new QIBM_QQQ_QRY_SUPER exit point (a registration sketch follows the list below). As mentioned earlier, the exit program allows you to take a wide variety of actions. That flexibility is enhanced even further with the input values that are passed to the exit program. Here are some of the more interesting input values to consider utilizing in your program logic:

  • Threshold name, type & value
  • User, subsystem, and job info
  • SQL statement text, plan identifier & host variable values
  • Client register values
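
Registration can be done from SQL with the QCMDEXC procedure. In this rough sketch, MYLIB/QRYSUPEXIT is a hypothetical program, and the exit format name QRYS0100 is my recollection of the documented format – verify both against the exit point documentation before using this:

-- Register a hypothetical exit program with the Query Supervisor exit point
CALL QSYS2.QCMDEXC('ADDEXITPGM EXITPNT(QIBM_QQQ_QRY_SUPER) FORMAT(QRYS0100) PGMNBR(1) PGM(MYLIB/QRYSUPEXIT)');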

To try out your new Query Supervisor team member, you just need to load the Database Group PTFs onto your IBM i 7.3 or 7.4 systems when they are released next month. 

Thursday, March 25, 2021

IBM i Services are Great, but they're NOT Magic

The development team in Rochester has been quite busy over the last several years making it easy to use SQL to access data and services related to the IBM i operating system. These IBM i Services can be used with SQL to perform a variety of requests, ranging from retrieving data about active jobs & database connections, to placing entries on data queues, to checking object authorities, to writing data to IFS files. I’m using the "IBM i Services" term here generically to refer to both services that you call and queries against system catalog views, because many of the newer views utilize service calls to return data.

The good news is that we’re seeing a lot of IBM i developers leverage these services to solve a number of problems. The bad news is we’re finding many developers using these services without any thought given to performance. Little consideration is given to how much data is being requested and how often the request is being made - it’s almost like they think there’s something magical about the performance of these services just because they are from IBM. While the IBM development team creating these services is really good, performance analysis and testing should be part of the rollout of any new functions using the IBM i Services. The same performance rigor that you would apply to the deployment of new business processes on your production systems needs to be applied to the deployment of code using the IBM i Services.

Our Db2 Lab Services team analyzes the Top 25 most time-consuming SQL statements as part of the SQL Performance Assessment service that we perform for clients. Over the last two years, we’ve often found SQL statements using IBM i Services in that list. These time-consuming SQL statements often have a negative impact on overall system performance because of their resource usage. A recent conversation with colleagues in IBM Support revealed they are seeing similar trends, with clients reporting system performance problems only to find a suboptimal service call as the root cause.

Here are some performance considerations when using IBM i Services with SQL.

First, not all services are designed to be called repeatedly on the system and/or to be called during daytime hours when system usage is heavy. Some services consume more system resources than others – that’s why performance testing is key to get a feel for how long the service takes to run and how many resources will be consumed by the service. The IBM Support team shared one story of a client complaining about system performance only to determine the client was calling the DISPLAY_JOURNAL service with very few input filters every two minutes on their production system. Repeated calls to other services like DATA_QUEUE_ENTRIES may be perfectly fine and that can be easily determined with performance testing and analysis. Performance analysis may also lead you to determine that the service needs to be called and run overnight when system usage is low.
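
By way of contrast, a tightly filtered DISPLAY_JOURNAL call might look something like this sketch – the journal MYLIB/MYJRN is hypothetical, and the filters shown (receiver chain, time window, entry types) are just examples of narrowing the request, so verify the parameter names against the documentation:

-- Only update/delete entries from the last 24 hours of the current chain
SELECT entry_timestamp, journal_code, journal_entry_type, object
  FROM TABLE(QSYS2.DISPLAY_JOURNAL(
         'MYLIB', 'MYJRN',
         STARTING_RECEIVER_NAME => '*CURCHAIN',
         STARTING_TIMESTAMP     => CURRENT TIMESTAMP - 24 HOURS,
         JOURNAL_ENTRY_TYPES    => 'UP,DL'));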

Second, while the creation of indexes is often a solution to make SQL queries run faster and more efficiently, this technique does not help with SQL statements using IBM i Services. Indexes are not applicable because most IBM i Services rely on functions to dynamically generate the result data, so there’s no data to index. The function logic is accessing & processing IBM i objects with system API calls. The operating system data being processed is not stored in the tables and rows that are normally the target of an SQL request.

The lack of help from indexing leads to the last and probably most important performance consideration – ensure your IBM i Service calls are tightly focused. A tight focus means specifying as many search parameters as possible on your calls to minimize the amount of data that has to be touched in order to return the result. For example, one could use the following query against the SYSPROGRAMSTAT catalog view to retrieve release details about an SQL program (MYSQLPGM):

SELECT target_release, earliest_possible_release
  FROM qsys2.sysprogramstat
 WHERE program_name = 'MYSQLPGM'

While this query works, the scope is not focused – the request directs Db2 to search through every library containing an SQL program instead of scoping the search to the library containing the program. This new version of the SELECT statement tightens the focus, returning the same result with better performance because the search is narrowed to a single library:

SELECT target_release, earliest_possible_release
  FROM qsys2.sysprogramstat
 WHERE program_name = 'MYSQLPGM'
   AND program_schema = 'MYPGMLIB'

Including search predicates on a WHERE clause is one way to narrow the focus of an IBM i Service. The other method for narrowing focus is passing input parameters on the service call itself. Let’s use the OBJECT_STATISTICS service as an example. Suppose an audit requires you to provide a list of all the program & service program objects on your system. The following SELECT statement generates the required listing, but the scope of the service request is too broad because it asks the table function to process every library on the system.

SELECT objname, objlib, objtype
  FROM TABLE(QSYS2.OBJECT_STATISTICS('*ALL', 'PGM SRVPGM'))
 WHERE objlib NOT LIKE 'Q%'

While the IBM libraries are filtered out with the WHERE clause, that filtering is performed after the data is returned by the table function. The amount of data processed by this call would be greatly reduced by changing the first input parameter from *ALL to *ALLUSR so that only user libraries are processed by the service, which also eliminates the need for the WHERE clause.

SELECT objname, objlib, objtype
  FROM TABLE(QSYS2.OBJECT_STATISTICS('*ALLUSR', 'PGM SRVPGM'))

Some IBM i Services have options that reduce the amount of detailed data that is collected and returned in order to speed up performance. The OBJECT_STATISTICS service has such an option, so this request can be made even more efficient by adding a third parameter with a value of *ALLSIMPLE.

SELECT objname, objlib, objtype
  FROM TABLE(QSYS2.OBJECT_STATISTICS('*ALLUSR', 'PGM SRVPGM', '*ALLSIMPLE'))

With these guidelines in hand, you should be able to use IBM i Services in production without negatively impacting system performance. You can also find a ton of good examples of using IBM i Services on Scott Forstie’s SQL examples site: ibm.biz/Db2foriSQLTutor. Let me know if your company needs assistance tuning your usage of IBM i Services.

Wednesday, February 17, 2021

The ABCs of Effective Db2 SMP Usage

Creation of the Db2 for i SMP (Symmetric Multiprocessing) licensed feature was one of the coolest projects that I was able to work on during my time on the Db2 development team. Figuring out how to add parallel processing into the Db2 engine was both interesting and challenging work. I did some searching through the InfoWorld archives (remember when hard copy IT periodicals were a thing…) and figured out that the SMP feature recently turned 25 years old in November 2020.  As the old saying goes, time flies when you’re having fun.

As part of that development project, I also had the opportunity to help one of the first customers use Db2 SMP in their shop. While the SMP feature and underlying hardware have changed over the years, the items to consider for a successful Db2 SMP implementation have not.

I’ve tried to break the success factors into the following ABC acrostic – it might be a bit of a stretch, but it makes for a catchy title 😉

  • Available system resources
  • Balanced expectations
  • Controlled usage of SMP

Available system resources

With Db2 for i SMP, the basic approach is dividing the work for a query across multiple threads and running those threads in parallel across multiple processors/cores to shorten the amount of time it takes to run the query (check out Mike’s nice graphic). The Db2 engine uses more system resources in order to reduce the overall amount of time it takes to run your query.

Resource usage is being traded for time. That’s why reviewing the availability of system resources is a critical step to perform before buying and implementing Db2 SMP. If you don’t have the system resources to trade, then you’re not going to realize the performance benefits of Db2 SMP.  For example, if CPU utilization is currently running at 80-85%, adding Db2 SMP to use more system resources is not going to have a positive impact on system performance. 

The system resources also need to be balanced.  CPU resources are not the only system resource consumed by Db2 SMP.  Each thread used by Db2 SMP needs a chunk of memory to perform its segment of the query and that work can involve performing I/O on your database objects. As a result, your system needs sufficient memory and a properly sized I/O subsystem to support the increased CPU usage.  If the query optimizer finds that this combination of resources is not available, then the optimizer will not use parallel methods – even if you’ve installed and activated the Db2 for i SMP licensed feature.

Balanced expectations

Assuming you’ve determined that your system has a balanced set of resources available to support Db2 SMP parallel processing, the next step is setting proper expectations about the types of database requests and workloads that may run faster with Db2 SMP. Some people tend to believe that Db2 SMP’s parallel processing will be the silver bullet for all of their performance problems.

Running a query with parallel processing adds overhead because there is work involved in dividing a query into multiple parts and distributing the work among threads. This startup overhead means that Db2 SMP will not be a great benefit to the short-running queries that are common in transactional workloads. Think about your own household – it’s not uncommon for younger kids to want to help parents with household chores, but often parents choose to do the chores themselves to avoid the overhead of involving and training their kids. Your time would be better spent trying to tune short-running queries than hoping that parallel processing will magically improve performance.

Long-running queries are the best performance targets for Db2 SMP because their runtime easily hides the startup overhead associated with parallel processing. If Db2 SMP can reduce a long-running query from 10 minutes to 5 minutes, no one is really going to notice that a hundred milliseconds was spent setting up threads for parallel processing.

You might have noticed that I keep using queries as the parallel processing example. That is because Db2 SMP does not enable all database requests to use parallel processing. Queries from SQL and non-SQL interfaces can use Db2 SMP, but native record-level access requests cannot. Db2 SMP also does not support parallel inserts, updates, and deletes. The only type of database change operation that can use Db2 SMP is index maintenance – however, this parallel processing is only done when the index updates result from a blocked insert or write request. Db2 SMP can also utilize parallel processing to improve the performance of index creation and reorganize operations.

When setting expectations for the performance benefits, you need to make sure that everyone understands that it’s longer-running queries that will be the primary beneficiary of Db2 SMP and that not all database requests can use parallel processing.

Controlled usage of SMP

Once Db2 SMP has been installed on a system, it must be activated before the Db2 engine will consider using parallel processing on a request. There are several different interfaces for enabling parallel processing: the CHGQRYA CL command, the QAQQINI PARALLEL_DEGREE option, the SET CURRENT DEGREE statement, and the QQQRYDEGREE system value.
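
For instance, here’s a minimal sketch of turning parallel processing on and off for a single job – the special register values shown ('ANY', 'NONE') are the ones I’d start with, but verify them against your release’s SQL reference:

-- Let the optimizer consider parallel methods for this job
SET CURRENT DEGREE = 'ANY';

-- The CL flavor of enablement, callable from SQL
CALL QSYS2.QCMDEXC('CHGQRYA DEGREE(*OPTIMIZE)');

-- Turn parallel processing back off
SET CURRENT DEGREE = 'NONE';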

Based on the discussion in the previous section, you should try to limit parallel processing enablement to only those jobs or requests that will benefit from Db2 SMP. Enabling Db2 SMP for all requests just adds overhead to query optimization and can result in your system resources being overwhelmed if parallel processing is used. On a transaction-oriented system, you probably should scope Db2 SMP enablement to a limited set of requests and workloads. In contrast, you could cast a pretty wide parallel enablement net on a data warehousing system, which features longer-running queries.

In addition to figuring out which jobs and requests to enable parallel processing on, you should consider when to activate parallel processing. It could be that your server has high utilization of resources during the day, but resources to spare during off hours.  The enablement interfaces make it easy to turn Db2 SMP on or off.

In terms of which parallel degree value to use, I recommend starting with the *OPTIMIZE value.  With the *OPTIMIZE value, the Db2 optimizer tries to choose a degree of parallel processing that results in an implementation that is a good neighbor in terms of sharing system resources with other jobs.  A more cautious approach would be setting the QAQQINI PARALLEL_DEGREE option with a value of *OPTIMIZE 50.  This setting tells Db2 to use the good neighbor approach, but dial the parallel processing back by 50%. 
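
Here’s a rough sketch of that more cautious QAQQINI setup, assuming a hypothetical query options library MYLIB – the QQPARM/QQVAL columns are the standard QAQQINI layout, but verify against your release’s documentation:

-- Create your own QAQQINI options file from the IBM-supplied template
CALL QSYS2.QCMDEXC('CRTDUPOBJ OBJ(QAQQINI) FROMLIB(QSYS) OBJTYPE(*FILE) TOLIB(MYLIB) DATA(*YES)');

-- Good-neighbor degree, dialed back by 50%
UPDATE MYLIB.QAQQINI
   SET QQVAL = '*OPTIMIZE 50'
 WHERE QQPARM = 'PARALLEL_DEGREE';

-- Tell the job to use that options library
CALL QSYS2.QCMDEXC('CHGQRYA QRYOPTLIB(MYLIB)');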

Hopefully, you now have a better understanding of when and how to use SMP.  If you still think that the Db2 for i SMP license feature may be a fit after reading this, our Lab Services team can provide a trial version of the feature for evaluation to help with the purchasing decision.  Also, our team is available to help you with Db2 performance tuning or teaching you how to tune whether it involves Db2 SMP or not – just contact me.

Wednesday, January 20, 2021

A New "Routine" Habit for the New Year?

 

Welcome to 2021! I hope that everyone’s new year is off to a good start.  A new year often brings discussion of regeneration and rebooting to start the year with a clean slate when it comes to developing good habits or dropping bad habits.

In the spirit of developing new habits for the new year, you may want to look into starting the habit of regularly regenerating your SQL routines. When an SQL function, procedure, or trigger object gets created, you may or may not know that behind the scenes Db2 generates a C program with embedded SQL to implement the specified SQL procedural logic. The efficiency of the program generation can have a direct impact on the runtime performance of your SQL routines. As part of the code generation process, Db2 tries to implement some assignment and comparison statements with pure C code to get the best performance.

Over time, the smart developers in Rochester have expanded Db2 for i’s ability to generate pure C code in more situations to speed up the performance of your SQL functions, procedures, and triggers. However, your SQL routines can only benefit from the C code generation enhancements when they are recreated with the newer version of Db2 for i. This article shows how you can query the system catalogs to identify those SQL routines that have not been recreated in a while. Or you could just choose to recreate all of your SQL routines, since that’s a relatively easy operation to try to see if it improves performance.
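
As a starting point, here’s a minimal sketch of such a catalog query – it assumes the ROUTINE_CREATED timestamp column in QSYS2.SYSROUTINES and uses an arbitrary two-year cutoff:

-- Find SQL routines that haven't been (re)created in the last two years
SELECT routine_schema, routine_name, routine_created
  FROM qsys2.sysroutines
 WHERE routine_created < CURRENT TIMESTAMP - 2 YEARS
 ORDER BY routine_created;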

Hopefully, I’ve convinced you to add regular SQL routine recreation after Db2 for i updates to your list of new habits to develop in the new year, since it’s such a simple operation that may deliver faster performance.

Friday, December 11, 2020

Old Habits Die Hard, even QTEMP ones

As the old adage says, old habits die hard.  I have experienced this firsthand since returning to the IBM i world a couple of months ago.  While I was off working on Watson, IBM reached into its marketing bag of tricks and “changed” the name of its relational database product from DB2 to Db2.  If I had a nickel for every time that I’ve typed DB2 instead of Db2 since my return, I could use that money to take my wife out for a really nice meal; that assumes restaurants are open here in Minnesota, but I digress…

Another old habit that I’ve seen still being used in the IBM i world is copying data into QTEMP and then running SQL against this temporary copy.  There are certain aspects of this QTEMP approach that are beneficial, but there’s a real dark side, to use a Star Wars analogy, from a performance perspective that should not be ignored.   I’ve written this article to highlight the reasons why IBM i developers should consider ending their QTEMP habit.

The good news is that SQL has several different features that allow you to get the benefits of the QTEMP approach without the performance overhead.   If you or your team need any help with a New Year’s resolution in 2021 to break the QTEMP habit with advanced SQL, then let me know.
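
To give a taste of those features, here’s a minimal sketch (with hypothetical table and column names) of a common table expression – one of several SQL alternatives – standing in for a QTEMP work file:

-- Instead of copying rows to QTEMP and querying the copy,
-- name the intermediate result set inside a single statement:
WITH recent_orders AS (
   SELECT customer_id, order_total
     FROM appdata.orders
    WHERE order_date > CURRENT DATE - 30 DAYS
)
SELECT customer_id, SUM(order_total) AS total_last_30_days
  FROM recent_orders
 GROUP BY customer_id;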

Have a Merry Christmas and wonderful holiday season, I look forward to talking with all of you in 2021! 

Friday, November 13, 2020

Db2 for i Comings & Goings

As you might have guessed from the quietness of this blog, Mike Cain, my good friend and long-time Db2 partner in crime, retired from IBM last year and is happily enjoying retirement.

And if you haven’t heard yet, I returned to the IBM i world this Fall after a 5-year stint working on healthcare solutions powered by IBM Watson technologies. I’m excited to be back working with the Db2 team in the IBM Lab Services Power Systems Delivery Practice, where I get to help clients get the most out of Db2 and SQL on IBM i.  With my return, I’ll be using this blog to raise awareness about all things related to Db2 for i.

Speaking of new things in Db2 land, today there are new IBM i technology refreshes available for the 7.3 and 7.4 releases that, as always, include numerous Db2 enhancements.  My favorite is the new IF EXISTS clause for DROP statements, which can be used to greatly simplify your SQL database creation scripts – you can learn how to use this feature in my article.  A complete list of all the Db2 and IBM i enhancements in the latest technology refresh can be found here.
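
To give a flavor of why I like it, here’s a minimal sketch (with hypothetical object names) of a creation script prologue that no longer needs to tolerate “object not found” errors:

-- Quietly drop the objects if they exist; do nothing if they don't
DROP TABLE IF EXISTS mylib.work_table;
DROP VIEW IF EXISTS mylib.work_view;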

It's good to be back!

Friday, April 13, 2018

Data: The New Currency

In this month's issue of the IBM Systems Magazine - POWER edition, author Neil Tardy shares some observations and insights on the importance of realizing maximum value from data through collaboration.

If you have not already read the piece, you can check it out here.

I highly recommend you share it with your business leaders and executives, taking advantage of the opportunity to discuss and reevaluate your collection and use of data.

Please reach out to me if you need assistance with the conversation, and as you persevere to do more, and do better.