Monday, October 22, 2012

Take Advantage of More POWER

Along with the recent IBM i TR5 announcement, we see that new POWER7+ fueled systems will be available to drive IBM i. What does this mean for DB2 for i?

Simply put, it means that we continue to have more powerful and sophisticated engines available for data centric processing. To fully exploit the new processors (get the most bang for the buck so to speak) we need to make use of all the CPUs available. Specifically, we need to employ parallelism.

While the processor clock rate (i.e. frequency as measured in gigahertz) has continually increased over time, the rate of increase has slowed down. No longer do we see big jumps in processor “speed” at every turn of the crank (even though IBM enjoys some of the fastest commercially available processors). Rather, we are seeing more processing power provided on each chip through an increase in the number of cores available.

In multi-user systems like IBM i, when users all press Enter at the same time, each job must wait its turn to get time on a CPU. If there is only one core available to handle the requests, the work stacks up.

The faster the CPU, the faster the work is completed for the current job, allowing the next job to get serviced. As mentioned earlier, processor frequencies are topping out near or at 5GHz. But, if more cores are available, then multiple jobs can be dispatched and serviced in the same unit of time. This has the positive effect of decreasing response times and increasing throughput.
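To put rough numbers on this, here is a minimal sketch of an idealized model. It assumes every job needs the same amount of CPU time, that a job runs on exactly one core, and that there is no dispatching overhead. The function names are made up for illustration and are not part of IBM i.

```python
import math


def elapsed_time(jobs: int, service_seconds: float, cores: int) -> float:
    """Seconds until the last of `jobs` equal-length jobs finishes on `cores` cores.

    Idealized assumption: jobs are serviced in "waves" of at most `cores`
    jobs, and each wave takes `service_seconds`.
    """
    if jobs < 1 or cores < 1:
        raise ValueError("jobs and cores must both be at least 1")
    waves = math.ceil(jobs / cores)
    return waves * service_seconds


def throughput(jobs: int, service_seconds: float, cores: int) -> float:
    """Jobs completed per second under the same idealized assumptions."""
    return jobs / elapsed_time(jobs, service_seconds, cores)


if __name__ == "__main__":
    for cores in (1, 2, 4, 8):
        print(cores, elapsed_time(8, 0.5, cores), throughput(8, 0.5, cores))
```

In this model, eight half-second jobs take four seconds on one core and one second on four cores. Real systems will not scale this cleanly, but the direction of the effect is the same.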

This technique is a form of parallelism: it allows more than one job to run concurrently. The ability to do “n-way processing” has been part of IBM i (OS/400 actually) since the early 90s. The “price” of allowing more than one job to run simultaneously is a corresponding increase in CPU utilization. More capability is delivered via multiple cores, so you are trading (more) resources for (less) time. Furthermore, the n-way processing is inherent and automatic as part of the OS. You have the cores, you might as well use them.

Instead of n jobs in the queue waiting for processing resources, what happens if there is only One Big Job? Given that a job can be dispatched to only one core, the other available cores sit idle. In other words, the big job is a single unit of work, and there is no inherent mechanism or strategy to have it run in parallel across all the resources. How then to take advantage of the multiple cores available?

The Need for Parallel Techniques

To make use of multiple cores via parallel processing, something or someone must take the unit of work (aka the big job), and intelligently break it up into smaller units of work that can be run independently. The subunits of work can be dispatched to multiple cores and executed simultaneously.

Of course, this is easier said than done. The subunits of work must be somewhat balanced and capable of running together – not all work can be broken up and executed in parallel. Synchronous processes must be separated and accommodated differently from asynchronous processes. The number of subunits employed, known as the "degree of parallelism", must be matched to the amount of processing resources available. And finally, the progress of each subunit of work must be supervised and the results of all subunits consolidated at the end of execution.
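The steps above (split the work evenly, match the degree to the resources, supervise, consolidate) can be sketched in a few lines of Python. This is only an illustration of the bookkeeping involved: the workload and all function names are hypothetical, and because of CPython's global interpreter lock the threads here show the structure rather than a real multi-core speed-up.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Optional


def split_balanced(total_rows: int, degree: int) -> List[range]:
    """Split row numbers 0..total_rows-1 into `degree` near-equal ranges."""
    base, extra = divmod(total_rows, degree)
    ranges, start = [], 0
    for i in range(degree):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        ranges.append(range(start, start + size))
        start += size
    return ranges


def scan_subunit(rows: range) -> int:
    """One subunit of work (hypothetical): sum the even row numbers."""
    return sum(r for r in rows if r % 2 == 0)


def run_big_job(total_rows: int, degree: Optional[int] = None) -> int:
    """Break the big job into subunits, run them, and consolidate the results."""
    # Match the degree of parallelism to the available resources.
    degree = degree or os.cpu_count() or 1
    degree = max(1, min(degree, total_rows))
    subunits = split_balanced(total_rows, degree)

    total = 0
    with ThreadPoolExecutor(max_workers=degree) as pool:
        futures = [pool.submit(scan_subunit, s) for s in subunits]
        # Supervise: collect each subunit as it finishes; result() re-raises
        # any error that occurred inside a worker.
        for future in as_completed(futures):
            total += future.result()
    return total


if __name__ == "__main__":
    print(run_big_job(1000, degree=4))
```

Even in this toy version, the partitioning, the degree calculation and the error handling take more code than the actual work being done.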

The sophistication to do this at all, much less to do it well, is the domain of serious computer scientists and software engineers. It is both science and art to be sure.

Good News – DB2 for i Does the Heavy Lifting!

If the big job is doing data centric work, there is a good chance that multiple cores can be used via database parallelism, AND DB2 for i will do the heavy lifting of determining both the type of parallelism and the degree of parallelism. This means for example that the big query doing a legitimate full table scan plus aggregation can be broken up into multiple threads whereby each thread is dispatched to a separate core (by IBM i). The result is multiple threads running in parallel at the same time. This is called symmetric multiprocessing or SMP for short.
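As a conceptual illustration only, the scan plus aggregation case might be pictured like this: each thread aggregates its own slice of the table, and the partial results are merged at the end. This is not how DB2 for i is implemented internally. The table contents, column names and function names below are invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table of (region, amount) rows.
ROWS = [
    ("EAST", 10), ("WEST", 5), ("EAST", 7),
    ("NORTH", 3), ("WEST", 8), ("EAST", 1),
]


def partial_aggregate(rows):
    """Scan one slice of the table and total the amounts per region."""
    totals = Counter()
    for region, amount in rows:
        totals[region] += amount
    return totals


def parallel_group_sum(rows, degree):
    """Rough equivalent of SELECT region, SUM(amount) ... GROUP BY region."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    chunk = max(1, -(-len(rows) // degree))  # ceiling division, never zero
    slices = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]

    merged = Counter()
    with ThreadPoolExecutor(max_workers=degree) as pool:
        # Each slice is scanned by its own thread; the partial totals are
        # then merged into one result.
        for partial in pool.map(partial_aggregate, slices):
            merged.update(partial)
    return dict(merged)


if __name__ == "__main__":
    print(parallel_group_sum(ROWS, degree=3))
```

The answer is the same whatever degree is chosen; only the way the scan is divided up changes.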

DB2 for i Symmetric Multiprocessing is an optional feature of IBM i (option 26). To make use of database parallelism and get more out of multi core processor chips, DB2 SMP must be installed and enabled. In general, the use of SQL for accessing data is best practice for allowing clever parallel processing of queries. More information about DB2 SMP and the parallel processing capabilities can be found here.

As processor technology expands via more cores per chip, parallelism is the technique to take advantage of more power. To get effective and efficient parallelism, the work must be broken up and managed appropriately. The science and art to accomplish this can be daunting and time consuming. DB2 for i Symmetric Multiprocessing, along with the latest POWER7+ based systems, is your ticket to getting more done in less time, with less effort and less risk.
