Along with the recent IBM i TR5 announcement, we see that new POWER7+ fueled systems will be available to drive IBM i. What does this mean for DB2 for i? Simply put, it means that we continue to have more powerful and sophisticated engines available for data-centric processing. To fully exploit the new processors (to get the most bang for the buck, so to speak), we need to make use of all the CPUs available. Specifically, we need to employ parallelism.
While the processor clock rate (i.e., frequency as measured in gigahertz) has continually increased over time, the rate of increase has slowed down. No longer do we see big jumps in processor “speed” at every turn of the crank (even though IBM enjoys some of the fastest commercially available processors). Rather, we are seeing more processing power provided on each chip through an increase in the number of cores available.
In multi-user systems like IBM i, when many users press Enter at the same time, each job must wait its turn to get time on a CPU. If there is only one core available to handle the requests, the work stacks up.
The faster the CPU, the faster the work is completed for the current job, allowing the next job to get serviced. As mentioned earlier, though, processor frequencies are topping out at or near 5 GHz. If more cores are available, multiple jobs can be dispatched and serviced in the same unit of time. This has the positive effect of decreasing response times and increasing throughput.
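To put rough numbers on that effect, here is a minimal sketch of a toy dispatcher. It assumes identical jobs that all arrive at the same instant and are handed to whichever core frees up first; the function name `simulate` and the job counts and timings are made up for illustration and say nothing about how IBM i actually schedules work.

```python
import heapq

def simulate(num_jobs, service_time, cores):
    """Dispatch identical jobs, all arriving at time 0, to the first free core.

    Returns (average response time, time at which the last job finishes).
    """
    # Min-heap holding the time at which each core next becomes free.
    free_at = [0.0] * cores
    heapq.heapify(free_at)
    finish_times = []
    for _ in range(num_jobs):
        start = heapq.heappop(free_at)    # earliest available core
        finish = start + service_time
        finish_times.append(finish)
        heapq.heappush(free_at, finish)   # core is busy until this job ends
    return sum(finish_times) / num_jobs, max(finish_times)

# Eight one-second jobs: one core, one core twice as fast, and four cores.
one_core = simulate(8, 1.0, cores=1)
faster_core = simulate(8, 0.5, cores=1)
four_cores = simulate(8, 1.0, cores=4)
```

In this toy model, doubling the clock rate halves both numbers, while going from one core to four at the same clock rate cuts the total elapsed time to a quarter and the average response time to a third.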
This technique is a form of parallelism: it allows more than one job to run concurrently. The ability to do “n-way processing” has been part of IBM i (OS/400, actually) since the early 90s. The “price” of allowing more than one job to run simultaneously is a corresponding increase in CPU utilization, because the additional capability is delivered by additional cores. In other words, you are trading (more) resources for (less) time. Furthermore, n-way processing is inherent and automatic as part of the OS. You have the cores; you might as well use them.
Instead of n jobs in the queue waiting for processing resources, what happens if there is only One Big Job? Given that a job can be dispatched to only one core, the other available cores sit idle. In other words, the big job is a single unit of work, and there is no inherent mechanism or strategy to have it run in parallel across all the resources. How, then, do we take advantage of the multiple cores available?
The Need for Parallel Techniques
To make use of multiple cores via parallel processing, something or someone must take the unit of work (aka the big job) and intelligently break it up into smaller units of work that can be run independently. The subunits of work can then be dispatched to multiple cores and executed simultaneously.
Of course, this is easier said than done. The subunits of work must be somewhat balanced and capable of running together; not all work can be broken up and executed in parallel. Synchronous processes must be separated and accommodated differently from asynchronous processes. The number of subunits employed must be matched to the amount of processing resources available (this is called the "degree of parallelism", by the way). And finally, the progress of each subunit of work must be supervised and the respective results consolidated at the end of execution.
The sophistication to do this at all, much less to do it well, is the domain of serious computer scientists and software engineers. It is both science and art, to be sure.
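The steps just described (split the work into balanced subunits, match their number to the degree of parallelism, dispatch them, then consolidate the results) can be sketched in a few lines. This is a toy illustration only: the names `split`, `scan_and_sum` and `parallel_average` are hypothetical, the "table" is a plain Python list, and threads in standard CPython show the structure of the technique rather than a real multi-core speedup. A production query optimizer does far more than this.

```python
from concurrent.futures import ThreadPoolExecutor

def split(rows, degree):
    """Cut rows into `degree` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(rows), degree)
    chunks, start = [], 0
    for i in range(degree):
        end = start + base + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks

def scan_and_sum(chunk):
    """One subunit of work: scan a chunk and return a partial (sum, count)."""
    return sum(chunk), len(chunk)

def parallel_average(rows, degree):
    """Scan all rows with `degree` workers and consolidate the partial results."""
    with ThreadPoolExecutor(max_workers=degree) as pool:
        # Dispatch one chunk per worker; map() returns results in chunk order.
        partials = list(pool.map(scan_and_sum, split(rows, degree)))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None

# Hypothetical usage: average 100 values using a degree of parallelism of 4.
average = parallel_average(list(range(1, 101)), degree=4)
```

An average is a convenient aggregate for the sketch because partial sums and counts combine exactly; aggregates that cannot be merged from partial results are one reason not all work can be broken up this way.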
Good News – DB2 for i Does the Heavy Lifting!
If the big job is doing data-centric work, there is a good chance that multiple cores can be used via database parallelism, AND DB2 for i will do the heavy lifting of determining both the type of parallelism and the degree of parallelism. This means, for example, that a big query doing a legitimate full table scan plus aggregation can be broken up into multiple threads, with each thread dispatched to a separate core (by IBM i). The result is multiple threads running in parallel. This is called symmetric multiprocessing, or SMP for short.
DB2 for i Symmetric Multiprocessing is an optional feature of IBM i (option 26). To make use of database parallelism and get more out of multi-core processor chips, DB2 SMP must be installed and enabled. In general, using SQL to access data is the best practice for allowing clever parallel processing of queries. More information about DB2 SMP and the parallel processing capabilities can be found here.
As processor technology expands via more cores per chip, parallelism is the technique to take advantage of more power. To get effective and efficient parallelism, the work must be broken up and managed appropriately. The science and art to accomplish this can be daunting and time-consuming. DB2 for i Symmetric Multiprocessing, along with the latest POWER7+ based systems, is your ticket to getting more done in less time, with less effort and less risk.