Monday, August 26, 2013

DB2 for i Hitting the Mark – Again


I’m excited to have Rob and Susan Bestgen share some wonderful news, and more importantly from my perspective, a genuine proof point that illuminates the powerful capabilities of using DB2 for i for data centric processing.

As long time readers know, Rob is the chief architect of the DB2 for i query optimizer. His wife Susan is a member of the IBM i development team that looks after SAP (running on IBM i). Her work includes interpreting and running the benchmarks that SAP provides to solution infrastructure vendors such as IBM. And it just so happens that both Rob and Susan celebrated their 25th anniversary of service with IBM early this summer. Congratulations!

Susan and Rob, the blog is yours…


______ 


Thanks Mike, we’re very happy to share some fantastic news...

On August 8, 2013, SAP published the latest vendor certification for their Enhanced Mixed Load benchmark (BW-EML), the current spec in its evolving suite of business intelligence benchmarks.

Which vendor has achieved the latest #1 benchmark position?

IBM

More importantly, which IBM platform hit the high mark?

IBM i running on Power!

If you would like to see the benchmark results, look here.

So, IBM i once again shows its stripes via the industrial strength operating system and DB2 for i database management system.

Maybe you’re wondering how the IBM i benchmark team pulled it off? 

First and foremost, the benchmark was purely SAP’s specifications, and SAP certified the results.

Did IBM i use some new columnar database enhancement?

Did IBM i use some hidden options or special code not publicly available?

Of course not; there was no special code, no hidden options.

OK, but the IBM benchmark team must have used something that regular customers don’t have or could not practically use themselves.

Nope, the technology used in to run the benchmark was off the shelf - available to anyone.

Actually, what made the benchmark a success was utilization of the tremendously powerful, patented technologies that have existed in DB2 for i, IBM i and Power Systems for years.


Digging Deeper


The benchmark used a combination of software and hardware to create an environment well suited for business intelligence (BI). The benchmark ran in a single partition on a 32 core, Power System P7+ Model 750 configured with 512GB of memory. The Simultaneous Multi Threading (SMT) feature of Power provided the ability to run multiple threads per core simultaneously. For storage, a combination of HDDs and SSDs were used to achieve good performance at a reasonable price. The SSDs were utilized to store the ‘hottest’ data while the spinning HDDs stored the rest.

It is typical for a database to have a large percentage of data which is infrequently used (‘cold’) and a small percentage of data which is frequently used (‘hot’). SSDs offer the best performance when focused on hot data, while HDDs offer the best storage cost – ideal for data which is less frequently accessed.





In the benchmark scenario, high use database tables and indexes were tagged with a UNIT preference of SSD to improve I/O, which in turn gave optimal response time and maximized throughput comparable to in-memory solutions. Hot data was identified by a combination of application knowledge and by using IBM i provided tools such as the SQL Plan Cache viewer and the CL command TRCASPBAL. All tools used are readily available for customers to utilize in their own environments.

The IBM i OS provided the overall management of the system. In particular, its Single Level Storage (SLS) component provided the proper memory and disk management to effectively eliminate any need for buffer pools or storage pool management. In regards to the earlier discussion on ‘hot’ data, SLS also provides the capability to set up automatic tracing and balancing of data onto and off of the SSDs using the TRCASPBAL and STRASPBAL CL commands. Both the tagged object approach utilized in the benchmark and this trace and balance SLS option are valid techniques for good data placement. Which approach to use can vary from customer to customer, depending on the customer’s knowledge of their database model and the actual data profile.

Speaking of the database management system,  DB2 for i was really the star of the show when it came to the necessary capabilities to achieve the benchmark success. SAP designed the EML database as a Star Schema model (actually a snowflake), an industry standard approach for most data warehouses and data marts. By using best practices on data-centric design, the benchmark team fully utilized the best of query optimization and database engine abilities. 

Encoded Vector Indexes (EVIs) were a major contributor to the benchmark success. An EVI is advanced indexing technology that provides the performance advantages of a columnar database without affecting the underlying table itself, thereby avoiding the negative aspects of a columnar database. Simple (single column) EVIs were used to avoid a specialized, overly complex indexing environment. 

Of course, indexes are only useful in a star schema environment if the database provides optimization and run time technology that leverages them. DB2 for i accomplished this wonderfully using its patented Look-ahead PredicateGeneration (LPG) technology. LPG is a query optimizer rewrite technique that generates local selection predicates from join criteria to minimize processing on large database (fact) tables. LPG is basically ‘Star Schema Technology’ on steroids. In addition, DB2 for i utilized index ANDing and ORing ability to combine multiple indexes, as necessary, to satisfy multiple selection criteria in a single query. The best part was that DB2 for i did all of this automatically once the EVIs were created. No hints, and no fiddling with the query plan.

Materialized Query Tables (MQTs), also known as user defined summary tables, were employed to pre-compute some of the aggregate results. MQTs, once enabled, can be used by the query optimizer to implement all or part of a query. Due to the randomness of the queries in the EML benchmark, and in keeping with the benchmark team’s commitment to minimize any specialized processing, the MQT definitions used were kept generic to limit environment complexity.

Frequently in BI environments, Symmetric MultiProcessing (SMP) is used to improve query performance. Due to the already high number of query users, SMP was not beneficial for the ad hoc reporting portion of the benchmark. However, for the real-time updates required by the benchmark specs, SMP was utilized to improve data load times without compromising query performance.

To round things out, the ability for the cost-based query optimizer to make the proper plan choice came from the built in, automatic statistics kept by the database.

An IBM whitepaper provides more details on the benchmark itself.

The best part of it all?

The DB2 for i technologies utilized in the benchmark have been in the database for several years. No unproven, risky technologies. No ‘bleeding edge’ surprises to fight through. No ‘benchmark specials’. Just years of customer proven, industry hardened technology. This is benchmark technology ready for customers. Now that’s a refreshing change.

Since you’re reading this blog, you already know the Center of Excellence team is ready to consult and assist on any DB2 for i -or- SAP on i project. Helping customers fully utilize the technology used in the benchmark is just one item in the team’s list of capabilities.


[mc: if you haven't figured it out, you likely have most, if not all of this technology today]

No comments:

Post a Comment