There is a difference - a BIG difference - between “doing database” and doing “very large database”.
Tom McKinley, my longtime partner in this business of
database, regularly states, “bugs don’t scale” - meaning: what works just fine at
a small size doesn’t necessarily work at a large size, and more than likely
will NEVER work at a very large size.
It’s vitally important to first recognize and acknowledge that you are living in the realm of very large database (which is commonly abbreviated as “VLDB”, by the way). Once recognized and acknowledged, you can begin to understand the critical success factors and best practices for maintaining and exploiting an IBM i VLDB environment.
Some History
I’ve been working in and around very large database since 1997, having helped to establish and lead the AS/400 Teraplex Center in Rochester, Minnesota. During the five-year reign of our terror, the Teraplex, we had the privilege to privately own and use the largest AS/400 systems built to date. Embarrassingly huge things. We not only had one, we had several maxed-out
configurations. What can you do with a very large system at your disposal?
Well, our mission was to research, quantify and explain very large database
capabilities in the OS/400 environment. The driving force at the time was
primarily data warehousing and business intelligence, as well as the emerging
technology of parallelism. We were the first to demonstrate the positive
benefits of exploiting 20GB of memory, a new system maximum at the time. I
know, I know, 20GB doesn’t sound like much today but remember, WinTel database
servers were stuck at 4GB and hitting that limit required buying another server
for the farm (pause for effect and reflection). Of course today, an IBM Power system can have 16TB of
memory. Wow, have times changed.
Back in the day, we were the first to build and use a 2.15 billion row table,
then a 4.3 billion row table. And the first to build and use a 512GB table,
then a 1TB table and then a 1.7TB table. We were the first to build large
indexes using the DB2 Symmetric Multiprocessing (SMP) feature to take advantage of all
8, then 12, then 24 processors. Ahhh, the days when a single job could light up
all the front panel lights while pushing 24 CPUs to the max in one big whoosh
of power! I really miss the lights…
Lo
and behold, through the infinite power of the internet, I managed to dig up an article from 1998 that announced the results
from my team’s creation and use of the largest single DB2/400 table to date: an unimaginable 2.5 terabytes comprising 10 billion rows in a single table. For a blast from
the past, you can find and read the article here.
As
you can see, the database capabilities of IBM i running on past and present
Power systems are historic and vast – and unfortunately, little known (but you’re
now coming up to speed, and spreading the word, right?).
What is VLDB, and Do I Have It?
The definition of very large database varies greatly, with
new high water marks and attributes being established almost daily. This is
especially true given the explosive expansion of data from all sorts of sources
and events (see Big Data).
But what about the IBM i environment: what do we consider to be very large?
Or better yet, when should you start to think and behave differently due to the
size of a table or index?
Generally
speaking, we start to seriously consider VLDB techniques and practices when a single table (or
physical file) approaches 500GB in total size and/or the number of rows
approaches or exceeds 1 billion (that’s 1 000 000 000 for my UK friends).
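If you are curious whether anything in a given schema is creeping up on those working thresholds, a quick catalog query can flag candidates. This is a minimal sketch against the QSYS2.SYSPARTITIONSTAT catalog view, assuming DATA_SIZE is reported in bytes; MYLIB is a placeholder schema name, and the 500GB and 1 billion row figures are simply the rules of thumb above.

-- Flag tables (physical file members) approaching the working VLDB thresholds
-- discussed above: roughly 500GB of data (536 870 912 000 bytes) or 1 billion rows.
-- MYLIB is a placeholder; substitute the schema you care about.
SELECT TABLE_SCHEMA, TABLE_NAME, NUMBER_ROWS, DATA_SIZE
  FROM QSYS2.SYSPARTITIONSTAT
 WHERE TABLE_SCHEMA = 'MYLIB'
   AND (DATA_SIZE > 536870912000 OR NUMBER_ROWS > 1000000000)
 ORDER BY DATA_SIZE DESC;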
Another
major consideration is the rate of data growth and the trend line illustrating
when the table or physical file size limit will be reached. An event to watch out for is
an acquisition: when your company decides to branch out and/or acquire a large set
of clients, customers or market share. If the current state of the database is
unknown or unaccounted for, the acquisition and corresponding mass addition of
data can push you into VLDB territory, or throw you up against a limit in one
fell swoop.
Limit! What Limit?
Yes, Virginia, there is a limit. Even
though DB2 for i enjoys some of the highest limits of any relational database
system available, there are indeed maximums for a single data space (i.e. the container
that physically holds the data). The two limits we refer to when talking about
very large database are: size and number of rows.
Drum roll please... AND THE LIMITS ARE:
1.7 terabytes or 4.3 billion (4 294 967 288) rows for a table / physical file
1 terabyte for an index / keyed logical file
By the way, all of the DB2 for i limits can be found in the SQL Reference - Appendix A.
Hey… wait a minute. If the limit is 1.7TB and/or 4.3 billion
rows, how did you create a table that was 2.5TB in size containing 10 billion
rows?
Simple: my team not only knows the limits, but more
importantly, knows how and when to overcome them.
And so should you!
What Not To Do
In the past two months, the DB2 for i Center of Excellence has been involved in no fewer than four critical situations where IBM i users unknowingly hit the table limit. Sorry, let me rephrase that. They knew they hit something, ‘cause
their application stopped working. But they didn’t know what they hit. And frankly, THAT is the issue. These folks
had no idea that a) there is a limit, b) they were really close to it, and c)
when the limit is reached, the business stops. Have a nice day.
What To Do
If you think your tables are getting big, or you are just
curious, ask the system to show the metadata (i.e. data about data). Try this:
The DB2 for i Health Center can show you a lot of things, but we are interested in the Size Limits for objects in a particular schema.
If you want to change the schema of interest, hit the button and do so.
Selecting the Size Limits tab lets us view the current size of each table and index compared to the maximum size, or limit.
Take a look at both the table size and the number of rows. If either of these is approaching the limit, you'll see a warning indicating a critical state. Be aware that if the schema has a lot of tables in it, this operation might take a few minutes.
If you are interested in where the index sizes stand, select the index object type instead.
If you want a programmatic approach to reviewing your table and index sizes compared to the limits, there is a stored procedure called QSYS2.Health_Size_Limits(); its description can be found here.
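If neither the Health Center nor the stored procedure is handy, a rough percent-of-limit report can also be scripted straight from the catalog. The sketch below is my own illustration rather than the Health Center's calculation: it reads the QSYS2.SYSPARTITIONSTAT view, assumes DATA_SIZE is in bytes, treats 1.7TB as roughly 1 869 169 767 219 bytes, and uses MYLIB as a placeholder schema name. A similar query against the index statistics catalog views would cover the 1 terabyte index limit.

-- Rough percent-of-limit report for each table partition (member).
-- 4 294 967 288 is the row limit quoted above; valid plus deleted rows are
-- counted here because deleted rows still occupy row slots until reused or removed.
SELECT TABLE_SCHEMA, TABLE_NAME,
       NUMBER_ROWS + NUMBER_DELETED_ROWS AS TOTAL_ROWS,
       DEC(DOUBLE(NUMBER_ROWS + NUMBER_DELETED_ROWS) / 4294967288 * 100, 5, 2)
           AS PCT_OF_ROW_LIMIT,
       DATA_SIZE,
       DEC(DOUBLE(DATA_SIZE) / 1869169767219 * 100, 5, 2) AS PCT_OF_SIZE_LIMIT
  FROM QSYS2.SYSPARTITIONSTAT
 WHERE TABLE_SCHEMA = 'MYLIB'
 ORDER BY PCT_OF_SIZE_LIMIT DESC;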
Do More
Strategically
speaking, here is my list of IBM i Very Large Database critical success factors
(aka “do or die”):
1: stand up an IBM i Database Engineer
2: get educated - obtain skills and expertise in all things
DB2 for i
3: assess the current state of data, and identify the data growth
and usage trends
4: build and implement a plan to handle large tables and
indexes if approaching limits
5: build and implement a plan to modernize and re-engineer database objects and data access
_____
The IBM DB2 for i Center of Excellence is the foremost (and dare I say only) team in the world fully qualified to assist you with very large database in the IBM i environment. As a matter of fact, the table limits and VLDB capabilities are what they are today because of the past research and leadership of the AS/400 Teraplex Center and the expertise of the men and women in the DB2 for i Development Laboratory.
If you need help with understanding the current state of your growing databases, or with meeting requirements for large data and scalability, contact me. We have a DB2 for i Very Large Database Consulting Workshop available that can identify solutions for overcoming limits to growth and ensure you are getting the most out of IBM i. We can also do remote database and application assessments to help identify any looming issues sooner rather than later, that is, before your business hits a wall.
At the end of the day, rest assured that DB2 for i can handle your data whether you count rows in the millions or in the billions, assuming you embark on proper planning and implementation of the very large database practices and procedures for IBM i.