There is a difference - a BIG difference - between “doing database” and “doing very large database”.
Tom McKinley, my longtime partner in this database business, regularly states, “bugs don’t scale” - meaning that what works just fine at a small size doesn’t necessarily work at a large size, and more than likely will NEVER work at a very large size.
It’s vitally important to first recognize and acknowledge that you are living in the realm of very large database (commonly abbreviated “VLDB”, by the way). Once you have recognized and acknowledged it, you can begin to understand the critical success factors and best practices for maintaining and exploiting an IBM i VLDB environment.
I’ve been working in and around very large database since 1997, having helped to establish and lead the AS/400 Teraplex Center in Rochester, Minnesota. During the five-year reign of our Teraplex, we had the privilege to privately own and use the largest AS/400 systems built to date. Embarrassingly huge things. We had not just one, but several maxed-out configurations. What can you do with a very large system at your disposal? Well, our mission was to research, quantify and explain very large database capabilities in the OS/400 environment. The driving force at the time was primarily data warehousing and business intelligence, as well as the emerging technology of parallelism. We were the first to demonstrate the benefits of exploiting 20GB of memory, a new system maximum at the time. I know, I know, 20GB doesn’t sound like much today, but remember, WinTel database servers were stuck at 4GB, and hitting that limit required buying another server for the farm (pause for effect and reflection). Of course, today an IBM Power system can have 16TB of memory. Wow, have times changed.
Back in the day, we were the first to build and use a 2.15 billion row table, then a 4.3 billion row table. And the first to build and use a 512GB table, then a 1TB table, and then a 1.7TB table. We were the first to build large indexes using the DB2 Symmetric Multiprocessing (SMP) feature to take advantage of all 8, then 12, then 24 processors. Ahhh, the days when a single job could light up all the front panel lights while pushing 24 CPUs to the max in one big whoosh of power! I really miss the lights…
Lo and behold, through the infinite power of the internet, I managed to dig up an article from 1998 that announced the results from my team’s creation and use of the largest single DB2/400 table to date – an unimaginable 2.5 terabytes comprised of 10 billion rows in a single table. For a blast from the past, you can find and read the article here.
As you can see, the database capabilities of IBM i running on past and present Power systems are historic and vast – and unfortunately, little known (but you’re now coming up to speed, and spreading the word, right?).
What is VLDB, and Do I Have It?
The definition of very large database varies greatly, with new high water marks and attributes being established almost daily. This is especially true given the explosive expansion of data from all sorts of sources and events (see Big Data). But what about the IBM i environment: what do we consider to be very large? Or better yet, when should you start to think and behave differently because of the size of a table or index?
Generally speaking, we start to seriously consider VLDB techniques and practices when a single table (or physical file) approaches 500GB in total size and/or the number of rows approaches or exceeds 1 billion (that’s 1 000 000 000 for my UK friends).
Another major consideration is the rate of data growth and the trend line illustrating when the table or physical file size limit will be reached. An event to watch out for is acquisition: when your company decides to branch out and/or acquire a large set of clients, customers or market share. If the current state of the database is unknown or unaccounted for, the acquisition and corresponding mass addition of data can push you into VLDB territory, or throw you up against a limit in one fell swoop.
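If you want a quick check of whether any of your tables are already in that rule-of-thumb territory, ask the catalog. Here is a minimal sketch using the QSYS2.SYSPARTITIONSTAT catalog view (the NUMBER_ROWS and DATA_SIZE columns are standard DB2 for i; the thresholds are simply the 500GB and 1-billion-row guidelines above):

```sql
-- Find tables approaching VLDB territory, per the rule of thumb above.
-- DATA_SIZE is reported in bytes.
SELECT table_schema,
       table_name,
       number_rows,
       data_size / 1073741824 AS size_gb     -- bytes to gigabytes
  FROM qsys2.syspartitionstat
 WHERE number_rows >= 1000000000             -- ~1 billion rows
    OR data_size   >= 536870912000           -- ~500 GB
 ORDER BY data_size DESC;
```

Run periodically (or fold into a monitoring job), this also gives you the raw numbers for plotting the growth trend line mentioned above.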
Limit! What Limit?
Yes, Virginia, there is a limit. Even though DB2 for i enjoys some of the highest limits of any relational database system available, there are indeed maximums for a single data space (i.e. the container that physically holds the data). The two limits we refer to when talking about very large database are: size and number of rows.
Drum roll please... AND THE LIMITS ARE:
1.7 terabytes or 4.3 billion (4 294 967 288) rows for a table / physical file
1 terabyte for an index / keyed logical file
By the way, all of the DB2 for i limits can be found in the SQL Reference - Appendix A.
Hey… wait a minute. If the limit is 1.7TB and/or 4.3 billion rows, how did you create a table that was 2.5TB in size containing 10 billion rows?
Simple, my team not only knows the limits, but more importantly, knows how and when to overcome them.
And so should you!
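For the curious, the usual way past a single data space ceiling on DB2 for i is a partitioned table, where each partition is its own data space carrying its own limit. Treat the following as a sketch only: partitioned table support requires the DB2 Multisystem feature on IBM i, the exact clause syntax should be confirmed against the SQL Reference, and the table, column and partition names here are hypothetical:

```sql
-- Hypothetical example: each range partition is a separate data space,
-- so the 1.7TB / 4.3 billion row limit applies per partition.
CREATE TABLE sales_history (
  sale_id   BIGINT        NOT NULL,
  sale_date DATE          NOT NULL,
  amount    DECIMAL(11,2)
)
PARTITION BY RANGE (sale_date)
  (PARTITION p2023 STARTING ('2023-01-01') ENDING ('2023-12-31') INCLUSIVE,
   PARTITION p2024 STARTING ('2024-01-01') ENDING ('2024-12-31') INCLUSIVE);
```

Partitioning is not a silver bullet - query plans, index strategy and maintenance all need to be rethought - which is exactly why knowing how and when to apply it matters.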
What Not To Do
In the past two months, the DB2 for i Center of Excellence has been involved in no fewer than four critical situations where IBM i users unknowingly hit the table limit. Sorry, let me rephrase that. They knew they hit something, ‘cause their application stopped working. But they didn’t know what they hit. And frankly, THAT is the issue. These folks had no idea that a) there is a limit, b) they were really close to it, and c) when the limit is reached, the business stops. Have a nice day.
What To Do
If you think your tables are getting big, or you are just curious, ask the system to show you the metadata (i.e., data about data). Try this:
The DB2 for i Health Center can show you a lot of things, but here we are interested in the Size Limits for objects in a particular schema.
If you want to examine a different schema, change the schema of interest before proceeding.
Selecting the Size Limits tab gives you the opportunity to view each table’s current size compared to the maximum size, or limit.
Take a look at both the table size and the number of rows. If either of these is approaching the limit, you’ll see a warning indicating a critical state. Please be aware that if the schema contains a lot of tables, this operation can take a few minutes.
If you are interested in where your index sizes stand, select the index object type instead.
If you want a programmatic approach to reviewing your table and index sizes as compared to the limits, there is a stored procedure called QSYS2.Health_Size_Limits(), and the description can be found here.
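For the record, a call might look like the following. Treat this as a sketch: the parameter list (archive option, number of objects to return, schema filter, object filter) should be confirmed against the procedure’s description, and PRODLIB is a hypothetical schema name:

```sql
-- Hypothetical invocation: examine up to 100 objects in PRODLIB,
-- matching any object name. Confirm the parameter order against
-- the Health_Size_Limits() documentation before relying on it.
CALL QSYS2.Health_Size_Limits(1, 100, 'PRODLIB', '%');
```

The result set compares each object’s current size and row count against the corresponding maximum, making it easy to schedule this check as part of regular housekeeping rather than waiting for the Health Center GUI.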
Strategically speaking, here is my list of IBM i Very Large Database critical success factors (aka “do or die”):
1: stand up an IBM i Database Engineer
2: get educated - obtain skills and expertise in all things DB2 for i
3: assess the current state of data, and identify the data growth and usage trends
4: build and implement a plan to handle large tables and indexes if approaching limits
5: build and implement a plan to modernize and re-engineer database objects and data access
The IBM DB2 for i Center of Excellence is the foremost (and dare I say only) team in the world fully qualified to assist you with very large database in the IBM i environment. As a matter of fact, the table limits and VLDB capabilities are what they are today because of the past research and leadership of the AS/400 Teraplex Center and the expertise of the men and women in the DB2 for i Development Laboratory.
If you need help with understanding the current state of your growing databases, or with meeting requirements for large data and scalability, contact me. We have a DB2 for i Very Large Database Consulting Workshop available that can identify solutions for overcoming limits to growth and ensure you are getting the most out of IBM i. We can also do remote database and application assessments to help identify any looming issues sooner rather than later – that is to say, before your business hits a wall.
At the end of the day, rest assured that DB2 for i can handle your data whether you count rows in the millions or in the billions, assuming you embark on proper planning and implementation of the very large database practices and procedures for IBM i.