Golden Rules for DB Performance
http://www.b-eye-network.com/blogs/linstedt/archives/2006/03/golden_rules_of.php
I've been applying performance and tuning techniques to systems worldwide for the past 15 years. I grew up as an assembly-level programmer on a CP/M Z-80/Z-8080 Digital VT-180 computer, along with Unix System V and a few other machines. It used to be said that performance and tuning is more art than science; these days, the science is what really makes this work. In this entry I'm going to introduce the top golden rules for tuning your systems and architectures - this is just a peek at what I'll be teaching at the TDWI conference in San Diego in August. In my assessments I cover everything from hardware to architecture, systems, overloading, and platform sizing.
Have you ever wondered how to reduce ETL/ELT processing times from 72 hours to 23 hours, or from 18 hours to 2? Have you wondered how to gain 400% to 4000% performance improvements from the systems you already have? How do you know whether your RAM/CPU, disk, IP/network, applications, and RDBMS are balanced and running at peak performance? Have you questioned when to buy new hardware, and what platform to move to?
The following golden rules are the top performance tips across systems architecture - they are all part of a workshop course and on-site assessment I offer, which tailors the recommendations to your organization.
The top rules for performance and tuning any system are as follows:
1. Reduce the amount of data you're dealing with
2. Increase Parallelism
The rest of the rules fall into different categories and include:
3. Balance load across the applications
4. Re-Architect the processes
5. Limit the RDBMS engines to their use of hardware
6. Partition, partition, partition
7. Manage restartability, fault-tolerance, fail-over
8. Classify Error categories
9. Do not overload hardware with too much parallelism
10. Tune disk access
There are about 250 such recommendations that go into tuning any sort of system, ranging from midrange to client/server. Mainframes work slightly differently and sometimes require lifting the CU limitation placed on the login. But let me talk about the first two rules.
Decreasing the data set:
There are a multitude of ways in which to decrease the data set:
- The first is to identify which data you will actually use during processing, and then ensure that only that data passes through the process. Sometimes this requires re-architecture to see the performance gains, or to be able to reduce the data set at all.
- The second is to partition the data and apply the second rule: increase parallelism. Once partitioned, each process in the parallel set deals with less data; therefore, if the hardware can handle it, performance will increase.
- Vertical and horizontal partitioning are the two kinds of partitioning available: vertical splits by "columns" (the width of the data set), while horizontal is the row-based table partitioning we are used to in an RDBMS. The two are NOT mutually exclusive; unfortunately, most RDBMS engines today do NOT do a good job of vertical partitioning.
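The two reduction techniques above can be sketched in a few lines of plain Python. This is a hypothetical illustration, not taken from the original post: the column names, the 4-way split, and the round-robin scheme are all my own assumptions, standing in for what an RDBMS would do internally.

```python
# Hypothetical sketch: shrink the working set by vertical projection
# (keep only needed columns) plus a horizontal filter, then split the
# reduced rows into partitions that could be processed independently.

rows = [
    {"id": i, "amount": i * 10,
     "region": "EAST" if i % 2 else "WEST",
     "notes": "wide free-text field the process never reads"}
    for i in range(100)
]

# Vertical partitioning: project only the columns the process needs,
# and filter rows (horizontal reduction) at the same time.
needed = [{"id": r["id"], "amount": r["amount"]}
          for r in rows if r["region"] == "EAST"]

def partition(data, n_parts):
    """Round-robin the rows into n_parts roughly equal partitions."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(data):
        parts[i % n_parts].append(row)
    return parts

parts = partition(needed, 4)
print(len(needed), [len(p) for p in parts])  # 50 rows instead of 100
```

In a real system the projection and filter would be pushed down into the SQL (a narrow SELECT with a WHERE clause) rather than done after the fact, so the unneeded bytes never leave the database.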
Increasing Parallelism:
- Remember this: DBAs often make the mistake of setting a single switch in the RDBMS engine to engage parallelism, then changing it back when no performance gain appears. That approach will NOT work. Most RDBMS engines require 10 to 16 switches to be set to engage parallelism properly and to allow the engine to rewrite queries, perform inserts and updates in parallel, and so on.
- By simplifying (re-architecting) the processes, most of them can be made to run in parallel. Large, complex processes are bound (in most cases) to serialize unless they are constructed to execute block-style SQL. Some RDBMS vendors don't allow any other kind of processing because they execute everything in parallel for you.
- WATCH YOUR PARALLELISM - too much of a good thing can overload your system; again, balance must be achieved. Watch your system resources: there are ways to baseline your system to gain the maximum performance for the minimum amount of change. I can help you identify the quick changes to make.
- Remember: most RDBMS engines these days have parallel insert, update, and delete, but taking advantage of parallel updates and deletes usually requires a script executed within the RDBMS (as opposed to a direct connection). This is because most RDBMS vendors don't offer parallelism for updates/deletes in the APIs/SDKs they expose to applications (this should change in the near future).
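The partition-then-parallelize idea above can be sketched outside the database too. This is a minimal, hypothetical example using Python's standard multiprocessing pool as a stand-in for RDBMS-managed parallel workers; the worker count and the summing workload are my own assumptions for illustration.

```python
# Hypothetical sketch: hand each partition to its own worker process.
# The pool plays the role of the RDBMS's parallel query slaves.
from multiprocessing import Pool

def process_partition(partition):
    # Placeholder for real work (transform, aggregate, load, ...).
    return sum(partition)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4  # keep at or below available CPUs (rule 9: don't overload)
    # Striped horizontal partitions, one per worker.
    partitions = [data[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partial_results = pool.map(process_partition, partitions)
    print(sum(partial_results))
```

Note how rule 9 shows up as a hard cap on `n_workers`: adding workers beyond the number of CPUs (or beyond what the disks can feed) just adds context-switching and contention.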
I/O can kill performance; balancing I/O and caching activity can be a huge performance gain (or loss, if done improperly). One day, when we have nanotech storage devices, "disk" I/O will disappear. Until then, we must live with it.
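One concrete way the I/O-balancing point plays out is batching: the same bytes cost far less when moved in a few large operations than in many tiny ones. The sketch below is a hypothetical illustration of that trade-off with ordinary files; the batch size and row format are my own assumptions.

```python
# Hypothetical sketch: identical output, written row-at-a-time versus
# in large batches. Fewer, larger writes mean fewer I/O operations.
import os
import tempfile

def write_per_row(path, rows):
    # buffering=1 forces a flush on every newline: one I/O per row.
    with open(path, "w", buffering=1) as f:
        for r in rows:
            f.write(r + "\n")

def write_batched(path, rows, batch=10_000):
    # Default block buffering, plus explicit batching of the rows.
    with open(path, "w") as f:
        buf = []
        for r in rows:
            buf.append(r)
            if len(buf) >= batch:
                f.write("\n".join(buf) + "\n")
                buf.clear()
        if buf:
            f.write("\n".join(buf) + "\n")

rows = [f"row-{i}" for i in range(50_000)]
with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "per_row.txt")
    b = os.path.join(d, "batched.txt")
    write_per_row(a, rows)
    write_batched(b, rows)
    assert open(a).read() == open(b).read()  # same bytes, far fewer writes
```

The same principle is why bulk loaders and array inserts beat row-at-a-time inserts in every major RDBMS.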
I'd love to hear what you've done to tune your environments; if I use your story at TDWI, I'll quote you as the source. Please let me know if you'd like to be quoted, and feel free to drop me an email privately as well. This entry is just a glimpse into the P&T world.
