So far we have covered the performance characteristics of the physical drives as well as the impact of various RAID levels. In this entry we will take a look at the next level in the stack – the workload.
As we mentioned in the previous entries, most applications perform a mix of IO: reads, writes, sequential and random. We’ve also touched on IOPS versus MBps. The two are related in that with random IO you are typically most interested in IOPS, while with sequential IO you are normally interested in MBps. We pointed out that while SAS and FC drives can handle significantly more random IO than SATA, for sequential operations the MBps capabilities are comparable.
We can use this information when architecting solutions. Operations such as backup to disk are typically sequential in nature, meaning that we are more concerned with MBps and that SATA may be a good choice. OLTP applications perform primarily random IO with a mix of reads and writes, making SAS or Fibre Channel a better fit.
There are typically two scenarios to consider when designing a storage solution: either A) this is a new application not yet implemented, or B) this is an existing application that we are looking to move to a new storage platform.
When architecting the storage for a new application we may not have as much information to work with. Since we cannot collect any real-world statistics, we have to rely on the application provider to give us insight into the expected workload. This can be a challenge because in some cases they simply don’t know; the workload characteristics depend heavily on the modules in use, the overall size, and how your organization will actually use the product. This is in part why many software vendors, and database vendors in particular, recommend RAID 10: it is the fastest for these workloads and therefore the safest recommendation. Regardless, you will be best served by getting as much information as possible on the actual IO requirements in terms of IOPS, MBps, reads versus writes, and the mix of sequential versus random IO.
In situations where the application is already in place and you are looking to move it to a new storage platform, the task is a bit easier since you have real data to look at. There are two ways to collect the workload information we will need.
If the existing storage array has the capability, you can collect the information directly from the array. If you use this approach, make sure you pay attention to where the data is being collected and what it is telling you. For example, if you collect the physical IOPS from the backend disk drives or RAID groups, the numbers you are looking at may include the impact of the RAID configuration - the actual data being written as well as the parity or mirroring operations. In this scenario you may also miss IOs that are being handled in cache. Ideally, what you would like to collect are the peak IOPS per logical device (referred to as a LUN, LDEV or Volume depending on the nomenclature used by the array manufacturer), broken down by reads versus writes, sequential and random.
The same is true for MBps if that is what your application calls for. If the storage device provides this capability and has a decent interface this might be the easiest way to go. You can collect the information in one place and filter or sort the data to get exactly what you are looking for.
The second option is to collect the data from the application hosts themselves. Being a storage guy, I typically don’t like to use stats collected from within the application. The primary reason is that I’m not always as familiar with them and am therefore less confident about what they are telling me. Are these disk IOs, or are some of them being satisfied from memory or the network? For this reason I typically recommend collecting the information using the native operating system utilities, such as Performance Monitor on Windows or sar and iostat on Unix. Make sure you understand the correct syntax for your flavor of Unix.
The following link provides a basic overview and some examples of how to use these utilities.
IO Collection Utilities
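If you prefer to script the collection yourself, the following is a minimal sketch in Python that samples iostat and splits the results into read and write IOPS per device. It assumes a Linux host with the sysstat package installed, and it matches columns by header name since the column order varies between sysstat versions.

# Minimal sketch: sample per-device read/write IOPS with iostat.
# Assumes a Linux host with sysstat installed; columns are located by
# header name because their order differs between sysstat versions.
import subprocess

def sample_iops(interval=5, count=2):
    # The first iostat report is an average since boot, so we keep only
    # the last report of the run as the "current" sample.
    out = subprocess.run(
        ["iostat", "-dx", str(interval), str(count)],
        capture_output=True, text=True, check=True
    ).stdout

    headers, samples = [], {}
    for line in out.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            headers = [f.rstrip(":") for f in fields]
            samples = {}  # a new report starts; discard the previous one
        elif headers and len(fields) == len(headers):
            row = dict(zip(headers, fields))
            samples[row["Device"]] = (float(row["r/s"]), float(row["w/s"]))
    return samples

if __name__ == "__main__":
    for dev, (r, w) in sample_iops().items():
        total = r + w
        pct_read = 100.0 * r / total if total else 0.0
        print(f"{dev}: {r:.1f} read IOPS, {w:.1f} write IOPS ({pct_read:.0f}% reads)")

Run something like this from cron or a simple loop during the busy periods you care about, log the output, and you have the raw material for the peak analysis below.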
One more thought on collecting performance data: it is very helpful to understand the access patterns from the application’s point of view - when are the busiest times, and when do operations such as backups or data loads occur? You want to make sure you are capturing peak loads and that you understand any anomalies.
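As a trivial example of what to do with those samples, this sketch pulls the peak and 95th-percentile IOPS out of a hypothetical samples.csv file with timestamp,iops rows - the file name and layout are assumptions for illustration, not a standard format.

# Sketch: find the peak and 95th-percentile IOPS from collected samples.
# Assumes a hypothetical samples.csv with a "timestamp,iops" header row.
import csv

with open("samples.csv") as f:
    samples = [(row["timestamp"], float(row["iops"])) for row in csv.DictReader(f)]

values = sorted(v for _, v in samples)
peak_time, peak = max(samples, key=lambda s: s[1])
p95 = values[int(0.95 * (len(values) - 1))]  # nearest-rank approximation

print(f"Peak: {peak:.0f} IOPS at {peak_time}, 95th percentile: {p95:.0f} IOPS")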
At this point you understand the performance characteristics of the disk drives themselves, the impact of RAID levels and the IO patterns associated with your workload. The next step is to combine this information to determine which approach will best meet your needs.
The simplest formula I have found for doing this is:
(Total IOPS × % Read) + ((Total IOPS × % Write) × RAID Penalty)
So, for example, if your application generates 1000 random IOPS with a mix of 70% reads and 30% writes, a RAID 5 solution would work out as follows:
(1000 × 0.70) + ((1000 × 0.30) × 4) = 700 + 1200 = 1900 IOPS
Using 15K SAS drives, each capable of roughly 180 random IOPS, a 7+1 RAID 5 group (8 drives) can deliver about 8 × 180 = 1,440 IOPS. Since the workload requires 1,900 backend IOPS, I would need two 7+1 RAID 5 groups.
Let’s look at the same example using RAID 10:
(1000 × 0.70) + ((1000 × 0.30) × 2) = 700 + 600 = 1200 IOPS
In this scenario, using the same drive types, a single 4+4 RAID 10 group (again 8 × 180 = 1,440 IOPS) would be enough.
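If you want to experiment with the numbers, here is a minimal sketch of the same formula in Python. The per-drive IOPS figure, group size and RAID write penalties are simply the assumptions used in the examples above; substitute your own values.

import math

# Common RAID write penalties: RAID 10 = 2, RAID 5 = 4, RAID 6 = 6.
def backend_iops(total_iops, pct_read, raid_penalty):
    # Backend IOPS = (total × % read) + (total × % write × RAID penalty)
    pct_write = 1.0 - pct_read
    return total_iops * pct_read + total_iops * pct_write * raid_penalty

def groups_needed(total_iops, pct_read, raid_penalty,
                  drives_per_group=8, iops_per_drive=180):
    # Number of RAID groups of a given size needed to absorb the workload.
    needed = backend_iops(total_iops, pct_read, raid_penalty)
    return math.ceil(needed / (drives_per_group * iops_per_drive))

# Reproduce the examples above: 1000 IOPS, 70% read / 30% write.
print(backend_iops(1000, 0.70, 4))   # RAID 5  -> 1900.0 backend IOPS
print(backend_iops(1000, 0.70, 2))   # RAID 10 -> 1200.0 backend IOPS
print(groups_needed(1000, 0.70, 4))  # 2 (two 7+1 RAID 5 groups)
print(groups_needed(1000, 0.70, 2))  # 1 (one 4+4 RAID 10 group)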
A full overview of this formula and some real world examples can be found at the following link.
Workload Calculation Formula
So there you have it: backend disk performance, the impact of the RAID configuration, and workload considerations. What does that leave? Well, in order to properly design a storage solution we have to understand these concepts, but in a lot of ways that is all they are - concepts. What they do not take into account are cache and cash! Cache has a significant impact on the performance you experience, and cash determines what you will actually be able to purchase. While working with clients we often start by looking at a solution based purely on RAID 10; then we see the price, all have a good laugh, and get to work on the real solution. Applying what we have learned so far to real-world designs is the focus of our next entry.
Storage Performance Concepts Entry 3