Storage Meat: November 2010

Wednesday, November 24, 2010

Storage Performance Concepts Entry 3

So far we have covered the performance characteristics of the physical drives as well as the impact of various RAID levels. In this entry we will take a look at the next level in the stack – the workload.

As we mentioned in the previous entries, most applications do a mix of IO: reads, writes, sequential and random. We’ve also touched on IOPS versus MBps. These concepts are related in that with random IO you are typically most interested in IOPS, while with sequential IO you are normally interested in MBps. We pointed out that while SAS and FC drives can handle significantly more random IO than SATA, for sequential operations the MBps capabilities are comparable.

We can use this information when architecting solutions. Operations such as backup to disk are typically sequential in nature, meaning that we are more concerned with MBps and that SATA may be a good choice. OLTP applications perform primarily random IO with a mix of reads and writes, making SAS or Fibre Channel a better fit.

There are typically two scenarios we are looking at when designing storage solutions, either A) this is a new application not yet implemented, or B) this is an existing application that we are looking to move to a new storage platform.

When architecting the storage for a new application we may not have as much information to work with. Since we cannot collect any real world statistics, we have to rely on the application provider to give us insight into the expected workload. This can be a challenge because in some cases they just don’t know. The workload characteristics are heavily dependent on the modules in use, the overall size and how your organization will actually use the product. This is in part why many software vendors, and in particular database vendors, recommend RAID 10. RAID 10 is the fastest for these workloads and therefore the safest recommendation. Regardless, you will be best served by getting as much information as possible on the actual IO requirements in terms of IOPS, MBps, reads versus writes and the mix of sequential versus random IO.

In situations where the application is already in place and you are looking to move to a new storage platform the challenge is a bit easier since you have real data to look at. There are two ways to collect the workload information we will need.

If the existing storage array has the capabilities, you can collect the information directly from the array. If you use this approach, make sure you pay attention to where the data is being collected from and what it is telling you. For example, if you collect the physical IOPS from the backend disk drives or RAID groups, the numbers you are looking at may include the impact of the RAID configuration - the actual data being written as well as parity or mirroring operations. In this scenario you may also miss IOs that are being handled in cache. Ideally what you would like to collect are the peak IOPS per logical device (referred to as LUN, LDEV or Volume depending on the nomenclature used by the array manufacturer) broken down by reads versus writes, sequential and random.

The same is true for MBps if that is what your application calls for. If the storage device provides this capability and has a decent interface this might be the easiest way to go. You can collect the information in one place and filter or sort the data to get exactly what you are looking for.

The second option is to collect the data from the application hosts themselves. Being a storage guy, I typically don’t like to use stats collected from within the application. The primary reason is that I’m not always as familiar with them and therefore less confident about what they are telling me. Are these disk IOs, or are some of the IOs coming from memory or the network? For this reason I typically recommend collecting the information using the native operating system utilities such as Performance Monitor on Windows or sar or iostat on Unix. Make sure you understand the correct syntax for your flavor of Unix.

The following link provides a basic overview and some examples of how to use these utilities.

IO Collection Utilities

One more thought on collecting performance data. It is very helpful to understand the access patterns from an application point of view: when the busiest times are, and when operations such as backups or data loads occur. You want to make sure you are capturing peak loads and understand any anomalies.
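As a rough illustration of pulling the peak out of collected data, here is a minimal Python sketch. The `Sample` structure and the numbers in it are hypothetical, made up for the example; in practice you would fill them in from whatever your array or OS utility reports for reads and writes per second.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """One collection interval (hypothetical layout, not a real tool's format)."""
    label: str
    reads_per_sec: float
    writes_per_sec: float


def total_iops(sample):
    """Total host IOPS observed in one interval."""
    return sample.reads_per_sec + sample.writes_per_sec


def peak_sample(samples):
    """Return the interval with the highest total IOPS."""
    return max(samples, key=total_iops)


def read_percent(sample):
    """Percentage of the interval's IOs that were reads."""
    total = total_iops(sample)
    return 100.0 * sample.reads_per_sec / total if total else 0.0


# Made-up numbers purely for illustration.
samples = [
    Sample("09:00", 420, 180),
    Sample("13:00", 700, 300),
    Sample("02:00 backup", 150, 50),
]

peak = peak_sample(samples)
print(peak.label, total_iops(peak), read_percent(peak))
```

The peak interval and its read/write mix are the inputs you carry forward into the sizing step; averages over the whole day would hide the busiest period.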

At this point you understand the performance characteristics of the disk drives themselves, the impact of RAID levels and the IO patterns associated with your workload. The next step is to combine this information to determine which approach will best meet your needs.

The simplest formula I have found for doing this is:

(TOTAL IOPS × % READ) + ((TOTAL IOPS × % WRITE) × RAID Penalty)

So, for example, if your application generates 1000 random IOPS with a mix of 70% reads and 30% writes, a RAID 5 solution would work as follows.

(1000 x .70) + ((1000 x .30) x 4) = 700 + 1200 = 1900 IOPS

Using 15K SAS drives, each capable of roughly 180 random IOPS, the eight drives in a single 7+1 group provide about 8 x 180 = 1,440 IOPS, which falls short of the 1,900 required. Based on this I would need two 7+1 RAID 5 groups.

Let’s look at the same example using RAID 10

(1000 x .70) + ((1000 x .30) x 2) = 700 + 600 = 1200 IOPS

In this scenario, using the same drive types, I would need one 4+4 RAID 10 group.

A full overview of this formula and some real world examples can be found at the following link.

Workload Calculation Formula

So there you have it, the backend disk performance, the impact of the RAID configuration and workload considerations. So what does that leave? Well, in order to properly design a storage solution we have to understand these concepts but in a lot of ways that is what they are, concepts. What these concepts do not take into account are cache and cash! Cache has a significant impact on the performance you experience and cash impacts what you will actually be able to purchase. While working with clients we often start by looking at a solution based purely on RAID 10, and then we see the price and all have a good laugh before getting to work on the real solution. Applying what we have learned so far to real world designs is the focus of our next entry.

Of strategy, tools, and trade-offs

I'm going to take some time over the remainder of this year to talk about different solutions for data replication. It's a complicated discussion, and I'll be limiting the scope to only cover solutions that are write-order consistent (sorry rsync fans).

While collecting my thoughts, though, I thought it made sense to talk a little philosophy.

First, I want to touch on strategy. A few years ago I was called into a brainstorming meeting with a customer to discuss their IT strategy and make suggestions. It was a very brief meeting, as their strategy was "We use vendor X." Confused, I tried to clarify what, exactly, that meant. "You mean you use them to fulfill a piece of your strategy?" I asked. "No, whatever X tells us to do, we do." Six months later, X was bought by Y, and not too many years later Y was bought by Z. The customer, meanwhile, was left high and dry.

Strategy does not equal products. Strategy does not equal vendors. A strategy is a plan. A plan identifies direction not details. Customers (should) have a strategy. Vendors (should) have a strategy. Sometimes the two align and there can be a mutually beneficial relationship. If a customer cedes control of their strategy to a vendor, though, then they are basically saying that IT does not matter to their business. While that may be the case it should be considered and made as a conscious decision.

Second, it's important to realize that, for all the passion around specific products and approaches, they are ultimately tools. Products are neither good nor bad - they are either fit for a purpose or unfit. Just because it's made by a specific vendor doesn't make it bad, especially in the acquisition-crazy world we're living in. Just because it's not the approach you're used to doesn't make it bad, either. The real question is: does it do the job I need done, at a reasonable cost (both hard and soft dollars), without forcing me to fundamentally alter the way I do business today?

And that ties into the third point, trade-offs. There is no one perfect solution for every problem - and thank God because that would be a boring world. Sometimes you're willing to trade an hour's worth of data loss in the event of a disaster for the (substantial) cost savings. Sometimes you're willing to single-path development hosts. The key is to understand what trade-offs are available, and again make a conscious decision about what is right for your environment.

With that as the groundwork, I'll continue next week with the common approaches to data replication.

Monday, November 22, 2010

Storage Performance Concepts Entry 2

Our first entry on storage performance focused on IOPS and MBps from a physical disk drive perspective. This entry will take a closer look at performance as it relates to various RAID levels. Since this entry is intended to be practical we are not going to focus on all of the RAID levels or proprietary approaches, but rather the most common options you might be considering as you design a storage solution. In general these are RAID 10, RAID 5 and RAID 6. As a recap, here is how these levels are defined as per the SNIA Dictionary.

RAID 5

A form of parity RAID in which the disks operate independently, the data stripe size is no smaller than the exported block size, and parity check data is distributed across the RAID array's disks.

RAID 6

Any form of RAID that can continue to execute read and write requests to all of a RAID array's virtual disks in the presence of any two concurrent disk failures. Several methods, including dual check data computations (parity and Reed Solomon), orthogonal dual parity check data and diagonal parity have been used to implement RAID Level 6.

RAID 10

RAID 10 is referred to as a nested or hybrid approach and is therefore excluded from the SNIA Dictionary. Technically RAID 10 is just a combination of RAID 1 and RAID 0 – mirroring and then striping. Years ago I would hear debates about which arrays did RAID 10 versus those that did RAID 0+1, but I haven’t heard any grumblings in a while, so we will stick with RAID 10 to keep it simple.

Putting disk drives into RAID groups does not fundamentally change how many IOPS they can handle; the raw IOPS capability of a drive is constant. The differences in performance we experience have to do with how the various RAID levels handle IO, and in particular with any additional IOs generated to protect the data.

· A random write operation to a RAID 10 configuration results in 2 disk IOs, one to each drive in the mirror. The same single write request to a RAID 5 set would generate 4 IOs: read the data, read the parity, write the data, write the parity. RAID 6 adds two additional parity operations to each write (a read and a write of the second parity block), for a total of 6 IOs.

· A sequential write IO to RAID 10 works the same as a random write, one to each disk in the mirror. For RAID 5 and 6, sequential operations are a bit more efficient than random IO since there is no existing data or parity to be read: write the data, write the parity.

· Read operations are a bit different since neither parity nor mirroring comes into play; it’s simply a matter of how many drives you can read data from concurrently. Assuming that each RAID configuration has the same number of drives, RAID 10 is slightly faster than parity RAID.

Here is a chart that makes this a little clearer.

Note: The chart is based on sending 4 IOs to the RAID group. I could have used a single IO as the base for the comparison but this would not demonstrate the differences in sequential and random IO for the parity based RAID configurations.

	RAID 10	RAID 5	RAID 6
Random Read	4	4	4
Random Write	8	16	24
Sequential Read	4	4	4
Sequential Write	8	5	6

As you can see the differences are pretty significant. By combining the information in our first entry about the performance capabilities of the various drive types with the characteristics of the most common RAID levels you can get an accurate picture of the overall backend performance that the system is capable of. Of course in practice workloads don’t typically fall neatly into these categories, but rather perform a combination of different IO types.
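For anyone who wants to reproduce the chart for a different number of host IOs, here is a small Python sketch of the rules described above. The function name is mine, and the sequential write case assumes the host IOs make up exactly one full stripe, so parity is written once per stripe.

```python
# Number of parity blocks updated per stripe for each RAID level.
PARITY_BLOCKS = {"RAID 10": 0, "RAID 5": 1, "RAID 6": 2}


def backend_ios(raid_level, io_type, host_ios=4):
    """Backend disk IOs generated by `host_ios` host IOs of one type."""
    parity = PARITY_BLOCKS[raid_level]
    if io_type in ("Random Read", "Sequential Read"):
        # Reads involve neither parity nor mirror updates.
        return host_ios
    if raid_level == "RAID 10":
        # Every write goes to both drives in the mirror.
        return host_ios * 2
    if io_type == "Random Write":
        # Read and write the data block plus each parity block.
        return host_ios * 2 * (1 + parity)
    if io_type == "Sequential Write":
        # Full stripe assumed: write the data, then each parity block once.
        return host_ios + parity
    raise ValueError(f"unknown IO type: {io_type}")


io_types = ("Random Read", "Random Write", "Sequential Read", "Sequential Write")
for io_type in io_types:
    row = [backend_ios(level, io_type) for level in PARITY_BLOCKS]
    print(io_type, row)
```

With the default of 4 host IOs this gives the same numbers as the chart; changing `host_ios` shows how the gap between random and sequential writes grows on parity RAID.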

Understanding the workloads and how they impact performance is the subject of our next entry.