Monday, December 13, 2010

Data replication options - Part Two (Application-based replication)

At the top of the IT stack lies the application.  This is what the end user interfaces with and what you're most likely to hear "is down" in the event of an outage.  Since the application is what you're actually trying to protect against a disaster, it makes a great deal of sense to leverage any built-in options for data replication.

Luckily, applications have added built-in replication as the importance of disaster preparedness has increased.  A few brief examples:

Oracle offers Data Guard for their RDBMS.  While this solution could have been considered simply log shipping in the past, today they position it as a complete disaster recovery solution.

For the popular Exchange Server, Microsoft offers Cluster Continuous Replication (CCR), while for SQL Server there are options both for log shipping and transactional replication.  A more detailed discussion of the SQL Server options is available here.

If you define "application" broadly enough to include infrastructure services, then there are typically options there as well.  DNS, LDAP, and Active Directory are all architecturally designed so that they may be deployed in a fashion that is redundant across multiple sites - the key is to recognize the need for this redundancy, deploy appropriately, and test.

Given that the application is what we're trying to protect, then why doesn't everyone just rely on application-based replication?  Well, there are a couple of considerations:

First of all, most environments have multiple applications they're trying to protect against a disaster.  In the same way that the real test of a backup is whether or not you can restore, the real test of a disaster recovery solution is whether or not you can recover.  With an application-based replication solution, when a disaster happens you need a person available at the remote site who knows enough about the application to perform the steps necessary to bring it online.  If you're only running one application (if you're a Software as a Service provider, for example) then that's great - you have the necessary resources for that one application.  As you put more and more applications into the mix, though, the probability that you won't have the right resources available in an emergency increases.

A second reason is that leveraging application-based replication couples your disaster recovery solution to the support matrix of the application.  This means that as time progresses and the application moves through its life cycle you have to include replication in your considerations.

To summarize:

Pros:

  • Protects the environment at an easily understandable level.
  • Typically cost-effective.
  • No concerns around application support (as it is part of the application).
  • Often includes checks for logical corruption (which lower-level approaches cannot perform).

Cons:
  • Increased complexity in environments with multiple applications, decreasing the probability of successful recovery of an entire environment in a disaster.
  • Couples replication to the application, meaning that application maintenance and upgrades must include testing and validation of replication.


Monday, December 6, 2010

Storage Performance Concepts Entry 4 - The Real World

In the previous three entries on this topic we discussed several key storage performance concepts.

The physical disk capabilities. Fibre Channel and SAS drives can handle more IOPS than SATA drives, making them a good choice for applications that generate a lot of random IO. From an MBps standpoint SATA isn't quite as fast as Fibre Channel and SAS, but the delta is much smaller, making SATA acceptable for workloads that mainly generate sequential IO.

The common RAID implementations and their impact on storage performance. RAID 10 provides the best performance for random workloads. RAID 5 and 6 provide good performance for sequential workloads; in some cases RAID 5 may actually be faster than RAID 10, although this isn't the norm.

The workload. The mix of random and sequential IO, reads and writes, has a major impact on performance, with writes putting the biggest load on the disk drives.

We also showed how the following formula can be used to determine the number of RAID groups needed to meet an application's IOPS requirement, based on the disk drives and RAID level you choose.

(Total IOPS × % Read) + ((Total IOPS × % Write) × RAID Penalty)

We left off pointing out that while this information is valuable, it leaves out some of the challenges we face when architecting solutions in the real world. Two factors we have not yet considered are capacity and cost. The majority of the time we start building our solution from the capacity requirements.

For example, an organization might need 10TB of capacity to support a new application with a random workload consisting of 75% reads and 25% writes with a peak IOPS load of 2,500. The capacity will be added to an existing array that supports both SAS and SATA drives.

Since this is a random workload we will be recommending SAS drives, but we aren't yet sure whether this needs to be a RAID 10 or RAID 5 configuration. We could use RAID 6, but since we will be using 450GB drives and our array has multiple hot spares, we think RAID 5 will provide suitable protection for the data.

First we will look at the capacity requirements for each RAID level.

Capacity                RAID 5    RAID 10
RAID Group Size         8+1       4+4
Usable Capacity (GB)    3,600     1,800
Required RAID Groups    3         6
Total Capacity (GB)     10,800    10,800

Using 450GB drives we need twice as many RAID 10 groups as RAID 5 groups to meet the capacity requirement.
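The capacity math above is easy to sketch in code. This is a minimal illustration using the example's numbers (450GB drives, a 10TB requirement, and the group sizes chosen above - not general recommendations):

```python
import math

DRIVE_GB = 450  # drive size from the example

def raid_groups_for_capacity(required_gb, data_drives_per_group, drive_gb=DRIVE_GB):
    """RAID groups needed to reach a usable-capacity target; only data
    drives contribute usable space (parity/mirror drives do not)."""
    usable_per_group = data_drives_per_group * drive_gb
    return math.ceil(required_gb / usable_per_group)

# 10TB requirement: RAID 5 (8+1 -> 8 data drives) vs RAID 10 (4+4 -> 4 data drives)
raid5_groups = raid_groups_for_capacity(10_000, 8)   # 3 groups
raid10_groups = raid_groups_for_capacity(10_000, 4)  # 6 groups
```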

Now we will take a look at the cost. We are using the same size drives for each configuration, so the cost per drive is constant, but we need almost twice as many drives for the RAID 10 configuration. In addition, the number of drives in the RAID 10 configuration requires additional drive trays to be added to the array. In our case we are assuming that the drives are $1,500 apiece and that each tray holds 15 drives at a cost of $10,000 per tray.

Cost                    RAID 5     RAID 10
RAID Group Size         8+1        4+4
Required RAID Groups    3          6
Total Disks Required    27         48
Trays Required          2          4
Cost of Disks           $40,500    $72,000
Cost of Trays           $20,000    $40,000
Total Cost              $60,500    $112,000
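The cost figures can be reproduced the same way. A small sketch using the stated assumptions ($1,500 per drive, $10,000 per 15-slot tray):

```python
import math

DRIVE_COST = 1_500       # $ per drive (stated assumption)
TRAY_COST = 10_000       # $ per tray (stated assumption)
DRIVES_PER_TRAY = 15

def config_cost(groups, drives_per_group):
    """Return (total drives, trays, total cost) for a configuration."""
    drives = groups * drives_per_group
    trays = math.ceil(drives / DRIVES_PER_TRAY)
    return drives, trays, drives * DRIVE_COST + trays * TRAY_COST

raid5 = config_cost(3, 9)    # 8+1 -> 9 drives per group -> (27, 2, 60_500)
raid10 = config_cost(6, 8)   # 4+4 -> 8 drives per group -> (48, 4, 112_000)
```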

As you would expect, the cost of the RAID 10 configuration is almost twice that of RAID 5. What may not be as obvious are the performance differences between the two configurations. In the past we focused on comparing a single RAID group of each type, keeping the number of drives constant. In this case two things have changed.

1. I'm using an 8+1 array group rather than a 7+1. 8+1 is the RAID 5 configuration recommended by the manufacturer because of the way it aligns with the array's caching mechanisms. In addition, an 8+1 provides sufficient availability and rebuild times while making better use of the raw space.

2. In this real-world configuration I have twice as many RAID 10 groups and therefore many more disks - and hence more raw IOPS.

Using 185 IOPS per drive we find that the two configurations have the following characteristics.

Performance             RAID 5    RAID 10
IOPS Per Drive          185       185
Number of Drives        27        48
Raw IOPS                4,995     8,880

We can now use our formula to determine if either solution will meet our requirement.

Performance             RAID 5    RAID 10
Required IOPS           2,500     2,500
Percent Read            75%       75%
Percent Write           25%       25%
RAID Penalty            4         2
Adjusted IOPS           4,375     3,125

In our example both RAID 5 and RAID 10 meet the performance requirement. Although the RAID 5 configuration is tighter (4,995 raw IOPS against an adjusted requirement of 4,375), it still has a reasonable amount of headroom. Given the major difference in cost, it is probably reasonable to proceed with the RAID 5 configuration.
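Putting the formula and the drive counts together, a quick sketch (using the 185 IOPS per drive figure from the tables) confirms that both configurations clear the 2,500-IOPS requirement:

```python
IOPS_PER_DRIVE = 185  # per-drive figure from the example

def adjusted_iops(total, pct_read, pct_write, penalty):
    """Backend IOPS the disks must service, including the RAID write penalty."""
    return total * pct_read + (total * pct_write) * penalty

configs = {"RAID 5": (27, 4), "RAID 10": (48, 2)}  # (drives, write penalty)
for name, (drives, penalty) in configs.items():
    raw = drives * IOPS_PER_DRIVE
    required = adjusted_iops(2500, 0.75, 0.25, penalty)
    print(f"{name}: {raw:,} raw IOPS vs {required:,.0f} required")
```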

Looking at our results you may say, "Well, the RAID 5 configuration may work, but wouldn't the RAID 10 design be a lot faster?" Not necessarily. If the speed limit is 55 and you must drive the speed limit, a Ford F150 and a Ferrari will both get you there in the same amount of time. The same is true with storage: just because one configuration could run faster doesn't mean it will - you have to be able to drive higher IOPS from the host.

The area that we will explore in our next entry is cache. While cache improves performance in general, it is particularly beneficial for parity-based RAID configurations.



Wednesday, December 1, 2010

Data replication options - Part One (Overview)

As mentioned previously, I'll be going over different approaches to data replication between now and the end of the year.  In this post I'll outline the different points in the data path where replication can occur and then follow up with a post per point (say that fast three times!) outlining pros and cons.


Before touching on the different points it is important to define some terminology.


Using a broad brush, all replication solutions may be divided into either synchronous or asynchronous.  Synchronous replication means that the write must be received at the remote location before the local host receives acknowledgement.  Asynchronous replication means that the local host receives acknowledgement once the write is complete at the local site, and replication of the write is handled, well, asynchronously.


Two quick comments:

  1. Using asynchronous replication means that there will be some data loss in the event of a disaster.  How much depends on several factors including the replication technology in use, the bandwidth between the sites, the change rate of the application(s) in question, and so on and so forth.
  2. In practice, most people use some form of asynchronous replication.  The biggest reason?  Synchronous replication introduces latency into every write - it has to traverse the link between locations and then an acknowledgement has to be sent back.  If the sites are close enough and the budget will support it, then synchronous is possible - but it's expensive.
Replication solutions can be further divided into continuous or discontinuous.  Continuous means that for every write generated at the primary site the same write is performed at the remote site.  Synchronous replication is, by definition, continuous.  Asynchronous replication may be either continuous or discontinuous.  Snapshot-based replication is an example of a discontinuous approach.  Discontinuous replication can lead to significant cost savings in the bandwidth required between sites if the application in question writes to the same location repeatedly, since only data at the scheduled time is replicated and the intermediate writes can be disregarded for replication purposes.
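A toy sketch of why discontinuous replication can save bandwidth (the block addresses and write stream here are made up for illustration): a continuous scheme ships every write, while a snapshot-based scheme ships only the final contents of each changed block.

```python
# each write is (block_address, data); the application overwrites block 7 repeatedly
writes = [(7, "a"), (7, "b"), (7, "c"), (42, "x"), (7, "d")]

continuous_ops = len(writes)  # every write crosses the wire: 5
final_blocks = {addr: data for addr, data in writes}  # later writes win
discontinuous_ops = len(final_blocks)  # one transfer per changed block: 2
```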

One last bit of exposition - I'll refer to replication as being write-order consistent (or "crash" consistent) throughout these posts. This means that writes are applied at the remote site in the same order in which they occurred at the source - a requirement for the successful replication of databases and other applications.

With that out of the way - here are the five places where replication can occur, going from the highest level to the lowest level.
  1. The Application - most enterprise applications in use today have some form of replication baked in (possibly at the cost of an additional licensing fee).  Oracle Data Guard, Exchange CCR, Microsoft SQL Server log shipping - all have some method of getting the data from point A to point B in a usable form.
  2. The Host - from something as simple as a write-splitter to something as involved as replacing the native volume manager, there are more host-based options for data replication than you can shake a stick at.
  3. The Switch - Several years ago there was a movement to make all storage intelligence (including replication and virtualization) built into the switches. That vision didn't really pan out, but it is still possible to do replication from the switch.
  4. An Appliance - begrudgingly added against my initial biases, there are a number of appliances on the market that can perform replication.  Some involve virtualization of the storage in question, and could arguably be classified as array-based, while others involve the installation of write-splitters on the hosts and could be classified as host-based, but I digress.
  5. The Array - Once the domain of only monolithic storage arrays, today most midrange arrays offer data replication solutions that are sufficient for typical needs.
I believe that all replication solutions can be grouped into one of the five categories above.  Coming up next - Application-based replication.


Wednesday, November 24, 2010

Storage Performance Concepts Entry 3

So far we have covered the performance characteristics of the physical drives as well as the impact of various RAID levels. In this entry we will take a look at the next level in the stack – the workload.

As we mentioned in the previous entries, most applications do a mix of IO: reads, writes, sequential and random. We've also touched on IOPS versus MBps. These concepts are related in that with random IO you are typically most interested in IOPS, while with sequential IO you are normally interested in MBps. We pointed out that while SAS and FC drives can handle significantly more random IO than SATA, for sequential operations the MBps capabilities are comparable.

We can use this information when architecting solutions. Operations such as backup to disk are typically sequential in nature, meaning that we are more concerned with MBps and that SATA may be a good choice. OLTP applications perform primarily random IO with a mix of reads and writes, making SAS or Fibre Channel a better fit.

There are typically two scenarios we are looking at when designing storage solutions, either A) this is a new application not yet implemented, or B) this is an existing application that we are looking to move to a new storage platform.

When architecting the storage for a new application we may not have as much information to work with. Since we cannot collect any real-world statistics, we have to rely on the application provider to give us insight into the expected workload. This can be a challenge because in some cases they just don't know - the workload characteristics are heavily dependent on the modules in use, the overall size, and how your organization will actually use the product. This is in part why many software vendors, and in particular database vendors, recommend RAID 10: it is the fastest for these workloads and therefore the safest recommendation. Regardless, you will be best served by getting as much information as possible on the actual IO requirements in terms of IOPS, MBps, reads versus writes, and the mix of sequential versus random IO.

In situations where the application is already in place and you are looking to move to a new storage platform the challenge is a bit easier since you have real data to look at. There are two ways to collect the workload information we will need.

If the existing storage array has the capability, you can collect the information directly from the array. If you use this approach, pay attention to where the data is being collected and what it is telling you. For example, if you collect the physical IOPS from the backend disk drives or RAID groups, the numbers you are looking at may include the impact of the RAID configuration - the actual data being written as well as parity or mirroring operations. In this scenario you may also miss IOs that are being handled in cache. Ideally what you would like to collect are the peak IOPS per logical device (referred to as a LUN, LDEV, or volume depending on the nomenclature used by the array manufacturer), broken down by reads versus writes, sequential and random.

The same is true for MBps if that is what your application calls for. If the storage device provides this capability and has a decent interface this might be the easiest way to go. You can collect the information in one place and filter or sort the data to get exactly what you are looking for.

The second option is to collect the data from the application hosts themselves. Being a storage guy, I typically don't like to use stats collected from within the application. The primary reason is that I'm not always as familiar with them and therefore less confident about what they are telling me. Are these disk IOs, or are some of the IOs coming from memory or the network? For this reason I typically recommend collecting the information using the native operating system utilities, such as Performance Monitor on Windows or sar and iostat on Unix. Make sure you understand the correct syntax for your flavor of Unix.

The following link provides a basic overview and some examples of how to use these utilities.

IO Collection Utilities

One more thought on collecting performance data: it is very helpful to understand the access patterns from an application point of view - when the busiest times are, and when operations such as backups or data loads occur. You want to make sure you are capturing peak loads and understand any anomalies.

At this point you understand the performance characteristics of the disk drives themselves, the impact of RAID levels and the IO patterns associated with your workload. The next step is to combine this information to determine which approach will best meet your needs.

The simplest formula I have found for doing this is:

(Total IOPS × % Read) + ((Total IOPS × % Write) × RAID Penalty)

So, for example, if your application generates 1,000 random IOPS with a mix of 70% reads and 30% writes, a RAID 5 solution would work as follows.

(1000 x .70) + ((1000 x .30) x 4) = 700 + 1200 = 1900 IOPS

Using 15K SAS drives, each capable of 180 random IOPS, a single 7+1 RAID 5 group (eight drives) delivers 8 × 180 = 1,440 IOPS. Based on this I would need two 7+1 RAID 5 groups to handle the 1,900 backend IOPS.

Let's look at the same example using RAID 10:

(1000 x .70) + ((1000 x .30) x 2) = 700 + 600 = 1200 IOPS

In this scenario, using the same drive types, I would need just one 4+4 RAID 10 group.

A full overview of this formula and some real world examples can be found at the following link.

Workload Calculation Formula

So there you have it: the backend disk performance, the impact of the RAID configuration, and workload considerations. So what does that leave? Well, in order to properly design a storage solution we have to understand these concepts, but in a lot of ways that is all they are - concepts. What these concepts do not take into account are cache and cash! Cache has a significant impact on the performance you experience, and cash impacts what you will actually be able to purchase. While working with clients we often start by looking at a solution based purely on RAID 10, and then we see the price and all have a good laugh before getting to work on the real solution. Applying what we have learned so far to real-world designs is the focus of our next entry.

