Wednesday, December 1, 2010

Data replication options - Part One (Overview)

As mentioned previously, I'll be going over different approaches to data replication between now and the end of the year.  In this post I'll outline the different points in the data path where replication can occur and then follow up with a post per point (say that fast three times!) outlining pros and cons.

Before touching on the different points it is important to define some terminology.

Using a broad brush, all replication solutions can be divided into synchronous and asynchronous.  Synchronous replication means that the write must be received at the remote location before the local host receives an acknowledgement.  Asynchronous replication means that the local host receives an acknowledgement once the write is complete at the local site, and replication of the write to the remote site is handled, well, asynchronously.
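
To make the distinction concrete, here is a minimal Python sketch. It assumes that each site is modeled as an in-memory dictionary mapping block address to data, and that the link never fails. The class names (`Site`, `SyncReplicator`, `AsyncReplicator`) are hypothetical and do not belong to any product. The only difference between the two replicators is when the acknowledgement goes back to the host relative to the remote write.

```python
from collections import deque


class Site:
    """A storage site, modeled as a dict of block address -> data."""

    def __init__(self):
        self.blocks = {}

    def write(self, addr, data):
        self.blocks[addr] = data


class SyncReplicator:
    """Acknowledge the host only after both sites hold the write."""

    def __init__(self, local, remote):
        self.local, self.remote = local, remote

    def write(self, addr, data):
        self.local.write(addr, data)
        # In a real system this crosses the link and waits for the remote ack.
        self.remote.write(addr, data)
        return "ack"


class AsyncReplicator:
    """Acknowledge the host once the local write completes; ship it later."""

    def __init__(self, local, remote):
        self.local, self.remote = local, remote
        self.pending = deque()  # acknowledged locally, not yet at the remote site

    def write(self, addr, data):
        self.local.write(addr, data)
        self.pending.append((addr, data))
        return "ack"

    def drain(self, n=None):
        """Ship up to n queued writes (all if n is None), oldest first."""
        count = len(self.pending) if n is None else min(n, len(self.pending))
        for _ in range(count):
            addr, data = self.pending.popleft()
            self.remote.write(addr, data)
```

Any writes still sitting in `pending` when the primary site is lost never reached the remote site. That queue is the data-loss exposure of asynchronous replication.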

Two quick comments:

  1. Using asynchronous replication means that there will be some data loss in the event of a disaster: any writes that were acknowledged locally but had not yet reached the remote site are lost.  How much depends on several factors, including the replication technology in use, the bandwidth between the sites, the change rate of the application(s) in question, and so on and so forth.
  2. In practice, most people use some form of asynchronous replication.  The biggest reason?  Synchronous replication adds latency to every write: the write has to traverse the link between locations, and then an acknowledgement has to be sent back.  If the sites are close enough and the budget will support it, synchronous replication is possible, but it's expensive.
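
Here is a back-of-the-envelope sketch of both comments. It assumes light in optical fiber travels at roughly 200,000 km/s, which works out to about 1 ms of round-trip time per 100 km. It ignores switching, protocol and remote-array overhead unless you pass that overhead in explicitly. The function names are hypothetical, and the backlog model assumes a constant change rate and an initially empty queue.

```python
# Assumption: light in optical fiber covers roughly 200 km per millisecond.
# Real links add switching, protocol and remote-array overhead on top.
FIBER_KM_PER_MS = 200.0


def sync_write_penalty_ms(distance_km: float, overhead_ms: float = 0.0) -> float:
    """Minimum latency a synchronous write pays: one round trip over the link."""
    round_trip_km = 2 * distance_km
    return round_trip_km / FIBER_KM_PER_MS + overhead_ms


def async_backlog_mb(change_rate_mb_s: float, link_mb_s: float, seconds: float) -> float:
    """Data queued at the primary (and so at risk) if writes outpace the link.

    Simplified model: constant change rate, initially empty queue.
    """
    return max(0.0, float(change_rate_mb_s - link_mb_s) * seconds)


print(sync_write_penalty_ms(100))    # 1.0 (ms added to every write at 100 km)
print(async_backlog_mb(20, 15, 60))  # 300.0 (MB queued after one minute)
```

At 20 MB/s of change against a 15 MB/s link, the queue grows by 5 MB every second, and so does the potential data loss.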

Replication solutions can be further divided into continuous and discontinuous.  Continuous means that for every write generated at the primary site, the same write is performed at the remote site.  Synchronous replication is, by definition, continuous; asynchronous replication may be either continuous or discontinuous.  Snapshot-based replication is an example of a discontinuous approach.  Discontinuous replication can significantly reduce the bandwidth required between sites when the application in question writes to the same location repeatedly: only the data as it stands at each scheduled replication point is sent, and the intermediate writes can be disregarded for replication purposes.
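
Here is a rough sketch of the bandwidth argument. It assumes every write covers exactly one fixed-size block and that snapshots are taken on fixed interval boundaries. `continuous_blocks_sent` and `snapshot_blocks_sent` are hypothetical helpers, not part of any product. Continuous replication ships every write. A snapshot-based scheme only ships each block that changed between consecutive snapshots, and ships it once.

```python
from typing import Dict, Iterable, Set, Tuple

# A write is modeled as (block address, time in seconds); every write is
# assumed to cover exactly one fixed-size block.
Write = Tuple[int, int]


def continuous_blocks_sent(writes: Iterable[Write]) -> int:
    """Continuous replication ships every write, including repeated overwrites."""
    return sum(1 for _ in writes)


def snapshot_blocks_sent(writes: Iterable[Write], interval_s: int) -> int:
    """Snapshot replication ships each changed block once per snapshot interval."""
    changed: Dict[int, Set[int]] = {}
    for addr, t in writes:
        # Group writes by the snapshot interval they fall into. The set keeps
        # one entry per block, so intermediate overwrites are dropped.
        changed.setdefault(t // interval_s, set()).add(addr)
    return sum(len(blocks) for blocks in changed.values())


# One "hot" block overwritten every second for a minute, plus one other write.
writes = [(7, t) for t in range(60)] + [(8, 30)]
print(continuous_blocks_sent(writes))    # 61
print(snapshot_blocks_sent(writes, 60))  # 2
print(snapshot_blocks_sent(writes, 15))  # 5
```

A longer interval coalesces more overwrites. The trade-off is a larger window of data that has not yet been replicated.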

One last bit of exposition: throughout these posts I'll refer to replication as being write-order consistent (or "crash" consistent).  This means that writes are applied at the remote site in the same order in which they were issued at the primary site. As a result, the remote copy always reflects a state the primary actually passed through.  This is a requirement for the successful replication of databases and other applications.
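
Here is a minimal sketch of write-order consistency at the receiving end. It makes a hypothetical assumption: the primary tags each write with a sequence number, and writes may arrive out of order, for example over several links. `OrderedApplier` is an illustrative name only. Early arrivals are held until every earlier write has been applied. This way, the remote copy never shows a later write without the earlier writes it may depend on.

```python
import heapq
from typing import Dict, List, Tuple


class OrderedApplier:
    """Apply replicated writes at the remote site strictly in primary-site order."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}  # remote image: block address -> data
        self.next_seq = 0                   # next sequence number we may apply
        self._held: List[Tuple[int, int, bytes]] = []  # min-heap of (seq, addr, data)

    def receive(self, seq: int, addr: int, data: bytes) -> None:
        heapq.heappush(self._held, (seq, addr, data))
        # Apply every write that is now next in line; stop at the first gap.
        while self._held and self._held[0][0] == self.next_seq:
            _, a, d = heapq.heappop(self._held)
            self.blocks[a] = d
            self.next_seq += 1


# The primary wrote block 1 twice, then block 2; they arrived out of order.
remote = OrderedApplier()
remote.receive(1, 1, b"new")
remote.receive(2, 2, b"x")
print(remote.blocks)  # {} (still waiting for write 0)
remote.receive(0, 1, b"old")
print(remote.blocks)  # {1: b'new', 2: b'x'}
```

Applying the same writes naively in arrival order would leave block 1 holding `b"old"` alongside block 2's data. The primary was never in that state.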

With that out of the way, here are the five places where replication can occur, from the highest level to the lowest.
  1. The Application - most enterprise applications in use today have some form of replication baked in (possibly at the cost of an additional licensing fee).  Oracle Data Guard, Exchange CCR, Microsoft SQL Server log shipping - all have some method of getting the data from point A to point B in a usable form.
  2. The Host - from something as simple as a write-splitter to something as involved as replacing the native volume manager, there are more host-based options for data replication than you can shake a stick at.
  3. The Switch - Several years ago there was a movement to build all storage intelligence (including replication and virtualization) into the switches.  That vision didn't really pan out, but it is still possible to do replication from the switch.
  4. An Appliance - added begrudgingly, against my initial biases: there are a number of appliances on the market that can perform replication.  Some virtualize the storage in question and could arguably be classified as array-based, while others rely on write-splitters installed on the hosts and could be classified as host-based, but I digress.
  5. The Array - replication was once the domain of monolithic storage arrays only; today most midrange arrays offer data replication solutions that are sufficient for typical needs.

I believe that all replication solutions can be grouped into one of the five categories above.  Coming up next - Application-based replication.

