Monday, November 1, 2010

VMware Storage – NFS, Fibre Channel or iSCSI

This is the first in what will be a series of posts on storage considerations for VMware. This first post discusses the practicality of using NFS storage as an alternative to block-based storage in virtualized environments. I have no axe to grind or specific product to push; my company represents leading storage solutions for both NAS and SAN.

In the past, if someone had said to me that they wanted to use something other than Fibre Channel for their VMware environment, I had one of two thoughts: “I guess you can’t afford Fibre Channel” or “this must be a small environment.” With regard to iSCSI, in most cases, though certainly not all, it is used in smaller environments. iSCSI is less expensive than Fibre Channel, and in many cases it meets the requirements of these organizations. Larger environments with more demanding workloads and availability requirements tend to use Fibre Channel.

As for NFS, until recently I never gave it much thought as a solution for VMware. Fibre Channel is tried and true; it’s high performing, reliable and extremely scalable, so why would I look at NFS? As it turns out, there are a lot of reasons why you might.

The first thing to point out is that there are a number of misconceptions about what is and is not supported with VMware and NFS. As a starting point, here is the latest compatibility information.

Feature            Fibre Channel    iSCSI                 NFS
ESX Boot           Yes              Hardware initiator    No
VM Boot            Yes              Yes                   Yes
RDM                Yes              Yes                   N/A
VMotion            Yes              Yes                   Yes
Storage VMotion    Yes              Yes                   Yes
VMware HA          Yes              Yes                   Yes
Fault Tolerance    Yes              Yes                   Yes

As you can see, with the exception of booting the ESX servers (and RDM, which does not apply to NFS), NFS supports all of the VMware features listed.

The next concern people typically have is performance because, as we all know, Fibre Channel SANs are fast and NFS is slow, right? Well, it’s actually a bit more complicated than that.

In general, virtualized server environments generate a lot of small-block IO, meaning that the IOPS capability of the storage device matters much more than its throughput in megabytes per second. The IOPS capability of a storage device is driven more by the underlying physical storage than by whether it is accessed via Fibre Channel or NFS. It should also be clear that not all NAS solutions are the same; some are designed to provide high performance and some are not. But if you compare a Fibre Channel array with a NAS array that has the same number and type of back-end disks, you can achieve about the same number of IOPS.
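To make this concrete, here is a rough sizing sketch in Python. It uses a common rule of thumb: host-visible IOPS equals the raw IOPS of the spindles divided by a RAID write penalty weighted by the write fraction. The per-disk figure (about 180 IOPS for a 15K RPM drive), the RAID penalties, the 70% read mix and the 8KB block size are all assumed typical values, not vendor data. Note that the transport, Fibre Channel or NFS, is not an input at all; only the back-end disks and RAID layout are.

```python
# Rule-of-thumb IOPS sizing sketch. All figures are assumed typical values,
# not measurements from any particular array.

# Back-end I/Os generated per host write for common RAID levels (rule of thumb).
RAID_WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def host_iops(disks: int, iops_per_disk: float, read_fraction: float, raid: str) -> float:
    """Estimate host-visible IOPS from the back-end spindles.

    There is deliberately no parameter for the transport (Fibre Channel vs NFS):
    the estimate depends only on the disks and the RAID layout.
    """
    raw = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    return raw / (read_fraction + write_fraction * RAID_WRITE_PENALTY[raid])


def throughput_mb_s(iops: float, block_kb: float) -> float:
    """Convert an IOPS figure into MB/s for a given block size."""
    return iops * block_kb / 1024


if __name__ == "__main__":
    # Hypothetical array: 48 x 15K RPM disks (~180 IOPS each), 70% reads, RAID 5.
    iops = host_iops(48, 180, 0.7, "raid5")
    print(f"host IOPS: {iops:.0f}")  # host IOPS: 4547
    print(f"at 8KB blocks: {throughput_mb_s(iops, 8):.1f} MB/s")  # at 8KB blocks: 35.5 MB/s
```

At these small block sizes the disks run out of IOPS long before even a single Gigabit Ethernet link runs out of bandwidth, which is why the spindle count matters more than the protocol.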

This conclusion is not necessarily easy to reach, since each storage manufacturer, whether SAN or NAS, has a vested interest in proving that its solution is the fastest. You will see arguments that point to the additional load NFS puts on the client; others will argue that this is offset by the overhead of VMFS. In my view, which one is fastest matters less than whether each provides sufficient performance for your needs. In subsequent posts I will explore performance in more detail, but for now I will just say that NAS solutions are available that meet the performance requirements of even very large virtualized environments.

So for argument’s sake, let’s say you don’t intend to boot ESX from external storage and that either a Fibre Channel or NAS-based solution will meet your performance requirements; we will further assume that both solutions fit within your budget. Why might you consider NAS?

The primary reason is ease of management.

NAS proponents have always pointed out how much easier it is to manage than Fibre Channel. As someone who works with Fibre Channel on a regular basis, I don’t look at it as being complex. In fact, I used to joke that I could train a monkey to map storage. Then again, a colleague of mine once famously said “I’m no monkey!” after mapping an active LUN to a new host.

In truth, I’ve always recognized that NAS is easier to provision and administer, but what wasn’t so apparent is how significant this can be in large VMware environments. I say large because this is where most of the benefits of NFS show up. Fibre Channel solutions are managed at the LUN level, whereas NAS solutions are managed at the file system level. Obviously this is a bit oversimplified, but from the standpoint of what is provisioned to the virtual server environment it is accurate.

The key difference here is the number of devices that need to be provisioned and managed. In VMware environments (vSphere 4.x) the maximum LUN size is 2TB. In practice, most organizations elect to use much smaller LUNs because of the performance issues that can arise as more VMs are stored on a single LUN. With more advanced arrays using an active/active architecture, large storage pools and increased support for VAAI, this may change, and 2TB LUNs in virtualized environments may become more common. To keep it simple, we will stick with 2TB LUNs for our comparison.

In the case of NFS, you are limited in the total number of datastores you can create (64 in vSphere 4.x), but VMware itself places no limit on the capacity of an NFS datastore; that is determined by the NAS solution. Many NAS solutions can now create single file systems that are hundreds of terabytes in size. So if your capacity requirement is 100TB, you would need to create at least 50 LUNs in a SAN solution, while with NFS you could do it with a single file system. In reality you would probably use a clustered NAS solution and allocate 2 file systems so that the IO could be distributed between them, but even 2 versus 50 is a significant difference.
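The arithmetic behind the 50-versus-2 comparison is simple enough to write down. This Python sketch assumes the 2TB per-LUN maximum and the 64-datastore NFS limit quoted for vSphere 4.x; the 200TB file system size and the choice of 2 file systems are illustrative assumptions, and the function names are hypothetical.

```python
import math

MAX_LUN_TB = 2           # per-LUN limit used in this comparison (vSphere 4.x)
MAX_NFS_DATASTORES = 64  # vSphere 4.x NFS datastore limit


def luns_needed(capacity_tb: float, lun_tb: float = MAX_LUN_TB) -> int:
    """Minimum number of LUNs needed to present the requested capacity."""
    return math.ceil(capacity_tb / lun_tb)


def nfs_datastores_needed(capacity_tb: float, fs_tb: float, min_fs: int = 1) -> int:
    """NFS file systems needed when each can be up to fs_tb in size.

    min_fs lets you ask for more file systems than capacity alone requires,
    e.g. 2 so IO can be spread across the heads of a clustered NAS.
    """
    count = max(min_fs, math.ceil(capacity_tb / fs_tb))
    if count > MAX_NFS_DATASTORES:
        raise ValueError("exceeds the NFS datastore limit")
    return count


print(luns_needed(100))                                 # 50
print(nfs_datastores_needed(100, fs_tb=200, min_fs=2))  # 2
```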

If you intend to use array-based snapshots or replication, a block-based solution can become even more complex. Block-based snapshots and replication are done at the LUN level, so each LUN that you want to replicate needs a corresponding target LUN. Snapshot implementations vary by manufacturer, but in most cases you will need to define a logical LUN (a V-VOL, in HDS terminology) for each snapshot you wish to take. If you want to take 3 snapshots a day on all 50 LUNs, that’s 150 V-VOLs and V-VOL relationships you need to manage.

By contrast, in an NFS environment replication and snapshots can be defined at the file system level; again, you define the policies and relationships on two file systems rather than 50 LUNs.
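The difference in objects to manage can be tallied the same way. This is a minimal Python sketch that assumes, as described for block storage, one V-VOL per LUN per snapshot and one target LUN per replicated LUN, and on the NAS side one snapshot policy and one replication relationship per file system. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ManagedObjects:
    snapshot_objects: int   # V-VOLs (block) or snapshot policies (NAS)
    replication_pairs: int  # source/target relationships


def block_objects(luns: int, snapshots_per_day: int) -> ManagedObjects:
    # Each LUN needs its own V-VOL per snapshot and its own replication target.
    return ManagedObjects(luns * snapshots_per_day, luns)


def nas_objects(file_systems: int) -> ManagedObjects:
    # One policy per file system, however many snapshots that policy takes.
    return ManagedObjects(file_systems, file_systems)


print(block_objects(50, 3))  # ManagedObjects(snapshot_objects=150, replication_pairs=50)
print(nas_objects(2))        # ManagedObjects(snapshot_objects=2, replication_pairs=2)
```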

Beyond the ease of configuration and initial setup, a file-system-based approach can provide better granularity and the ability to organize your snapshot and replication policies around your requirements rather than around the constraints of LUN boundaries.

For example, many NAS-based solutions allow you to define replication and snapshot policies at the directory level. Using this approach, you can logically organize your VM data files within the appropriate directories and apply policies to one, several or all VMs.
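As an illustration of directory-level policies, here is a small Python sketch in which each VM picks up the policy of the deepest directory containing its files. The directory names, policy labels and the "deepest directory wins" rule are all assumptions for illustration; real NAS products configure this in their own management tools, and how they resolve nested policies varies by vendor.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical directory-level policies on one NFS file system.
POLICIES = {
    "/nfs01": "daily-snap",
    "/nfs01/prod": "hourly-snap+replicate",
    "/nfs01/prod/sql": "15min-snap+replicate",
}


def policy_for(vmx_path: str) -> Optional[str]:
    """Return the policy of the deepest configured directory above a VM's file."""
    # parents runs from the VM's own directory up to "/", so the first hit
    # is the most specific policy.
    for directory in PurePosixPath(vmx_path).parents:
        policy = POLICIES.get(str(directory))
        if policy is not None:
            return policy
    return None


print(policy_for("/nfs01/prod/sql/db01/db01.vmx"))  # 15min-snap+replicate
print(policy_for("/nfs01/test/vm07/vm07.vmx"))      # daily-snap
```

Moving a VM's directory under /nfs01/prod is then enough to change its protection level, with no LUN boundaries involved.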

Finally, the process of actually recovering from a snapshot is much simpler on NAS. In a NAS environment, snapshots are typically accessible via a directory on the file system or via another mount point. In either case, you simply access the directory or mount point and select the files you want to recover.
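That self-service restore amounts to an ordinary file copy. The Python sketch below assumes the NAS exposes snapshots under a `.snapshot` directory at the root of the mounted file system, one subdirectory per snapshot; that layout and name are assumptions (vendors differ), and `restore_vm_files` is a hypothetical helper.

```python
import shutil
from pathlib import Path
from typing import List

# Assumption: snapshots appear as <mount>/.snapshot/<snapshot-name>/...
SNAPSHOT_DIR = ".snapshot"


def restore_vm_files(mount: Path, snapshot: str, vm_dir: str, dest: Path) -> List[Path]:
    """Copy one VM's files out of a read-only snapshot directory into dest.

    Returns the paths of the restored files.
    """
    source = mount / SNAPSHOT_DIR / snapshot / vm_dir
    if not source.is_dir():
        raise FileNotFoundError(source)
    dest.mkdir(parents=True, exist_ok=True)
    restored = []
    for item in sorted(source.iterdir()):
        if item.is_file():
            target = dest / item.name
            shutil.copy2(item, target)  # copy2 also preserves timestamps
            restored.append(target)
    return restored
```

The sketch copies into a separate destination rather than overwriting the live VM's files in place, so the original VM is untouched until you decide to use the restored copy.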

In a block-based environment it can be more complicated. The snapshot volume must be accessible to whatever hosts will access it, so, just like a traditional volume, you must have the appropriate zoning, mapping and security in place. The snapshot volume must then be discovered from within vSphere before you can access the data files stored on it. For most organizations this means that the storage administration team will be involved, whereas a NAS approach provides more of a self-service model for the VMware team.

Summary

In no way am I suggesting that you shouldn’t use SAN, including Fibre Channel and iSCSI, for virtualized server environments. SAN is proven to work well in virtualized environments; it’s highly reliable, provides excellent performance and can scale to meet any capacity requirement. VMware continues to improve VMFS, and the vendor community is integrating new VMware-specific features into their storage technologies as rapidly as they can. VMware also still tends to add new functionality to Fibre Channel storage first.

What I am suggesting is that NFS is a viable alternative with many things in its favor, including ease of management and more flexible snapshot and replication capabilities. I’m a block guy; that is what I am used to working with, but even I have to admit that simplifying the storage portion of virtualized server environments is an attractive option.

1 comment:

  1. Very good in depth input.
    If you have created a revised blog post for this, please add a link to a newer version since this is now 5 years old.