Tuesday, August 3, 2010

Geeking Out with VAAI

***NOTE: This blog post was edited 10/4/2010 to incorporate changes suggested in the comments.

According to this public HDS announcement, there are three primary advantages to the VMware vStorage API for Array Integration (VAAI) delivered with the 0893/B AMS 2000 firmware and vSphere v4.1:
  • Hardware-assisted Locking: Provides an alternative means of protecting the VMFS cluster file system’s metadata
  • Full Copy: Enables the storage arrays to make full copies of data within the array without the VMware vSphere host reading and writing the data
  • Block Zeroing: Enables storage arrays to zero out a large number of blocks to enhance the deployment of large-scale VMs.
Hardware-Assisted Locking
Hardware-assisted locking in the context of VAAI means that metadata changes to VMFS are applied atomically, preserving file system integrity without locking the entire LUN for every update. According to this presentation by Ed Walsh of EMC, hardware-assisted locking implements the SCSI Compare-And-Swap (CAS) command. Computer science enthusiasts might enjoy the escapade into the scalability of compare-and-swap versus test-and-set locking; if you know a good reference for rusty CS majors, "I have a friend who’d like to know about it."

Prior to the hardware-assisted method, the vSphere host had to acquire a SCSI reservation on the VMFS’ LUN(s): issue a SCSI command to acquire the reservation, potentially wait and retry (perhaps multiple times) until the reservation was granted, issue the SCSI write command to modify the filesystem metadata, and then issue another SCSI command to release the reservation. In the meantime, other hosts may be contending for the same lock even though their metadata changes are unrelated.

It’s understandable that the SCSI reservation lock mechanism can slow VMFS metadata updates. Typically I would address this during the architecture phase by reducing the opportunity for VMFS metadata changes in large clusters (e.g., thick provisioning of VMDKs, avoiding linked clones, limiting workloads to certain hosts, etc.).
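As an aside, a quick (and admittedly crude) way to see whether reservation contention is actually biting a host is to grep the VMkernel log for conflicts. The log path and message wording below are my assumptions for classic ESX with a service console, so adjust for your build:

    # Count SCSI reservation conflicts reported by the VMkernel (classic ESX log path)
    grep -ic "reservation conflict" /var/log/vmkernel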

Now, vSphere 4.1 hosts can issue a single SCSI command and the array applies the metadata writes atomically. Additionally, the CAS command operates at the block level (not the LUN level), making parallel VMFS updates possible.

Hardware-assisted locking is a huge benefit in metadata-intensive VMFS environments.

So the question arises: what if I have vSphere v4.0 and v4.1 hosts accessing the same VMFS file system? As I understand it, the AMS controller will fail an atomic write issued to a LUN with a SCSI reservation, and vSphere 4.1 will fall back to using SCSI reservations. It’s unclear to me at this point whether the vSphere host would automatically attempt atomic writes again, or whether it would “stick” with SCSI reservations. It’s probably safe to assume the best practice will be to rely on atomic writes only when a cluster consists entirely of v4.1 hosts.
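For what it’s worth, vSphere 4.1 also exposes the locking behavior as a host advanced setting, so a mixed cluster could be pinned to SCSI reservations until the last v4.0 host is upgraded. Treat the option name below as my reading of the 4.1 advanced settings rather than gospel:

    # Check whether hardware-assisted locking (ATS) is enabled on a 4.1 host (1 = enabled)
    esxcfg-advcfg -g /VMFS3/HardwareAcceleratedLocking
    # Fall back to SCSI reservations while v4.0 hosts still share the VMFS
    esxcfg-advcfg -s 0 /VMFS3/HardwareAcceleratedLocking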

Full Copy
The actual implementation of “Full Copy” is less clear to me, though obviously the array offloads the copy with little ESX host intervention (i.e., without the host issuing the individual read and write I/Os). Here's a video I put together which demonstrates the full copy and block zeroing features of VAAI. (It may also demonstrate why I shouldn't try to roll my own online demos and/or that I shouldn't quit my day job...)
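If you want to poke at Full Copy yourself, any copy that stays inside the array should exercise it: a clone, a Storage vMotion between datastores on the same AMS, or a simple vmkfstools clone like the sketch below (the paths are hypothetical placeholders):

    # Clone a VMDK within the array; with VAAI the array performs the copy and the
    # host largely just polls for completion instead of reading and writing the data
    vmkfstools -i /vmfs/volumes/datastore1/vm1/vm1.vmdk \
        /vmfs/volumes/datastore1/vm1-clone/vm1-clone.vmdk -d zeroedthick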



Block Zeroing
This feature implements the SCSI “write same” command, which one might presume writes the same data across a range of logical block addresses.

Prior to this feature, writing 60GB of zeroes required 60GB of writes. I think we can all agree… if the data can be expressed exactly in a single bit, why are we consuming prodigious amounts of I/O to achieve the desired result?

Users of Hitachi Dynamic Provisioning (HDP)... I had hoped the new firmware/vSphere integration would "zero out" blocks automatically when a VMDK is deleted, so the space could be reclaimed using the array’s Zero Page Reclaim (ZPR) function. Unfortunately, my testing shows ZPR does not reclaim space after VMDKs are deleted from VMFS, implying vSphere does not "block zero" the deleted VMDK's blocks. But do not fear, this process is still greatly improved with VAAI.

I used vmkfstools to create an “eagerzeroedthick” VMDK to consume most of the free space in VMFS (vmkfstools -c [size] -d eagerzeroedthick). The operation took 138 seconds (1.27GB/s) in vSphere 4.1. In vSphere 4.0, the exact same operation to the exact same VMFS took 855 seconds (6x longer). As a point of comparison, the dd command in the vSphere service console operated at approximately 44MB/s.
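For the record, the test boiled down to something like the following; the ~175GB size simply back-calculates from the 138 seconds and 1.27GB/s above, and the path is a placeholder:

    # Create a large eagerzeroedthick VMDK and time it; with VAAI the zeroing is
    # offloaded to the array via WRITE SAME instead of streaming zeroes from the host
    time vmkfstools -c 175G -d eagerzeroedthick /vmfs/volumes/datastore1/zerofill/zerofill.vmdk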

After creating the dummy VMDK, I removed it and performed a ZPR (technically, you can perform the ZPR without deleting the VMDK, but then you might forget the VMDK is there and run out of VMFS free space in a hurry).
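The cleanup half of the workaround looks something like this (again with a placeholder path); the ZPR itself is then kicked off against the DP volume from the array side, not from the host:

    # Remove the dummy VMDK once the zeroes have been written
    vmkfstools -U /vmfs/volumes/datastore1/zerofill/zerofill.vmdk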

Getting the Benefit of VAAI on HDS AMS 2000 Arrays
There are three host group options (two, sir!) that must be set to take advantage of these features (make sure you use the “VMware” platform type). I did not have to reboot or rescan VMFS to get the new features working (though honestly I have no good way of testing the atomic writes). A host-side check of the matching vSphere advanced settings is sketched after the list.
  • Hardware-Assisted Locking – According to HDS' best practices guide for AMS2000 in vSphere 4.1 environments, no changes are required to use hardware-assisted locking. DO NOT use with firmware versions prior to 0893/B.
  • Unique Extended COPY Mode – This option enables the “Full Copy” mode
  • Unique Write Same Mode – This option enables the “Block Zeroing” mode
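On the host side, the corresponding vSphere 4.1 advanced settings can be checked (or toggled) from the service console. The option names below reflect my understanding of the 4.1 settings, so verify them against your build:

    # 1 = offload enabled, 0 = disabled
    esxcfg-advcfg -g /DataMover/HardwareAcceleratedMove   # Full Copy (extended copy)
    esxcfg-advcfg -g /DataMover/HardwareAcceleratedInit   # Block Zeroing (write same)
    esxcfg-advcfg -g /VMFS3/HardwareAcceleratedLocking    # Hardware-assisted locking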
Other Tidbits
Finally, vSphere storage admins will notice that VMware vSphere 4.1 contains two additional columns in the host storage configuration view:
  • Storage I/O Control – based on the VI Client help description, enabling this makes ESX(i) monitor datastore latency and adjust the I/O load to the datastore. It uses a “Congestion Threshold” as the upper limit of latency allowed before Storage I/O Control begins prioritizing VMs’ I/O based on their shares.
  • Hardware Acceleration – lists each datastore's support of VAAI as “unknown”, “Not supported”, or “Supported”


8 comments:

  1. What is the status of VAAI on the enterprise versions of HDP?

  2. sruby8, thanks for the comment. As I understand it, VAAI support for the enterprise HDS arrays is currently scheduled for early 2011. We will keep you abreast of any updates.

  3. Are you absolutely sure about setting "Unique Reserve Mode" for hardware assisted locking?

    We upgraded our AMS to 0893/E and our local HDS techs say that we should not enable "Unique Reserve Mode" with vSphere, as this setting has nothing to do with VAAI.

  4. Tomi Hakala, thank you for the comment and follow-up. To answer your question, I am not absolutely sure, as this was one aspect of the firmware I could not empirically test. It appears your local HDS techs may be correct, according to HDS’ best practices guide for AMS 2000 in vSphere environments. The guide does confirm the other host group options are required, but states the following about hardware-assisted locking: "hardware-assisted locking is always available provided your environment meets the requirements" of ESX 4.1, AMS 2000 microcode, and SNM2. Assuming the best practices guide is correct (that hardware-assisted locking is enabled by default), it’s probably best that customers avoid firmware versions lower than 0893/A.

  5. Marcel, thank you for confirming this.

  6. I performed similar VAAI zeroed-VMDK tests on an HP 3PAR T400 array, and VAAI increased performance by 20x.

    http://derek858.blogspot.com/2010/12/3par-vaai-write-same-test-results-upto.html

  7. Wow, VAAI integration is huge. The length of time doing snaps/provisioning takes FOREVER compared to my system though. I have a pointer-based system; is HDS/AMS not using pointers???

  8. Pity, though, that the UNMAP command has to be turned off in vSphere 5 due to the performance issues it causes. When creating/deleting VMDKs from the vSphere client, the response time of our 3PAR F400 array goes up into the 300+ ms range, depending on the size of the disk created. We have UNMAP turned off at the host side now, which sucks because now I have to manually zero out the free disk space of the datastores when VMs are deleted or SvMotioned.

    Even stranger is the fact that HP support wasn't able to figure this out; I ended up resolving the issue myself and pointing them to this fact. As of yet there is no ETA on a fix for this issue...

    Sources:
    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2007427
    http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=2009330
