Monday, April 20, 2009

Hitachi Dynamic Provisioning - reclaim free/deleted file space in your pool today.

While Hitachi Dynamic Provisioning (HDP) is new to HDS storage, thin provisioning isn't a new concept by any stretch of the imagination. That said, you can rest assured that when the engineering gurus at Hitachi released the technology for customer use, your data could be put there safely and with confidence. There are many benefits to HDP; you could spend a couple of hours on Google reading about them, but the few that catch my eye are:

1) Simplify storage provisioning by creating volumes that meet the user's requirements without the typical back-end planning of how you'll meet the volume's IO requirements - HDP will stripe that data across as many parity groups as you add to the pool.

2) Investment savings. Your DBA or application admin wants 4 TB of storage presented to his new server because he 'wants to build it right the first time', but that 4 TB is a three-year projection of storage consumption and he'll only use about 800 GB this year. Great: provision the 4 TB, and keep an eye on your pool usage. Buy the storage you need today and grow it as needed tomorrow, delaying the acquisition of storage as long as possible. This affords you the benefit of time, as storage typically gets a little cheaper every few months. Don't be fooled though; as a storage admin, you need to watch your pool consumption and ensure you add storage to the pool before you hit 100%. (I'd personally want budget set aside for storage purchases so that I don't have to fight the purchasing or finance guys when my pool hits the 75-80% mark.)
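A trivial sketch of that kind of watchdog, assuming you can pull the pool's utilization percentage out of your array reporting tool. The check_pool function and its 75% default threshold are my own illustration, not anything Hitachi ships:

```shell
# Hypothetical pool-utilization check. The used percentage would come
# from your array reporting (e.g. a Storage Navigator export); the
# 75% warning threshold matches the budget point discussed above.
check_pool() {
  used_pct=$1      # current pool utilization in percent
  warn_at=${2:-75} # start the purchasing conversation here
  if [ "$used_pct" -ge "$warn_at" ]; then
    echo "WARN: pool at ${used_pct}% - order more disk before it fills"
  else
    echo "OK: pool at ${used_pct}%"
  fi
}
```

Wire something like this into cron against your reporting output and you won't be surprised by a full pool.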

3) Maximize storage utilization by virtualizing storage into pools vs. manually manipulating parity group utilization and hot spots. This has been a pain point since the beginning of SANs, and as drives get bigger, hot spots become more prevalent. I have seen cases where customers purchased 4x the amount of storage required to meet the IO requirements of specific applications just to minimize or eliminate hot spots. Spindle count will still play a role in HDP, but the method used to lay data down to disk should provide a noticeable reduction in the number of spindles required to meet those same IO requirements.

4) If you are replicating data, the use of HDP will reduce the amount of data and time required for the initial sync. If your HDP volumes are 50GB but only 10% utilized, replication will only move the used blocks, so you're not pushing empty space across your network. This can be a significant savings in time when performing initial syncs or full re-syncs.

The next issue for HDP is reclaiming space from the DP-VOLs when you have an application that writes and deletes large amounts of data. This could be a file server, a log server, or whatever. Today there is no 'official' way to recover space from your DP-VOLs once that space becomes allocated from the pool. I have tested a process that will allow you to recover space from your DP-VOLs after deleting files from the volume. I came up with this concept by looking at how you shrink virtual disks in VMware. Before VMware Tools was around, you could run a couple of commands to manually shrink a virtual disk. This is what I used for my Solaris and Linux guests:

  1. Run cat /dev/zero > zero.fill; sync; sleep 1; sync; rm -f zero.fill (inside the VM)
  2. Run vmware-vdiskmanager.exe -k sol10-disk-0.vmdk

Basically, what happens in most file systems is that when a file is deleted, only the pointer to the file is removed; the blocks of data are not overwritten until the space is needed by new files. By creating a large file full of zeros, we overwrite all free blocks with zeros, allowing the vmware-vdiskmanager program to find and remove zeroed data from the virtual disk.
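One caveat with the cat /dev/zero approach is that it briefly fills the filesystem to 100%, which can upset applications writing to the same volume. A hedged variant for Linux/Solaris guests uses dd with a fixed count so a cushion of free space is always left; the zero_fill function and its 512 MB default cushion are my own sketch, not an official tool:

```shell
# Zero-fill free space on a mounted filesystem, leaving a cushion so
# the fs never hits 100%. Assumes a POSIX df; adjust for your system.
zero_fill() {
  mp=$1              # mountpoint to fill, e.g. /data
  cushion=${2:-512}  # MB to leave free
  # available MB on the filesystem (df -P keeps output on one line)
  free_mb=$(df -Pm "$mp" | awk 'NR==2 {print $4}')
  fill_mb=$((free_mb - cushion))
  [ "$fill_mb" -gt 0 ] || return 0   # nothing safe to fill
  dd if=/dev/zero of="$mp/zero.fill" bs=1M count="$fill_mb" 2>/dev/null
  sync
  rm -f "$mp/zero.fill"
  sync
}
```

Run it as zero_fill /data during a quiet window; the zeroed blocks are then ready for the array-side cleanup described below.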

You can accomplish the same thing in Windows using SDELETE -c x: (where x: is the drive you want to shrink). SDELETE can be found on Microsoft's website, along with usage instructions.

Now that we know how this works in VMware, how does it help us with HDP? As of firmware 60-04 for the USP-V / VM, there is an option called "discard zero data" which tells the array to scan DP-VOL(s) for unused space (42MB chunks of storage that contain nothing but zeros). These chunks are then de-allocated from the DP-VOL and returned to the pool as free space. This was implemented to let users migrate from thick-provisioned LUNs to thin-provisioned LUNs. We can take things a step further by writing zeros to our DP-VOLs to clean out that deleted data, then running the "discard zero data" command from Storage Navigator, effectively accomplishing the same thing we did above in VMware.

In my case, HDP had allocated 91% of my 50GB LUN, but I had deleted 15GB of data, so actual usage should have been far less than 91%. Running "discard zero data" at this point accomplishes nothing, as the deleted data has not been cleaned yet. Next I ran the sdelete command (as this was a Windows 2003 host), in this case SDELETE -c e:, which wrote a big file of zeros and then deleted it. Back in Storage Navigator I checked my volume, and I was at 100% allocated! What happened? Simply put, I wrote zeros to my disk, zeros are data, and HDP has to allocate storage for them. Now we perform "discard zero data" again, and my DP-VOL is at 68% allocated, which is what we were shooting for.
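One detail worth remembering when estimating results like that 68%: the array allocates and reclaims in 42MB pages, and a page is only freed if every byte in it is zero. A quick back-of-the-envelope helper (my own illustration, not an array command) that rounds a size up to whole pages:

```shell
# The USP-V allocates pool space in 42 MB pages. A page is reclaimed
# by "discard zero data" only if it is entirely zero, so allocation is
# always a whole-page multiple of the live data.
PAGE_MB=42
pages_for_mb() {
  mb=$1
  # ceiling division: round up to whole 42 MB pages
  echo $(( (mb + PAGE_MB - 1) / PAGE_MB ))
}
```

This is also why the numbers rarely work out to exact percentages: deleted data that shares a page with live data can't be reclaimed.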

The most important thing to keep in mind if you decide to do this is that writing zeros to a DP-VOL will use pool space! I wouldn't suggest doing this to every LUN in your pool at the same time unless you have enough physical storage in the back end to support all of your DP-VOLs. While the process will help you reclaim space after deleting files, it is manual in nature. Perhaps we'll see Hitachi provide an automated mechanism for this in the future, but it's nice to know we can do it today with a little bit of keyboard action.


Monday, April 6, 2009

Data Domain - Appliance or Gateway?

Customers looking to purchase a Data Domain solution often ask whether they should buy an appliance or a gateway. We recently spoke about this with the technical folks at Data Domain, and here is what we found.

From a feature-functionality standpoint, the appliance and gateway models are the same. There is no difference in terms of scale or performance. Both the 690 gateway and the 690 appliance deliver up to 2.7 TB/hr of throughput and offer data protection capacities up to 1.7 PB. The lack of difference in performance is a bit surprising, since the appliance uses software-based RAID, which is offloaded to the external array in a gateway solution. One would expect the gateway approach to perform better, since the internal processors would be free to handle other tasks such as IO operations. However, this doesn't appear to be the case.

The biggest difference, then, is in the overall experience with the solution. In an appliance-based solution, Data Domain is responsible for all of the code and software updates. The appliance interacts directly with the disk, and Data Domain has tweaked the drivers for optimum performance and included mechanisms to ensure reliability. In the gateway approach, the Data Domain interaction stops at the HBA. The result is that the overall user experience now depends not just on Data Domain but on the external array. To ensure that the experience is positive, Data Domain provides best practices for using external arrays, including a list of supported arrays, microcode levels, and recommended configurations.

Once deployed, it simply means that the administrative group will have more items to consider. Compatibility will need to be verified before microcode upgrades on either the array or the Data Domain. If the external array is to be shared with other hosts, care must be taken to ensure that those hosts do not impact the Data Domain environment.

At the end of the day, if you already own a supported array that meets the requirements, then using a gateway will likely save you some money; otherwise, it probably makes sense to stick with an appliance.