Monday, April 20, 2009

Hitachi Dynamic Provisioning - reclaim free/deleted file space in your pool today.

While Hitachi Dynamic Provisioning (HDP) is new to HDS storage, it isn’t a new concept by any stretch of the imagination. That said, you can rest assured that when the engineering gurus at Hitachi released the technology for consumer use, your data can be put there safely and with confidence. There are many benefits to HDP. You can search Google for a couple of hours and read about them, but the few that catch my eye are:

1) Simplify storage provisioning by creating volumes that meet the user’s requirements without the typical backend storage planning on how you’ll meet the IO requirements of the volume – HDP will let you stripe that data across as many parity groups as you add to the pool.

2) Investment savings. Your DBA or application admin wants 4TB of storage presented to his new server because he ‘wants to build it right the first time’, but 4TB is a 3 year projection of storage consumption and he’ll only use about 800GB this year. Great, provision that 4TB, and keep an eye on your pool usage. Buy the storage you need today, and grow it as needed tomorrow, delaying the acquisition of storage as long as possible. This affords you the benefit of time, as storage typically gets a little cheaper every few months. Don’t be fooled though: as a storage admin, you need to watch your pool consumption and ensure you add storage to the pool before you hit 100%. (I’d personally want to have budget set aside for storage purchases so that I don’t have to fight the purchasing or finance guys when my pool hits the 75-80% mark.)

3) Maximize storage utilization by virtualizing storage into pools vs. manually manipulating parity group utilization and hot spots. This has been a pain point since the beginning of SANs, and as drives get bigger, hot spots become more prevalent. I have seen cases where customers have purchased 4x the amount of storage required to meet the IO requirements of specific applications just to minimize or eliminate hot spots. Spindle count will still play a role in HDP, but the method used to lay the data down to disk should provide a noticeable reduction in the number of spindles required to meet those same IO requirements.

4) If you are replicating data, the use of HDP will reduce the amount of data and time required for the initial sync. If your HDP volumes are 50GB but only 10% utilized, replication will only move the used blocks (about 5GB per volume), so you’re not pushing empty space across your network. This can be a significant savings in time when performing initial syncs or full re-syncs.

The next issue for HDP is reclaiming space from the DP-VOLs when you have an application that writes and deletes large amounts of data. This could be a file server, a log server, or whatever. Today, there is no ‘official’ way to recover space from your DP-VOLs once that space has been allocated from the pool. I have tested a process that will allow you to recover space from your DP-VOLs after deleting files from the volume. I came up with this concept by looking at how you shrink virtual disks in VMware. Before VMware Tools were around, you could run a couple of commands to manually shrink a virtual disk. This is what I used for my Solaris and Linux guests:

  1. Run cat /dev/zero > zero.fill;sync;sleep 1;sync;rm -f zero.fill (inside the VM)
  2. Run vmware-vdiskmanager.exe -k sol10-disk-0.vmdk (on the host)

Basically, what happens in most file systems is that when a file is deleted, only the pointer to the file is removed; the blocks of data are not overwritten until the space is needed by new files. By creating a large file full of zeros, we overwrite all empty blocks with zeros, allowing the vmware-vdiskmanager program to find and remove zero data from the virtual disk.
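If you'd rather not fill the file system to the brim (applications that try to write while it is full may fail), one variation on the zero-fill step is to leave some headroom. The following is only a minimal sketch under that assumption, not a tested production script: the function names avail_mb, fill_mb and zero_fill are made up for illustration, and the /data mount point and 1GB reserve in the example are placeholders.

```shell
#!/bin/sh
# Free space on the file system holding directory $1, in MB.
# Uses POSIX "df -Pk" output, where column 4 is the available KB.
avail_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# How many MB of zeros to write, given free MB ($1) and MB to keep free ($2).
fill_mb() {
    if [ "$1" -gt "$2" ]; then
        echo $(( $1 - $2 ))
    else
        echo 0
    fi
}

# Write $2 MB of zeros into $1/zero.fill, flush to disk, then delete the file.
zero_fill() {
    dd if=/dev/zero of="$1/zero.fill" bs=1024k count="$2" 2>/dev/null
    sync
    rm -f "$1/zero.fill"
}

# Example (hypothetical mount point, keeps about 1GB free for applications):
# zero_fill /data "$(fill_mb "$(avail_mb /data)" 1024)"
```

The trade-off is that whatever you leave as headroom is not zeroed, so deleted data sitting in those blocks will not be reclaimed on that pass.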

You can accomplish the same thing in Windows using SDELETE -c x: (x = the drive you want to shrink). SDELETE can be found on Microsoft’s website, along with usage instructions.

Now that we know how this works in VMware, how does this help us with HDP? As of firmware 60-04 for the USP-V / VM, there is an option called “discard zero data” which tells the array to scan DP-VOL(s) for unused space (42MB chunks of storage that contain nothing but zeros). These chunks are then deallocated from the DP-VOL and returned to the pool as free space. This was implemented to enable users to migrate from THICK provisioned LUNs to THIN provisioned LUNs. We can take things a step further by writing zeros to our DP-VOLs to clean out the deleted data, then running the “discard zero data” command from Storage Navigator, effectively accomplishing the same thing we did above in VMware.

In my case, HDP had allocated 91% of my 50GB LUN, but I had deleted 15GB of data, so I should be far below 91%. Running “discard zero data” at this point accomplishes nothing, as the deleted data has not been zeroed yet. Next I ran the sdelete command (this was a Windows 2003 host), in this case SDELETE -c e:, which wrote a big file of zeros and then deleted it. Back in Storage Navigator, I checked my volume, and I was at 100% allocated! What happened? Simply put, I wrote zeros to my disk, zeros are data, and HDP has to allocate the storage to the volume. Running “discard zero data” again brought my DP-VOL down to 68% allocated, which is what we were shooting for.
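To make the "scan for chunks that are all zeros" idea concrete, here is a rough sketch that counts all-zero pages in an ordinary file. It is only an illustration of the concept on a plain file: it is not how the array implements “discard zero data”, and the function name count_zero_pages is made up for this example. The page size is a parameter, so you could pass 44040192 (42MB) to mirror the chunk size mentioned above.

```shell
#!/bin/sh
# Count pages in file $1 that contain nothing but zero bytes.
# $2 is the page size in bytes. Prints "<zero pages> <total pages>".
count_zero_pages() {
    file=$1
    page=$2
    size=$(( $(wc -c < "$file") ))
    pages=$(( (size + page - 1) / page ))
    zero=0
    i=0
    while [ "$i" -lt "$pages" ]; do
        # Read one page and strip its NUL bytes; if no bytes are left,
        # the page was entirely zeros and could be handed back.
        left=$(( $(dd if="$file" bs="$page" skip="$i" count=1 2>/dev/null \
                   | tr -d '\000' | wc -c) ))
        if [ "$left" -eq 0 ]; then
            zero=$(( zero + 1 ))
        fi
        i=$(( i + 1 ))
    done
    echo "$zero $pages"
}
```

Note that a page holding even one non-zero byte does not count, which is one reason a volume may not shrink by exactly the amount of data you deleted.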

The most important thing to keep in mind if you decide to do this is that writing zeros to a DP-VOL will use pool space!! I wouldn’t suggest doing this to every LUN in your pool at the same time, unless you have enough physical storage in the back end to support all of the DP-VOLs you have. While the process will help you reclaim space after deleting files, it is manual in nature. Perhaps we’ll see Hitachi provide an automated mechanism for this in the future, but it’s nice to know we can do it today with a little bit of keyboard action.
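Before zero-filling, it can help to estimate how much extra pool space the exercise will temporarily consume. The sketch below is back-of-the-envelope arithmetic only: it assumes the worst case, where the zero fill drives each DP-VOL to 100% allocated (as happened in my test), and it ignores page granularity. The function names and the "size_mb:percent" input format are made up for illustration.

```shell
#!/bin/sh
# Worst-case extra pool space (MB) consumed while zero-filling one DP-VOL:
# everything not yet allocated gets allocated.
# $1 = DP-VOL size in MB, $2 = percent currently allocated.
zero_fill_cost_mb() {
    echo $(( $1 * (100 - $2) / 100 ))
}

# Sum the worst-case cost over any number of "size_mb:percent" pairs.
total_cost_mb() {
    total=0
    for vol in "$@"; do
        total=$(( total + $(zero_fill_cost_mb "${vol%%:*}" "${vol##*:}") ))
    done
    echo "$total"
}
```

For the 50GB LUN above at 91% allocated, zero_fill_cost_mb 51200 91 works out to 4608MB of pool space needed until “discard zero data” hands the zeroed pages back. Comparing the total for the LUNs you plan to clean against the free space in the pool tells you how many you can safely do at once.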


1 comment:

  1. Justin,

    Another important thing to consider (which is implicit in the above) is that when you use either SDELETE or the Unix method you will fill the file system. Any applications using the file system that try to write may fail, so you need to either not fill all of the free space or do this during a quiet time.

    Neat, though. I actually used SDELETE while copying a laptop hard drive and passing it through gzip - took a 120GB HD down to about a 30GB file.