Monday, July 18, 2011

Hitachi Dynamic Provisioning (HDP) in practice

We've talked about HDP on the blog a few times before (here and here, for example).  And with the advent of the VSP, we've moved into a world where all LUN provisioning from HDS arrays should be done using HDP.

In brief, HDP brings three different things to the table:

  1. Wide striping - data from each LUN in an HDP pool is evenly distributed across the drives in the pool.
  2. Thin provisioning - space is only consumed from an HDP pool when data is written from the host.  In addition, through Zero Page Reclaim (ZPR), you can recover unused capacity.
  3. Faster allocation - In a non-HDP environment there were two options.  You could either have predetermined LUN sizes and format the array ahead of time, or you could create custom LUNs on-demand and wait for the format.  With HDP you are able to create custom-sized LUNs and begin using them immediately.
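The thin-provisioning behavior in point 2 can be sketched in a few lines of Python. This is a toy model, not HDS's implementation - the 42 MB page size matches HDP's allocation unit, but the class and method names here are made up for illustration:

```python
# Toy model of thin-provisioned allocation with zero page reclaim.
# Names and structure are illustrative, not HDS internals.

PAGE_SIZE = 42 * 1024 * 1024  # HDP allocates pool capacity in 42 MB pages


class ThinPool:
    def __init__(self, capacity_pages):
        self.capacity_pages = capacity_pages
        self.allocated = {}  # (lun, page_index) -> True once written

    def write(self, lun, offset, length):
        """Allocate pool pages only when a host actually writes."""
        first = offset // PAGE_SIZE
        last = (offset + length - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            self.allocated.setdefault((lun, page), True)

    def zero_page_reclaim(self, lun, zero_pages):
        """Return pages whose contents are all zero to the free pool."""
        for page in zero_pages:
            self.allocated.pop((lun, page), None)

    def used_pages(self):
        return len(self.allocated)


# A 100 MB write to a fresh LUN touches pages 0-2, so only three
# pages are consumed regardless of how large the LUN was declared.
pool = ThinPool(capacity_pages=1000)
pool.write("lun0", offset=0, length=100 * 1024 * 1024)
print(pool.used_pages())  # 3
```

Capacity is consumed page by page as writes land, and ZPR simply hands all-zero pages back to the pool.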

Most of our customers move to HDP as part of an array refresh.  Whether it's going from an AMS 1000 to an AMS 2500 or a USP-V to a VSP, they get the benefits of both HDP and newer technology.  While this is great from an overall performance perspective, it makes it difficult to quantify how much of the gain comes from HDP and how much comes from the newer hardware.

We do have one customer with a USP-VM that moved from non-HDP over to HDP, though, and I thought it was worth sharing a couple of performance metrics both pre- and post-HDP.  Full disclosure - the HDP pool does have more disks than the traditional layout did (80 vs. 64), and we added 8 GB of data cache as well.  So, it's not apples-to-apples, but is as close as I've been able to get.

First we have Parity Group utilization:

As you can see, back-end utilization changed completely on 12/26 when we did the cutover.  Prior to the move to HDP, parity group utilization was uneven, with groups 1-1 and 1-3 being especially busy.  After the move, utilization across the groups is even and the average utilization is greatly reduced.

Second we have Write Pending - this metric represents data in cache that needs to be written to disk:

Here you see results similar to the parity group utilization.  From the cutover on 12/26 until 1/9, write pending is basically negligible.  From 1/9 to 1/16 there was monthly processing, corresponding to the peak from 12/12 to 12/19 in the previous month, but as you can see write pending is greatly reduced.

The peak in write pending between 12/19 and 12/26 is due to the migration from non-HDP volumes to HDP volumes.  In this case we were also changing LUN sizes, and used VERITAS Volume Manager to perform that piece of the migration.

The difference pre- and post-HDP is compelling, especially when you consider that it's the same workload against the same array.  If you're on an array that doesn't support wide striping, or if you're just not using it today, there's an opportunity to "do more with less."



  1. beautiful. great post Chadwick! this is what I love about HDP. If only there was some type of de-dupe now the VSP would be the ultimate BAMF storage array.

  2. Great Work. We just did a migration of that nature on a USP-VM and the results were Tremendous. Now all our environments are HDP pooled.

  3. Hmm, just a question about that:
    I mean, under your HDP pool you still have RAID arrays.
    So even if you stripe with the HDP pool, you still have the RAID penalty.

    So your HDP pool seems to me like software RAID-0 behind hardware RAID.
    Actually, I'm pretty sure I could compare an HDP pool to, for example, a RAID 50 and get the same performance result, no?

  4. Anonymous,

    Good point - just a couple of comments. It would be more accurate to say that you're doing RAID-0 in front of (or on top of) RAID-5, since HDP sits on top of the traditional RAID group configuration.

    Also, the HDP mechanism makes it possible to spread multiple hosts across a shared pool of drives, something that is non-trivial with software RAID.

    Prior to the advent of HDP we would present devices from the RAID groups in a round-robin fashion, and then use VERITAS Volume Manager to stripe across them, trying to achieve a similar result ("plaiding"). It worked, but had to be maintained over time by moving subdisks around at the host level as you grew the array, managing hot spots, and so on. It also required planning the number of columns in the software RAID, since (at the time at least) you had to grow in even increments.

    Finally, with regards to the RAID penalty, you can't compare software RAID-5 (without a write cache) against hardware RAID-5 (with a write cache). Having sufficient cache allows you to mitigate the read/modify/write penalty (depending on workload, cache size, etc.).

    I think that there's still an argument to be made for host-side volume managers, in that they allow you to abstract file systems from the underlying devices - but when it comes to leveling out device utilization in a storage array I think that the array-based wide striping is compelling.
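To put a rough number on the RAID penalty mentioned above: a small random write to RAID-5 without the benefit of a write cache costs four back-end operations (read data, read parity, write data, write parity). A back-of-the-envelope sketch, where the penalty values are assumptions modeling the worst-case small-write workload:

```python
def backend_iops(host_iops, write_fraction, write_penalty=4, read_penalty=1):
    """Translate a host workload into back-end disk operations on RAID-5.

    write_penalty=4 models the uncached read-modify-write case.  A healthy
    write cache that coalesces destages (ideally into full-stripe writes)
    brings the effective penalty well below 4, which is why cached hardware
    RAID-5 behaves so differently from software RAID-5.
    """
    reads = host_iops * (1 - write_fraction)
    writes = host_iops * write_fraction
    return reads * read_penalty + writes * write_penalty


# 1,000 host IOPS at 30% writes:
# 700 reads + (300 writes * 4) = 1,900 back-end operations per second
print(backend_iops(1000, 0.3))  # 1900.0
```

The same 1,000 host IOPS at 0% writes costs only 1,000 back-end operations, which is why the penalty matters most on write-heavy workloads.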

    1. Hey Chadwick, nice post. I am just curious as to what software you used to collect and graph that data.

      Also, on a slightly related note, do you know of a program that can be used to document the AMS? I cannot believe there is not an automated way to document a SAN but I have yet to find it. I really want to get away from manually maintaining an Excel spreadsheet!

      And finally, what would you recommend for performance data collection and analysis? I'm using NetApp's Balance product right now. It's ok but the reporting leaves a lot to be desired.


    2. Dan,

      The data was collected using the Performance Monitor tool - you can get a look at it through Storage Navigator, or export it to .csv files using a Java program provided by HDS. In this case I exported it and graphed it in Excel. Documentation is available in the Product Documentation Library (PDL) for the array, and you should be able to get the Export Tool from your HDS CE (it's dependent on the microcode version).

      To automatically document the AMS I'd probably use the SNM2 CLI - there's a pretty good list on here: Alternatively, you can rely on Device Manager - just make sure that everyone in your shop either uses it exclusively or refreshes after any changes.

      I haven't found any one tool that does everything well for performance data collection and analysis - creating that tool someday is a bit of a pipe dream of mine. If you're an HDS shop then TuningManager gets a ton of data, but it can be challenging to get exactly the report you're looking for. Over the years we've looked at several toolsets, but they tend to be either vendor specific or end up getting bought and shelved.
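As an aside on wrangling that data: once you have the Export Tool's .csv files, a few lines of Python can do the per-day averaging instead of Excel. The column names here ("Date" and the metric column) are assumptions about the export layout - adjust them to match the actual files:

```python
import csv
from collections import defaultdict


def avg_by_day(csv_path, value_column):
    """Average a Performance Monitor metric per day from an exported CSV.

    Assumes one row per sample with a "Date" column holding a timestamp
    like "2010/12/26 00:00" and a numeric metric column; these column
    names are guesses about the export layout, not a documented format.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            day = row["Date"].split()[0]  # drop the time-of-day portion
            totals[day] += float(row[value_column])
            counts[day] += 1
    return {day: totals[day] / counts[day] for day in totals}
```

Feed it the exported file and the metric you care about (e.g. `avg_by_day("wp.csv", "Write Pending")`) and graph the resulting daily averages with whatever tool you like.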