Wednesday, August 29, 2012

Learning to love uncertainty

Historically, storage administration was a relatively straightforward endeavor. You could empirically measure how much capacity you needed, project anticipated growth, and plan accordingly. Likewise, you could measure an application's existing IO requirements, extrapolate based on planned growth, and feel comfortable that you probably got the sizing right. Sure, sometimes it felt more like art than science, but at the end of the day you could point to an Excel spreadsheet and whatever fudge factor you used to do your math.
Lately, though, I find myself saying "it depends" even more than usual. It began with deduplicating backup appliances. While their initial claims seemed ludicrous (Whadya mean you can store 10x the actual physical space?), once you took into account the amount of redundancy inherent in backups, the math made a lot of sense. The challenge, of course, is finding a way to communicate this effectively to a customer and to size appropriately for their environment.
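
To make that math concrete, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name, the 10 TB full backup, the 12 retained fulls, and the per-backup change rates are hypothetical, and the model ignores compression and metadata overhead. It is not any vendor's sizing method.

```python
def dedup_estimate(full_backup_tb, retained_fulls, change_rate):
    """Estimate logical vs. physical capacity for a set of retained full backups.

    Simplifying assumption: the first full is stored in its entirety, and
    each later full adds only `change_rate` of new, unique data.
    """
    logical = full_backup_tb * retained_fulls
    physical = full_backup_tb * (1 + change_rate * (retained_fulls - 1))
    return logical, physical, logical / physical


# Hypothetical environment: 10 TB fulls, 12 retained copies.
for rate in (0.02, 0.10):
    logical, physical, ratio = dedup_estimate(10, 12, rate)
    print(f"change rate {rate:.0%}: {logical:.0f} TB logical, "
          f"{physical:.1f} TB physical, about {ratio:.1f}x")
```

Under these assumptions a 2% change rate lands close to the advertised 10x, while a 10% change rate drops below 6x. Same appliance, same retention, very different answer, which is exactly why "it depends".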

The next place it started cropping up was in the WAN. Through a combination of techniques, including deduplication, it is possible to increase the effective bandwidth of a link. With our focus on disaster recovery, these solutions are very attractive because the overriding cost in a solution is the recurring cost of bandwidth. Again, though, short of an actual benchmark it is impossible to quantify the impact of these devices. (Although I know at least one vendor will let you borrow the devices, expecting that you'll call them back to write a check once you realize the difference the devices make.)

And now it's common in online storage as well. Not every vendor offers deduplication of primary storage, but most do offer automatic migration of "hot" data to different tiers. When sized appropriately (meaning that all of the "hot" data fits into the highest tier), the performance improvement is astounding. Unfortunately, if the highest tier is undersized, the performance degradation is also astounding.
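
A simple model shows how sharp that cliff can be. The sketch below is illustrative only: the latencies, the 90% share of IO going to hot data, and the assumption that IO is spread uniformly across the hot set are all made-up simplifications, not measurements from any real array.

```python
def blended_latency_ms(hot_set_gb, fast_tier_gb,
                       fast_ms=0.2, slow_ms=8.0, hot_io_share=0.9):
    """Average IO latency when the hot working set may not fit in the fast tier.

    Assumes `hot_io_share` of all IOs target the hot set, spread uniformly
    across it; anything that does not fit in the fast tier is served from
    the slow tier. All default values are hypothetical.
    """
    fit = min(1.0, fast_tier_gb / hot_set_gb)   # fraction of hot data on the fast tier
    fast_fraction = hot_io_share * fit          # fraction of all IOs served fast
    return fast_fraction * fast_ms + (1 - fast_fraction) * slow_ms


# Hypothetical 500 GB hot set against three fast-tier sizes.
for tier_gb in (500, 250, 100):
    print(f"{tier_gb} GB fast tier: {blended_latency_ms(500, tier_gb):.2f} ms average")
```

With these numbers, a fast tier that holds the whole hot set averages about 1 ms, while one half that size averages roughly 4.5 ms. Real workloads are skewed rather than uniform, so the actual curve will differ, but the shape of the problem is the same.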

Here are a couple of decision points to consider:

  • Predictability - For some customers it is more important that performance is predictable than that it is as fast as possible. I'm not suggesting that they don't want better performance; rather that, given the choice between a hypothetical job that runs in 30 minutes every time and one that runs anywhere from 10 to 30 minutes depending on other workloads, they would choose the dependable 30-minute run time. 
  • Partition-ability - I like to joke that devices don't understand politics. All of these technologies (deduplication, storage tiering, etc.) will improve performance in the aggregate. Unfortunately the aggregate may include the media collection of Jill in accounting. It's not enough to have the technology - you have to have the tools to use it properly, and enough insight into the environment to architect its use as well. 
We live in interesting times - hope you're enjoying them.

(For more on storage performance and sizing, please see Terry's entries here, here, here, here, here, here, and here, or just search on Storage Performance Concepts.)
