Monday, September 27, 2010

Welcoming the Virtual Storage Platform

Hitachi Data Systems (HDS) has just announced its latest flagship storage offering, the Virtual Storage Platform (VSP). As we have all come to expect, speeds and feeds have been improved, and overall performance of the storage subsystem has increased. While I've seen the numbers, until I get a feel for the system myself they are just numbers. What makes this platform different from previous-generation subsystems is something I am pretty excited about: honest-to-goodness flexibility! Bear with me; this is going to be long, and it still won't cover everything.

Previous-generation storage systems have generally shared a similar issue: when you needed more capacity, you also had to add more infrastructure (such as cache boards, back-end directors, etc.) to support it, regardless of your actual performance requirements. Likewise, if you needed more performance for your current workload, you typically had to purchase additional spindles, which in turn required additional infrastructure.

Overview of the VSP

Hitachi's new VSP changes things, a lot. First off, let's talk about the hardware. Gone are the non-row-conforming frames! The VSP uses one to six standard 19-inch racks and conforms to hot/cold aisle layouts. Cables can be fed in from the top or bottom. Hitachi provides the standard rack pre-configured, although I'll speculate that they will offer an option to use non-HDS racks for colo use. Hopefully you won't miss the expensive three-phase power whips, because the VSP now uses standard L6-30 whips. A single VSP can consist of up to two modules. Each module supports up to 1,024 HDDs, for a total of 2,048 HDDs in a single subsystem (excluding externally attached storage). The new architecture supports 2.5-inch SAS HDDs and 3.5-inch SATA HDDs, and you can mix and match them. There are two different DKUs (disk frames, if you will), and up to three DKUs fit in a 19-inch rack.

Back-end architecture

It is no secret that Hitachi is leading the industry in moving its product line to SAS. This has been very successful with the AMS2000 line. The VSP follows suit with a switched 6Gb SAS back end, leaving behind the long-in-the-tooth FC-AL technology.

The next significant change is the move away from dedicated microprocessors on each Front/Back End Director (FED/BED) to centralized processor nodes. There are specialized chips on the FEDs/BEDs, but they serve a different role than before. The benefits of this are tremendous. For the first time that I can think of in the enterprise space, you can start off with a "base" configuration, meaning one BED feature, and drive 1,024 2.5-inch SAS drives in a single-module VSP. If you need more drives, you can add a second VSP module and BED pair for up to an additional 1,024 2.5-inch drives, giving you 2,048 (internal) drives in a single VSP. There are other components in the mix here, but my point is that in the old days I added BEDs whenever I added drives and cabinets, just to provide connectivity to those drives; with the new architecture, that requirement goes away.

The other cool thing is that if I have a number of HDDs today and need more bandwidth to those existing drives, I can add a BED feature and immediately double back-end bandwidth. With the previous FC-AL technology, loops could only service their own drives. With the new switched SAS architecture, any BED link can talk to any drive within the module. This becomes important if I have a workload that requires very high IOPS from SSDs, but I don't need or want lots of storage capacity.
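To make the FC-AL versus switched SAS difference concrete, here is a toy model of back-end reachability. It is only a sketch under stated assumptions, not Hitachi's implementation: `LINKS_PER_BED` is a hypothetical number, and a single nominal per-link rate is used for both topologies so that only the topology changes. In the FC-AL case a loop can only carry I/O for drives on that loop; in the switched case every link reaches every drive in the module.

```python
# Toy model of back-end reachability; all figures are hypothetical, not VSP specs.
LINK_GBPS = 6      # one nominal per-link rate for both topologies, so only topology differs
LINKS_PER_BED = 8  # hypothetical number of back-end links per BED feature


def usable_backend_gbps(bed_features: int, loops_with_drives: int, switched: bool) -> int:
    """Aggregate back-end bandwidth that can actually reach the installed drives.

    FC-AL: only loops that contain drives can carry I/O to them.
    Switched SAS: every link can reach every drive in the module.
    """
    total_links = bed_features * LINKS_PER_BED
    if switched:
        reachable_links = total_links
    else:
        reachable_links = min(total_links, loops_with_drives)
    return reachable_links * LINK_GBPS


if __name__ == "__main__":
    # Existing drives sit on the 8 links of the first BED feature; then add a second BED.
    for switched in (False, True):
        before = usable_backend_gbps(1, loops_with_drives=8, switched=switched)
        after = usable_backend_gbps(2, loops_with_drives=8, switched=switched)
        print("switched SAS" if switched else "FC-AL", before, "->", after)
```

With these made-up numbers, adding a second BED leaves the FC-AL case at 48 while the switched case doubles from 48 to 96, which is the effect described above.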

I am excited about this because across my wide customer base I can count several instances where there is plenty of back-end processing power for IOPS and bandwidth, but more and more capacity is needed. In many of these cases virtualization became the de facto standard because the added expense of BEDs drove the cost of internal storage too high. Virtualization is a great solution, one we have had great success with in driving customer adoption and driving storage costs down, but it's not right for every environment. Fortunately, the new VSP architecture gives us more options.

More new ways to carve

In a previous post I discussed different ways to carve up storage, and manage it. The Hitachi VSP redefines this again with Hitachi Dynamic Tiering (HDT).

HDT is HDP (Hitachi Dynamic Provisioning), but on a more granular scale. HDP is great: it made storage management easier and faster, and it made performance issues more or less a thing of the past by practically eliminating hot spots with wide striping. HDT takes it to a new level by allowing you to TIER your HDP pools with up to three mini-pools of different performance characteristics.

Let's say we start off with an HDP pool of 600GB 10K SAS drives. We have 10 RAID groups of 7D+1P, or roughly 38.5TB of storage and about 11,200 raw IOPS. I start shoving data and hosts onto the pool. Keep in mind that I told my boss(es) I wanted 300GB drives because I was concerned about the IOP density of these "massive" drives, but when he saw the price tag he articulated the benefits of 600GB drives to me. Well, that new web 2.0 product we deployed is having great success, and we're taking on more customers than we expected; hence my IOP requirement for that service is quite a bit higher than I planned for. Note my use of the word service here: I have database, middleware, and web servers all in the mix. So the answer is to buy SSDs. That's great, I get lots of IOPS out of them, but which LUNs do I put on them? What if I need a lift for the database and the middleware servers? What if only a subset of the data on a LUN would benefit while the rest is relatively untouched? Can I afford to waste my expensive SSDs?
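The pool figures above come from simple arithmetic. A minimal sketch, assuming about 140 IOPS per 10K SAS spindle (the figure implied by 11,200 IOPS across 80 drives) and ignoring RAID write penalty; capacity is computed from the drives' decimal 600GB rating before formatting overhead, so it lands near, not exactly on, the roughly 38.5TB quoted above.

```python
# Reproduces the example pool's numbers; per-drive IOPS is an assumption.
DRIVE_GB = 600            # marketed (decimal) capacity per drive
IOPS_PER_10K_DRIVE = 140  # implied by 11,200 IOPS / 80 drives; real drives vary
RAID_GROUPS = 10
DATA_DRIVES, PARITY_DRIVES = 7, 1  # 7D+1P


def pool_summary(groups: int = RAID_GROUPS) -> dict:
    spindles = groups * (DATA_DRIVES + PARITY_DRIVES)
    data_bytes = groups * DATA_DRIVES * DRIVE_GB * 10**9
    usable_tib = data_bytes / 2**40           # before formatting overhead
    raw_iops = spindles * IOPS_PER_10K_DRIVE  # every spindle, no RAID write penalty
    return {
        "spindles": spindles,
        "usable_tib": round(usable_tib, 1),
        "raw_iops": raw_iops,
        "iops_per_tib": round(raw_iops / usable_tib),  # the "IOP density" concern
    }


print(pool_summary())
```

This reports 80 spindles, about 38.2 TiB, 11,200 raw IOPS, and roughly 293 IOPS per TiB. Changing `DRIVE_GB` to 300 roughly doubles `iops_per_tib`, which is exactly the IOP density trade-off in the story.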

HDT makes it easy. I can add a tier of SSD to my pool, which automatically becomes the fast tier, pushing the SAS drives to the lower tier. HDT monitors pages of data (at the 42MB level) and reallocates the pages to the proper tier based on access patterns. I can add 1TB of SSD, and even though I'm using 500GB LUNs, only the hot data gets moved to the SSD tier.
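The idea of page-level placement can be sketched in a few lines. This is a simplified illustration, not HDT's actual relocation algorithm: it assumes a hypothetical per-page access counter, ranks pages by it, and fills the fastest tier first, spilling to slower tiers as each fills. The page size is the 42MB figure from the text; the tier capacities are examples.

```python
from typing import Dict, List

PAGE_MB = 42  # page size mentioned above


def place_pages(access_counts: Dict[int, int], tier_capacity_mb: List[int]) -> Dict[int, int]:
    """Assign each page to a tier (0 = fastest), hottest pages first.

    access_counts: hypothetical per-page I/O counts from a monitoring window.
    tier_capacity_mb: capacity of each tier in MB, fastest tier first.
    """
    free_slots = [cap // PAGE_MB for cap in tier_capacity_mb]
    placement: Dict[int, int] = {}
    tier = 0
    for page in sorted(access_counts, key=lambda p: access_counts[p], reverse=True):
        while tier < len(free_slots) and free_slots[tier] == 0:
            tier += 1  # this tier is full, spill to the next slower one
        if tier == len(free_slots):
            raise ValueError("not enough capacity in the pool for all pages")
        placement[page] = tier
        free_slots[tier] -= 1
    return placement


# Two 42MB slots of "SSD", ten of "SAS": the two busiest pages land in tier 0.
print(place_pages({0: 5, 1: 900, 2: 3, 3: 750}, [84, 420]))
```

At this granularity, 1TB of SSD (taken as 1,000,000MB) holds 1,000,000 // 42 = 23,809 pages, and those pages can come from any LUN in the pool, which is why a 500GB LUN never has to fit in SSD wholesale.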

Fast forward a year, and my web 2.0 product has generated more data than I anticipated. I have to keep that data around in case users access it, but for the most part it is stale. With HDT, I can add some high-density SATA storage to my pool, which automatically becomes the lower tier, leaving the SAS drives as the middle tier and SSD as the fast tier; HDT will then start moving pages to SATA, freeing up my SAS spindles for more active data.

If we consider that 80% of our IOPS come from 20% of our data, and data growth is out of control, sizing our pools and tiers can be problematic. Fortunately, HDT takes the hard work out of our hands and places it into the VSP. We can add, resize, or remove tiers in our pools as our workload and storage demands require, all seamlessly and non-disruptively to the data, applications, and end users.

Additional tidbits

The centralized microprocessor nodes benefit the FEDs as well as the BEDs I've already discussed. In the past, when I flipped a port for use in replication or virtualizing storage, I flipped two ports at the same time, because they were serviced by the same MP. With the new architecture, the MPs run all of the microcode, meaning any one of the centralized MPs can service any IO need, whether FED, BED, replication, or UVM (Universal Volume Manager). I can actually flip just one port and use it for external storage; however, at a minimum we should flip two, one per cluster.

Another improvement with the centralized MP architecture: when an IO comes in from the host, it is accepted by the MP that is currently servicing that LDEV, so the IO stays with the same MP from front to back, reducing latency. Previous generations handed the IO from the FED MP to the BED MP and back. This was done very efficiently for many generations of the product; however, Hitachi found ways to improve on it, and did.
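A toy model of the two IO paths, purely to illustrate the handoff difference described above. The LDEV IDs, MP names, and the idea of counting "handoffs" are hypothetical assumptions, not a description of the microcode: in the centralized design the MP that owns the LDEV handles the IO end to end, while the older design passed it from the FED MP to the BED MP and back.

```python
from typing import Dict, List

# Hypothetical LDEV-to-owning-MP assignment (IDs and MP names are made up).
LDEV_OWNER: Dict[str, str] = {"00:10": "MP0", "00:11": "MP1", "00:12": "MP0"}


def io_path(ldev: str, centralized: bool) -> List[str]:
    """Processors that handle one host IO, in order."""
    if centralized:
        return [LDEV_OWNER[ldev]]  # the owning MP handles front end and back end
    return ["FED-MP", "BED-MP", "FED-MP"]  # older design: handed off and back


def handoffs(path: List[str]) -> int:
    """Count processor-to-processor handoffs along a path."""
    return sum(1 for a, b in zip(path, path[1:]) if a != b)


for centralized in (False, True):
    path = io_path("00:11", centralized)
    print(path, handoffs(path))
```

The older path shows two handoffs per IO; the centralized path shows none, which is where the latency reduction comes from.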

As with the flexibility in how or why you would expand your BED capacity, you can do the same with the microprocessor nodes. If I can service all 1,024 HDDs effectively with the processing power I have, I'm good. If I need more processing power, I can add it. The key is that we're no longer bound by design requirements dictated by the hardware architecture; we have choices based on performance needs.

If data-at-rest encryption is important to you, it is just a license away. The BEDs already have the encryption hardware embedded, so you don't have to plan maintenance windows for the painful task of swapping out BED pairs. You can manage encryption at the parity group level, but since it runs at line speed with no performance impact, I don't know if I would bother managing it that granularly.

Final thoughts, for now...

There are few products that excite me. I've been optimistic about some innovative products in the past, only to watch them fail to execute and stall out. The VSP as a product excites me; it's new, fresh, and takes hardware to a new level. Everyone does this, and someday (insert your favorite manufacturer here) might produce a product that takes the lead. What excites me more is what I can see myself doing with the VSP, Hitachi's vision for the VSP wrapped into a holistic solution, and what it will mean for the front-line data center guy who holds the keys to the entire business he supports each and every day. You will do more with less, and your life will get better for it. Of course, if your boss hasn't read this blog, you can take credit for all of the hard work and push for that promotion....


Monday, September 20, 2010

Data Migrations

There are more ways to shave this yak than I care to write about, but the main methodologies I’ve used time and time again with maximum success and low risk boil down to a select few.
  • Array based technologies such as replication or virtualization
  • Host based LVM copies
  • For VMware, Storage vMotion.
There are emerging technologies that are quite interesting but I’ll go into those at another time. Over the past year I have managed several large-scale data migrations, and I find that the same questions come up each time, specifically:
  1. My database is very large, how are we going to move from array A to array B with minimal downtime?
  2. Downtime is difficult to schedule, what steps can we take to eliminate it?
  3. I have small LUNs today, but as the system keeps growing it becomes onerous to manage. How can we move to larger LUNs?
While Lumenate has a consistent migration process, the answers to these questions are unique to each customer’s environment. This post is meant as a starting point in the process, to help you make informed decisions for your technology refresh or data migration.

Let's start with the easiest case. VMware changes the game in many ways by abstracting physical resources from running guests. Migrations, whether of compute or storage resources, can be done with relative ease. Most of the time no outage is necessary, and services are never interrupted. There are times when storage vMotion (migrating a virtual machine from one datastore to another) is not possible. This may be due to raw device mappings, full datastores, or other reasons, but these cases are relatively rare. Outside of these situations, Lumenate always recommends storage vMotion. The preparation steps are pretty much the same as in the workflows below. With proper planning and staging, the migration is typically quick and easily managed from the familiar vSphere client.
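Before scheduling storage vMotions, it can help to screen the inventory for the exceptions just mentioned. A minimal sketch, assuming you have already exported VM names, sizes, and RDM usage from vCenter into plain records; the `VmRecord` fields and the free-space figure are hypothetical, and no vSphere API is called here. VMs are packed smallest first so as many as possible fit on the target datastore.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VmRecord:
    name: str
    size_gb: int
    has_rdm: bool  # RDMs are treated as an exception to storage vMotion, per the post


def plan_svmotion(vms: List[VmRecord], target_free_gb: int) -> Tuple[List[str], List[str]]:
    """Split VMs into storage vMotion candidates and ones needing another method."""
    movable: List[str] = []
    exceptions: List[str] = []
    free = target_free_gb
    for vm in sorted(vms, key=lambda v: v.size_gb):  # smallest first
        if vm.has_rdm or vm.size_gb > free:
            exceptions.append(vm.name)  # RDM or the target datastore would be full
        else:
            movable.append(vm.name)
            free -= vm.size_gb
    return movable, exceptions


inventory = [
    VmRecord("web01", 40, False),
    VmRecord("db01", 500, True),
    VmRecord("app01", 80, False),
    VmRecord("big01", 900, False),
]
print(plan_svmotion(inventory, target_free_gb=200))
```

Anything in the exceptions list falls back to the array-based or LVM approaches described next.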

Of course not every host is virtualized, so storage vMotion isn't always an option.

Whether structured or unstructured in nature, moving large amounts of data quickly and with minimal service interruption can be difficult. Typically, consistent and repeatable results are found by leveraging array technologies in these scenarios. In cases where I can leverage HDS technology, specifically the enterprise line, virtualization has provided the best results. A typical workflow goes something like this:

  • Determine which LDEVs need to be migrated, then document, document, document!
  • Initiate connectivity between storage arrays (could be direct attached or through switches)
  • Prepare connectivity between target array and host (build zones, but do not activate)
  • Create new LUNs on the target array
  • If using array-to-array replication, this is a good time to configure and sync pairs.
  • Configure LUN security on the target array
Outage Window
  • Typically plan for a two hour event, but normally takes under an hour
  • Bring applications down, unmount file systems and, as a safety precaution, shut down the host
  • Deactivate old array zones, activate the target array zones
  • If using array virtualization, discover LUNs on the target array and configure LUN security to the host. (Normally, LUNs are presented in the same order as found on the old array.)
  • Boot host, discover storage and bring services back online
  • If utilizing HDS virtualization, configure Tiered Storage Manager and migrate storage from old array to new array.
Back-out plan
  • If utilizing array to array replication, you will have a point in time copy of the data on your old storage to fall back to
  • In the case of HDS Tiered Storage Manager, if you do not select "wipe data" when building your migration plans, the old data will remain on the old LDEV. It is important to note that TsM doesn't operate like replication; there are no consistency groups. While the data is there, it may not be crash consistent the way you would expect from traditional replication.
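The "document, document, document" step is easy to make mechanical. A minimal sketch, assuming a hypothetical inventory of source and target LDEVs with host LUN numbers and sizes in 512-byte blocks: it pairs them in host LUN order (matching the "presented in the same order" expectation above) and refuses any target smaller than its source. The LDEV IDs and sizes are made up, and nothing here talks to an array.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Ldev:
    ldev_id: str
    host_lun: int
    blocks: int  # size in 512-byte blocks


def build_migration_map(source: List[Ldev], target: List[Ldev]) -> List[Tuple[str, str, int]]:
    """Pair source and target LDEVs by host LUN number; refuse undersized targets."""
    if len(source) != len(target):
        raise ValueError("source and target LDEV counts differ")
    plan = []
    for src, dst in zip(sorted(source, key=lambda d: d.host_lun),
                        sorted(target, key=lambda d: d.host_lun)):
        if src.host_lun != dst.host_lun:
            raise ValueError(f"host LUN mismatch: {src.ldev_id} vs {dst.ldev_id}")
        if dst.blocks < src.blocks:
            raise ValueError(f"{dst.ldev_id} is smaller than {src.ldev_id}")
        plan.append((src.ldev_id, dst.ldev_id, src.host_lun))
    return plan


old = [Ldev("01:00", 1, 1000), Ldev("01:01", 0, 2000)]
new = [Ldev("0A:00", 0, 2000), Ldev("0A:01", 1, 1000)]
for row in build_migration_map(old, new):
    print(row)
```

Keeping the resulting map with the change record also gives the back-out plan something concrete to point at.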
When the business cannot take an outage at all, your choices become more limited. Using logical volume management at the host level to copy data between arrays tends to be the standard answer in this scenario. There are advantages to this over array-based technologies. For starters, we generally avoid an outage requirement, as storage can be mapped to modern operating systems while online. Additionally, you begin to shift the focus of migration-related activities toward the system administrators and off of the storage team; a great benefit if you are the storage lone ranger in your shop. Should you need to resize your volumes or LUNs, there is no better way to do it than with LVM (unless you just need to grow them, in which case most modern arrays can do the job just fine).

  • As above, document the environment
  • Prepare connectivity between target array and host (build zones, but do not activate)
  • Create new LUNs on the target array
  • If using array-to-array replication, this is a good time to configure and sync pairs
  • Configure LUN security on the target array
  • Activate new fabric zones
  • Discover LUNs on the host, format, and bring under LVM control
  • Mirror LUNs
  • Verify mirror status
  • Break the mirror by removing the old array LUNs. (If you are looking for a consistent point in time “snapshot” of your data on the old array, consider taking applications and file systems off-line for this step.)
  • De-allocate old storage
Back-out plan
  • When breaking the LVM mirrors, consider taking applications and file systems off-line to provide a consistent point-in-time copy of data
  • Keep this extra copy of data for some determined length of time, in case of regression.
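On Linux with LVM2, the mirror-and-break sequence above maps onto a handful of standard commands. The sketch below only builds the command list for review and runs nothing. It assumes Linux LVM2 specifically (AIX, HP-UX, and Veritas use different tooling), a single logical volume, and one old/new PV pair; the volume group, LV, and device names are hypothetical. The old mirror leg is split off as its own LV rather than discarded, so it can serve as the back-out copy; stop applications before that step if you want it consistent, and run the last three steps only after the retention period.

```python
import shlex
from typing import List


def lvm_migration_plan(vg: str, lv: str, old_pv: str, new_pv: str) -> List[str]:
    """LVM2 commands, in order, to move one LV from old_pv to new_pv via a mirror.

    Builds the commands only; nothing is executed.
    """
    target = f"{vg}/{lv}"
    steps = [
        ["pvcreate", new_pv],                                         # label the new array LUN
        ["vgextend", vg, new_pv],                                     # add it to the volume group
        ["lvconvert", "--type", "raid1", "-m", "1", target, new_pv],  # mirror onto the new LUN
        ["lvs", "-a", "-o", "name,copy_percent,devices", vg],         # repeat until 100% synced
        # Break the mirror by splitting the old leg off as its own LV (back-out copy).
        ["lvconvert", "--splitmirrors", "1", "--name", f"{lv}_old", target, old_pv],
        # Only after the back-out retention period:
        ["lvremove", f"{vg}/{lv}_old"],
        ["vgreduce", vg, old_pv],
        ["pvremove", old_pv],
    ]
    return [shlex.join(cmd) for cmd in steps]


for line in lvm_migration_plan("datavg", "oralv", "/dev/sdb", "/dev/sdc"):
    print(line)
```

`pvmove` is an alternative that copies extents in a single step, but it does not leave a split-off copy behind for the back-out plan.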


Monday, September 13, 2010

An easier way to carve

In the context of external storage on the Hitachi enterprise array, one of the problems I've encountered in the past is trying to match an arbitrary LDEV size (down to the block count) between my internal and external storage. Traditional modular arrays, including the previous-generation Hitachi AMS, would not allow you to create LDEVs with an arbitrary block count; sizes were generally bound to 1GB increments. This has led to many debates on our team about the best way to present external storage to the USP or USPV, keeping the benefit of simplified storage management while engineering against performance issues.
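Why 1GB granularity gets in the way: an internal LDEV's size in 512-byte blocks rarely lands on a whole-GB boundary, so an array limited to whole-GB sizes can only offer the next size up. A small sketch, assuming "1GB" means 2^30 bytes (2,097,152 blocks); the internal block count used below is a hypothetical example, not a real LDEV.

```python
BLOCK_BYTES = 512
GB_BLOCKS = 2**30 // BLOCK_BYTES  # 2,097,152 blocks per GB, assuming binary GB


def nearest_gb_granular_size(internal_blocks: int) -> int:
    """Smallest whole-GB size, in blocks, that can hold the internal LDEV."""
    whole_gbs = (internal_blocks + GB_BLOCKS - 1) // GB_BLOCKS  # integer ceiling
    return whole_gbs * GB_BLOCKS


def exact_match_possible(internal_blocks: int) -> bool:
    """True only if the LDEV is an exact multiple of 1GB."""
    return internal_blocks % GB_BLOCKS == 0


internal = 209_721_600  # hypothetical: 100GB plus 6,400 blocks
print(nearest_gb_granular_size(internal), exact_match_possible(internal))
```

Here the closest whole-GB size is 211,812,352 blocks (101GB), so the sizes can never match block for block, which matters whenever an external volume has to pair with an internal LDEV.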

The issue? Creating large LUNs on the external (virtualized) array allows you to easily carve up, or CVS (custom volume size), LDEVs on the USP/USPV. This provides an easier-to-manage external storage environment, but at the expense of granularity. Large external LUNs mean fewer LDEVs on the external array, or a high ratio of "internally" created LDEVs to each external LUN. This example may help depict the issue:

A 2000GB volume is virtualized to the USP/USPV and carved into twenty 100GB LUNs. Now you map those 20 LUNs to various hosts and start pushing I/O. To the hosts, these are typical LUNs, each with its own queue depth (32 in the case of the USPV). The host does not know that on the back end these LUNs all live on a single LDEV, which has a finite queue depth (typically less than 32). I guess you could call it "thin provisioning your queues," but don't; that just gets you in trouble.
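The queue arithmetic for the example above is worth spelling out. A minimal sketch: the host-side queue depth of 32 comes from the text, while the back-end queue depth of the single external LDEV is an assumption; 32 is used as a best case, since the text says it is typically less.

```python
def queue_oversubscription(host_luns: int, host_queue_depth: int, backend_queue_depth: int) -> float:
    """How many times over the hosts can oversubscribe the one external LDEV's queue."""
    return host_luns * host_queue_depth / backend_queue_depth


def fair_share(host_luns: int, backend_queue_depth: int) -> float:
    """Outstanding IOs each carved LUN gets if the back-end queue is split evenly."""
    return backend_queue_depth / host_luns


# 20 x 100GB LUNs, host queue depth 32, back-end depth 32 (best case, assumed)
print(queue_oversubscription(20, 32, 32), fair_share(20, 32))
```

Even in the best case the hosts can queue 640 IOs against a back end that accepts 32, a 20:1 oversubscription, or 1.6 outstanding IOs per carved LUN when they are all busy.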

Some recent work I have been doing involves AMS2000 modular storage virtualized behind a USPV. During some basic testing I was able to create LDEVs on the AMS that exactly matched the size, in blocks, of the USPV LDEVs. Specifically, I'll be taking advantage of the AMS2500's Dynamic Provisioning software to provide several hundred correctly sized LDEVs to the USPV, which will be used as ShadowImage (clone) S-VOLs for internal P-VOLs (DP-VOLs in this case). This is not possible with all externally attached arrays, as their LDEV sizes most likely will not match the USP or USPV block counts the way the AMS2000's do.

This solution saves my customer money by licensing Hitachi Dynamic Provisioning on their enterprise array for only the storage required by the production volumes. I have simplified management of the external array by leveraging the frame based HDP license included on the AMS product. In addition, I can leverage this infrastructure in the future to provide storage tiering for other applications to further control storage costs. All in all, not a bad day's work!


Monday, September 6, 2010

Architectural Impact of Hitachi SRA

I recently spent some time setting up and playing with VMware SRM and HDS modular storage. There are some interesting architectural impacts of the HDS Storage Replication Adapter that are worth discussing. Note that these impacts are the same whether you’re dealing with HDS modular or enterprise storage.

At the heart of the issue is this: the HDS Storage Replication Adapter talks to the HDS Command Control Interface (CCI), and CCI needs access to a command device on the array.

Do you run SRM on the same server (physical or virtual) as vCenter or not? If vCenter is virtualized, it’s not really a hard question: provided you have the means to provision the CCI command device via iSCSI, SRM can co-exist with vCenter. If you can’t do that, SRM will be living elsewhere.

Personally, I don’t believe that SRM generally warrants its own physical hardware, particularly given the expense of FibreChannel ports and HBAs. But you might be able to co-locate SRM on another existing physical box (may I suggest the physical server housing your vCenter database?). My general preference is to see SRM running on a dedicated VM. To work with the HDS SRA, the CCI command device must be mapped to the physical ESX host and configured as a physical-mode RDM. This means you will be unable to vMotion your SRM machine. It shouldn’t be a show-stopper, just something to be aware of. You can still boot the SRM guest on any ESX node in the cluster to which the command device is mapped (be sure to keep LUN numbering consistent across all ESX nodes that could potentially boot the SRM machine).
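Keeping the command device's LUN number consistent across every ESX node that might boot the SRM guest is easy to check from an inventory. A minimal sketch, assuming you have already collected, per ESX host, the LUN number at which the CCI command device is presented (`None` where it isn't mapped); the host names and LUN numbers are hypothetical, and the most common LUN number is taken as the intended one.

```python
from collections import Counter
from typing import Dict, List, Optional


def hosts_to_fix(cmd_dev_lun: Dict[str, Optional[int]]) -> List[str]:
    """Hosts where the command device is unmapped or uses a non-majority LUN number."""
    mapped = [lun for lun in cmd_dev_lun.values() if lun is not None]
    if not mapped:
        return sorted(cmd_dev_lun)  # nobody has it mapped yet
    expected = Counter(mapped).most_common(1)[0][0]
    return sorted(host for host, lun in cmd_dev_lun.items() if lun != expected)


inventory = {"esx01": 5, "esx02": 5, "esx03": 7, "esx04": None}
print(hosts_to_fix(inventory))
```

Any host in the result either can't boot the SRM guest with its command device RDM or would present it at a different LUN number.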

Recommendation Matrix
  • SRM virtualized, on the vCenter server: Provision the command device via iSCSI to an initiator in the vCenter guest.
  • SRM virtualized, on a dedicated server: Provision the command device via FibreChannel to the ESX host running the SRM guest. The command device must be a physical-mode RDM to the guest. This blocks vMotion of the SRM guest.
  • SRM physical, on the vCenter server: Provision the command device via FibreChannel to the SRM/vCenter server.
  • SRM physical, on a dedicated server: Provision the command device via FibreChannel to the SRM server. Generally overkill.

There are some other supported ways of solving this problem, including a separate Linux CCI server. But there are a lot of caveats to the other approaches, and I’m not comfortable recommending them for production use.


Friday, September 3, 2010

Bluearc Silicon FS 7.0

Bluearc recently announced the availability of version 7.0 of its Silicon File System, which includes a number of enhancements.

Here are some of the highlights:

Multi Tier File System (MTFS)
This feature allows you to separate out the metadata and associate it with high performance SAS or SSD devices. Locating the metadata on high performance disk can significantly improve performance across a number of operations and workloads.

• Bluearc solutions tend to be used in environments with very high file counts, in some cases millions of files exist within a single directory and a file system may have billions of files. Operations such as replication, backups and migrations all rely heavily on metadata. Using the Multi Tier File System greatly improves the performance of these types of operations. Directory listings are improved by as much as 500%. Replication operations have shown improvements as high as 87%.

• Traditional workloads also benefit from MTFS. Heavy write workloads and processes that scan large parts of a file system or require concurrent access to large files have shown performance improvements of between 30% and 50%.

MTFS will improve performance in most environments, but just how much benefit you see will depend on your data and the types of operations you perform. The nice thing is that this approach is built directly into the solution: you simply add a tier of storage from the arrays you already have rather than adding a completely new device.

Something to consider is that the amount of file system space consumed by metadata depends on the average file size. The smaller the file size the greater the percentage of space consumed by metadata. Bluearc provides some guidelines that can help with sizing the space required for the metadata.
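To see why smaller files mean proportionally more metadata, it helps to assume a fixed metadata cost per file. The 4KB-per-file figure below is purely hypothetical and is not Bluearc's sizing guidance; use their published guidelines for real sizing. The sketch only shows the shape of the relationship.

```python
METADATA_BYTES_PER_FILE = 4 * 1024  # hypothetical fixed per-file metadata cost


def metadata_fraction(avg_file_bytes: int, per_file: int = METADATA_BYTES_PER_FILE) -> float:
    """Share of consumed space that is metadata, for a given average file size."""
    return per_file / (per_file + avg_file_bytes)


for avg in (4 * 1024, 64 * 1024, 1024 * 1024):
    print(f"{avg // 1024:>5} KB average file: {metadata_fraction(avg):.1%} metadata")
```

Under this assumption, a file system of 4KB files is half metadata, while one averaging 1MB files is well under 1%, which is why the metadata tier should be sized from your own file-size profile.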

Increased Scalability
Capacity has been doubled across the entire product line, for all of you who were struggling with the 8PB limit.

Maximum capacity by model:
  • Mercury 50 (HDS 3080): 4 PB
  • Mercury 100 (HDS 3090): 8 PB
  • Titan 3100 (HNAS 3100): 8 PB
  • Titan 3200 (HNAS 3200): 16 PB

Data Migration Improvements

One of the nicest Bluearc features is Data Migrator. Data Migrator provides policy-based data migration between tiers of storage. When combined with External Volume Links (XVL), it can also migrate to any NFS device, whether that is another NAS solution, a deduplication appliance, or a file server. In previous versions of Data Migrator, files were not automatically migrated back to their original location when recalled. You could move them back, but it required a separate operation through the CLI. This reverse migration is now built in, making the overall process much simpler.
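A minimal sketch of the policy-plus-recall behavior described above. The "not accessed in N days" rule, the tier names, and the `recall` helper are hypothetical illustrations of the concept, not Data Migrator's policy syntax or API; the point is that in 7.0 a recall returns the file to its original tier without a separate CLI step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class FileEntry:
    path: str
    last_access: datetime
    tier: str = "primary"


def apply_policy(files: List[FileEntry], now: datetime, idle_days: int = 90) -> List[str]:
    """Hypothetical rule: move files idle longer than idle_days to the secondary tier."""
    cutoff = now - timedelta(days=idle_days)
    moved = []
    for f in files:
        if f.tier == "primary" and f.last_access < cutoff:
            f.tier = "secondary"
            moved.append(f.path)
    return moved


def recall(f: FileEntry, now: datetime) -> None:
    """Recalling a migrated file returns it to its original tier (the 7.0 behavior)."""
    f.last_access = now
    if f.tier == "secondary":
        f.tier = "primary"


now = datetime(2010, 9, 3)
old_file = FileEntry("/fs/projects/2008/report.dat", now - timedelta(days=200))
new_file = FileEntry("/fs/projects/2010/draft.dat", now - timedelta(days=10))
print(apply_policy([old_file, new_file], now))
recall(old_file, now)
print(old_file.tier)
```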

Improved Reporting
The reporting features have been enhanced to provide more of a dashboard-type view for a cluster node. This view is helpful for getting a quick look at how a node is performing. File system capacity trending information is now included. File system reporting is particularly useful in environments with large file systems and millions or billions of files. The performance monitoring statistics are now stored in a database, allowing longer retention periods. They’ve also added links that allow you to quickly switch between various sample sets, such as the last hour, 1 day, 1 week, 3 months, or even 1 year.

We were told that 7.0 is GA but not yet shipping on the units. Version 7.1 should be out within the next few weeks and will be included on new units shortly thereafter. When we get the 7.0 code in our lab we will be sure to post our findings as well as what the upgrade process looks like.