Tuesday, December 29, 2009

If the answer's recompiling sendmail, then you're asking the wrong question

Another entry from the random files. Google eventually helped me get this one but it wasn't the first entry, the first page, or even explicit.

A client mentioned that they were having sendmail problems, and later asked if I'd be willing to take a look. Now I have a love/hate relationship with sendmail. At some level it's always been a sort of SysAdmin proving ground. I still have my copy of the "bat" book, but we won't talk about which edition.

Anyway, this particular configuration handed off mail to procmail for processing, but procmail never got started. You could see the queue filling up (to the point that an ls on the directory wouldn't complete) but nothing we tried would get mail moving.

Two other nodes serving the same purpose were working, so we compared them to the broken one. There were many configuration differences, including a different version of sendmail: the two working nodes ran a custom-compiled build instead of the vendor-supplied version. Assuming that was the problem, I went to the trouble of compiling sendmail.

Only to have the same problem.

Digging in, we ran /usr/lib/sendmail -q -v, which processes the queue verbosely. Every message failed with the error "(alias database unavailable)". Aha! It must be a permissions problem. But comparing permissions (and running "newaliases") showed that it wasn't.

Which led us to check /etc/nsswitch.conf, where we saw that aliases was referencing the nisplus database. But this host wasn't using NIS+.

Changing the aliases entry to just "files" fixed the problem.
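For the record, the check and the fix amount to just a couple of commands. A minimal sketch, staged against a copy of the entry so it is safe to run anywhere (on the real host you would edit /etc/nsswitch.conf itself):

```shell
# Stage a copy of the offending entry rather than touching the live
# /etc/nsswitch.conf on whatever machine this runs on.
printf 'aliases:    nisplus\n' > /tmp/nsswitch.aliases

# The host isn't running NIS+, so alias lookups never succeed; point
# them at local files instead:
sed -i 's/^aliases:.*/aliases:    files/' /tmp/nsswitch.aliases
grep '^aliases' /tmp/nsswitch.aliases
```

On the real host, follow the edit with newaliases and /usr/lib/sendmail -q -v to confirm the queue starts draining.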


Tuesday, September 29, 2009

Problems encapsulating root on Solaris with Volume Manager 5.0 MP3

This post is as much for my own reference as for the world at large, but having spent entirely too much time working on this I just had to post it.

As part of a larger project we are installing Storage Foundation and using Veritas Volume Manager (VxVM) to encapsulate and then mirror the root disks. It's a well-known process so it was particularly frustrating to find that the systems either would not complete the encapsulation or would come up referencing rootvol for the root device, but without the configuration daemon (vxconfigd) running.

We could force the configuration daemon to start but no matter what we tried the system just would not start vxconfigd at boot. Searches on SunSolve and Symantec's support site didn't turn anything up until we started digging through the startup scripts and found that one of them was running this:

vxdctl settz

Plugging that into Google leads here, which basically says that they added this in MP3, and if the system's TZ environment variable isn't set it dumps core and doesn't start vxconfigd.

Sure enough, there was no entry for TZ in /etc/TIMEZONE, and adding one fixed it.
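The check is simple enough to script. A sketch using a staged copy of the file (on a real Solaris host the file is /etc/TIMEZONE, and US/Central below is just a placeholder for your site's timezone):

```shell
# Stage a TIMEZONE file with no TZ= line, as found on the broken host.
printf 'CMASK=022\n' > /tmp/TIMEZONE.demo

# Without TZ set, "vxdctl settz" dumps core and vxconfigd never starts
# at boot. Add a TZ entry if one is missing:
grep -q '^TZ=' /tmp/TIMEZONE.demo || echo 'TZ=US/Central' >> /tmp/TIMEZONE.demo
grep '^TZ=' /tmp/TIMEZONE.demo
```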

And yes, I know that there's a much easier (and elegant) way to handle this with the combination of zfs and beadm, but maybe we'll talk about that later.


Tuesday, September 15, 2009

Because It's There

One of my favorite things about Lumenate is that we have a lab where we can test out new technology, stage for customer engagements, and perhaps most importantly indulge our inner geek impulse to try ill-advised stuff.

This story does not begin, though, with the ill-advised stuff. Instead it begins with fully supported stuff. As we bring new folks into the fold the lab is a key resource for teaching them - before they go to class they can kick the tires on zoning, mapping storage, virtualizing storage, etc.

And so the plan was to show one of the new folks how to virtualize an AMS2100 behind a USP-VM. Easy as pie, it should work like a hose, let me show you this quick virtualization demo. Except it didn't work. Instead of popping up with "HITACHI" for the Vendor, and "AMS" for the Product Name we got the dreaded "OTHER" "OTHER." That's not exactly the demo we were hoping for.

We double-checked everything and still no luck. Well, maybe it's the release of code on the AMS; let's upgrade that. Nope. Hmm, how about the code on the USP-VM? Nope. Eventually we went back and noticed that someone (who for the purposes of this blog shall remain nameless) had changed the Product Id on the AMS from the default, which is "DF600F", to "AMS2100". Changing it back fixed the problem.

After basking in the success of accomplishing something that would normally take just a few minutes, we thought to ourselves, "Hmm, what else have we got that we could virtualize?" And because it's a lab, and because there are no repercussions (and because Justin loves him some ZFS) we decided to virtualize a Sun X4500, or "Thumper".

I won't cover the steps for setting up an OpenSolaris box as a storage server, since it's well documented under COMSTAR Administration on the Sun wiki. But basically we followed the documented steps and presented the LUNs up to the USP-VM. And, as you'd expect, got "OTHER" "OTHER" for our trouble.

And that's where the "Bad Idea" tag comes in. You see, it's possible to load profiles for new arrays into Universal Volume Manager. We took the profile for a Sun 6780, modified it to match the COMSTAR information, and loaded it up.

After that, it virtualized without issue and we presented it up to a Solaris server to do some basic dd's against it. As far as the host knew, it was just another LUN from the USP-VM.

Of course this is just one way to do it. After a little more thought maybe you could recompile parts of OpenSolaris (like this one, for instance) and have the OpenSolaris server tell the USP-V that it's something else entirely. We'll leave that as an exercise for the reader, though.

Let me reiterate: This is a Really Bad Idea for a production array because it's not supported (or supportable).


Friday, September 4, 2009

Backing up that large Oracle database with Netbackup’s Snapshot client

A great thing happens when you marry up some large storage to a large database: you can deliver a large solution. You can also give yourself a large backup headache. Let's use Oracle as an example. One of the typical ways administrators get their large databases backed up is by creating an Oracle backup policy that sends an RMAN stream (a humongous one) over the network. The performance of this method can often be measured with a calendar. Another method involves provisioning a large locally attached disk volume and sending the RMAN job to local disk, which later gets backed up by a standard file system policy that, you guessed it, goes over the aforementioned network. Now if your database resides on a storage array with some flavor of in-system replication, another way is to script your way through the problem. This involves writing a script (or scripts) that puts the database into backup mode, leverages a sync (or resync) function on your primary and mirror disk volumes, and then mounts the mirrors on a backup server. My company has made some pretty nice scratch doing this, and it does work well. Until, that is, your storage administrator decides to change job fields and go into farming (it's happened) and the database administrator is a contractor who may or may not have a firm grasp on the nuances of RMAN, which by now may have you in a mild panic trying to figure out how to get it all working again. There are other methods, of course, but by my own (unofficial) polling, these seem to be the running favorites.

Symantec’s Veritas NetBackup folks have taken up the challenge of backing up very large databases (VLDBs) by leveraging advanced storage array replication features in a product called Snapshot Client (formerly Advanced Client). What is it and how does it work? With the alternate client backup feature, all backup processing is offloaded to another server (or client), significantly reducing computing overhead on the primary client. The alternate host handles the backup I/O, so the backup process has little or no impact on the primary client. A NetBackup master server is connected over a local or wide-area network to the target client hosts and a media server. The primary NetBackup client contains the data to be backed up; a snapshot of that data is created on the alternate client (or media server), which builds a backup image from the snapshot using the original path names and streams the image to the media server. Trivia question: does Oracle have to be installed on the media server (alternate client)? The answer is no. The snapshot client calls RMAN using the same NetBackup wrapper script it has always used, but leverages Oracle's proxy copy RMAN option. Here's an example with the proxy option:

RUN {
  ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
  BACKUP PROXY
    TAG hot_db_bk_proxy
    # recommended format
    FORMAT 'bk_%s_%p_%t'
    (DATABASE);
  sql 'alter system archive log current';
  # backup all archive logs
  BACKUP FORMAT 'al_%s_%p_%t' (ARCHIVELOG ALL);
  RELEASE CHANNEL ch00;
}

Now, before running off down the hall yelling "I am delivered!", I am required by professional ethics (and common sense) to mention a few caveats. This method does eliminate all the custom scripting, which invariably becomes a hindrance when you have staff turnover or when something breaks. The snapshot method replaces it with commercial off-the-shelf (COTS) software, so when it breaks, a call to support is your lifeline. The snapshot client does not, however, eliminate the need for staff trained in multiple disciplines (DBA, storage, backup) to make it work. Sorry, the days of "take the default, click next, click next" are not right around the corner for VLDBs.

My primary product-mix experience with the snapshot client involves Hitachi ShadowImage, Oracle, and NetBackup Enterprise Server. The actual product support matrix is quite extensive, though, so those of you running EMC, IBM, and NetApp now have options too.


Saturday, June 27, 2009

Hitachi High Availability Manager – USP V Clustering

When HDS first introduced virtualization on the USP, it had an immediate impact on the ability to perform migrations between dissimilar storage arrays. I no longer had to implement third-party devices or host-based software to perform these migrations; I could simply virtualize the array and move the data either to internal storage or to another virtualized array. The process was simple and straightforward. The ease of migration and the benefits of virtualization soon led to very large virtualized environments, many with hundreds of terabytes and hundreds of hosts.

The challenge came years later when it was time to migrate from the USP to a USP V. How do you move all of the virtualized storage and all of those hosts to the USP V with minimal disruption? Unfortunately there wasn't a good answer. No matter what combination of array-based tools I used (replication, virtualization, Tiered Storage Manager), I would still have an outage and would need a significant amount of target capacity. It wasn't terrible, but it wasn't great either, and it required a lot of planning and attention to detail.

Hitachi High Availability Manager solves this problem, at least on the USP V and VM, by allowing you to cluster USP V arrays. Virtualized storage can be attached to both arrays in the cluster, and hosts can be nondisruptively migrated to the front-end ports on the new array. This is a feature HDS has been promising for some time, and it is finally here. HDS has made a pretty good case that migrations have been a major problem for large organizations, and I would agree; we see it every day. So as it applies to migrations, High Availability Manager is a much-needed improvement. But what about clustering USP Vs for increased availability?

At first glance it is a little more difficult to see the benefits of this configuration. HDS Enterprise storage is designed for 100% uptime, the guarantee is right there on a single page. Unlike athletes, storage arrays can’t give you “110%”. How do you tell a potential customer “The USP V is designed for 100% uptime, it will never go down… but you might want to cluster them”?

But if you start to think about it, there may be some reasons to do this. First of all, look at where this feature is initially targeted: the Fortune 100 and environments with the absolute most extreme availability requirements. Although a single USP V is rock solid and the chance of an outage is very minimal, the possibility exists. Clustering USP Vs just adds one more layer of protection. It also silences critics who routinely point to the controller-based virtualization solution as a single point of failure.

Consider maintenance operations. Any time they are performed you introduce the human element: "Oops, I pulled two cache boards!" Now you can unclench your jaw and relax while the work is done.

The other thing to consider is that clustering is apparently not an all-or-nothing proposition: according to Claus Mikkelsen, I can elect to cluster only the ports and data associated with certain critical applications. Assuming an environment already has one or more USP V arrays, you could cluster just the mission-critical applications. This seems much more feasible to me than simply pairing entire arrays.

From my view, I don't see the ability to cluster the arrays as having the same impact that the introduction of virtualization did. It isn't as revolutionary; rather, it is another incremental step that furthers Hitachi's lead in the enterprise storage space.

Now for the question of the day: how soon will they support host-based clusters? If my applications are that critical, they're probably clustered. Even if I don't intend to run production on a clustered pair of USP Vs, it sure would be nice to be able to migrate them nondisruptively!

Thursday, June 25, 2009

If Banking Were Like Storage

I imagine most people would be upset if they found out their U.S. bank statements were actually quoted in Canadian dollars. (As I write this, the exchange rate is 1.15 CAD to each USD.) What looks like $100 in accrued interest turns out to be just under $87 when you spend it (in the US).

Likewise, I imagine many consumers of storage wonder what happens to all of those terabytes when they "spend" them. What is the difference between "raw" storage (i.e., the bank statement) and "usable" storage (that is, how much you have when you spend it)?
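To put numbers on the analogy (the 7+1 RAID-5 group below is an assumed example, not a universal ratio):

```shell
# The exchange-rate arithmetic from the banking example:
awk 'BEGIN { printf "CAD 100.00 spends as USD %.2f\n", 100/1.15 }'

# The storage analogue: raw vs. usable for an assumed 7+1 RAID-5 group,
# before hot spares, filesystem overhead, and snapshots take their cut.
awk 'BEGIN { printf "100 TB raw yields %.1f TB usable\n", 100*7/8 }'
```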

From my perspective, if we're not talking about storage capacity in terms of how we spend it, we're having the wrong discussion. Yet every day, storage consumers accept quotes with capacities they'll never see.

Obviously, storage capacity is not the sole determining factor when specifying the Need. In most cases performance requirements, replication requirements, backup windows, data usage methods and patterns, and perhaps most importantly the intrinsic value of the data all contribute to the total solution.

This blog series will meander through these topics over the coming weeks.

Sunday, May 3, 2009

Real world goodness with ZFS

Not long ago I was working with a customer who wanted a NetBackup hardware refresh. To save time I won't go into details, but the end result was a Solaris (SPARC) media server powered by a T5220 (32GB RAM, 8 cores) and a full complement of dual-port 4Gb HBAs. The storage was an ST6540 with about 200TB of 1TB SATA drives configured as RAID5, and the media server was given (60) 1.6TB LUNs. The initial configuration was three ZFS pools, each with 20 LUNs and no ZFS redundancy, meaning 32TB RAID0 stripes. Not the most resilient, true; but this was a backup server and we understood the risk level: we wanted capacity and were willing to lose a pool should RAID5 fail us.
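For reference, that layout amounts to plain striped pools. A sketch with hypothetical device names (each cXtYdZ stands in for one of the 1.6TB array LUNs; the real pools had 20 apiece rather than the four shown):

```shell
# One of the three capacity pools: listing disks with no raidz/mirror
# keyword builds a stripe, so ZFS can detect corruption but not repair it.
zpool create bkpool1 c2t0d0 c2t1d0 c2t2d0 c2t3d0   # ...plus 16 more LUNs
zpool list bkpool1     # capacity is the simple sum of the LUNs
zpool status bkpool1   # shows a single top-level stripe, no redundancy
```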

We fired off a series of backup jobs and were immediately able to run about 5 to 6x the number of jobs that were possible on the old hardware. The T5xx0 series absolutely SCREAMS as a NetBackup media server; thanks to the I/O capabilities of the hardware and the threading ability of the CPU, it can outperform an M5000 at a fraction of the cost. But on day two or three, CRASH! The media server went down hard and required a power cycle. Upon rebooting (which took about 3 minutes) we found that our paths to storage had gone down and ZFS did what it was supposed to do (panic the box). Well, this went on for a week or so, and the customer decided to try Veritas Storage Foundation to see if that would resolve it, thinking this was a ZFS issue because "ZFS is causing the panic." Fast forward a couple of days running on Veritas, and CRASH! It happened again. This time, however, it took about 6 hours to fsck the 100TB of storage before we could get the host back online and start backups again. It got even more fun: the crashes became more frequent, and after a week we were spending about 18 hours a day rebooting and running fsck, and getting practically NO backups.

It turns out the ST6540 had controller issues, which were resolved after a couple of swap-outs and some parity-group rebuilds (because the controller swap had issues of its own). But my takeaway is that in an unstable environment the true power of ZFS shines even if you don't take advantage of RAID-Z or any of the other plentiful features: no fsck, just a 3-minute reboot. Why would you run anything else, especially for a non-clustered file system requirement? Really, I'm asking! Maybe I'm missing something. It's free, it's open, and if it's the default for the new Mac OS, it can't be bad (this coming from a Solaris/Linux/Windows-at-home guy, too).

Monday, April 20, 2009

Hitachi Dynamic Provisioning - reclaim free/deleted file space in your pool today.

While Hitachi Dynamic Provisioning (HDP) is new to HDS storage, thin provisioning isn't a new concept by any stretch of the imagination. That said, you can rest assured that when the engineering gurus at Hitachi released the technology for consumer use, your data can be put there safely and with confidence. There are many benefits to HDP; you can Google for a couple of hours and read about them, but the few that catch my eye are:

1) Simplify storage provisioning by creating volumes that meet the user's requirements without the typical back-end planning for how you'll meet the volume's IO requirements; HDP lets you stripe the data across as many parity groups as you add to the pool.

2) Investment savings. Your DBA or application admin wants 4TB of storage presented to his new server because he 'wants to build it right the first time,' but 4TB is a 3-year projection of storage consumption and he'll only use about 800GB this year. Great: provision that 4TB and keep an eye on your pool usage. Buy the storage you need today and grow it as needed tomorrow, delaying the acquisition of storage as long as possible. This affords you the benefit of time, as storage typically gets a little cheaper every few months. Don't be fooled, though: as a storage admin you need to watch your pool consumption and ensure you add storage to the pool before you hit 100%. (I'd personally want budget set aside for storage purchases so I don't have to fight the purchasing or finance guys when my pool hits the 75-80% mark.)

3) Maximize storage utilization by virtualizing storage into pools vs. manually manipulating parity group utilization and hot spots. This has been a pain point since the beginning of SANs, and as drives get bigger hot spots become more prevalent. I have seen cases where customers have purchased 4x the amount of storage required to meet the IO requirements of specific applications just to minimize or eliminate hot spots. Spindle count will still play a role in HDP but the method used to lay the data down to disk should provide a noticeable reduction in number of spindles required to meet those same IO requirements.

4) If you are replicating data, HDP reduces the amount of data and the time required for the initial sync. If your HDP volumes are 50GB but only 10% utilized, replication will only move the used blocks, so you're not pushing empty space across your network. This can be a significant time savings when performing initial syncs or full re-syncs.

The next issue for HDP is reclaiming space from DP-VOLs when you have an application that writes and deletes large amounts of data: a file server, a log server, or whatever. Today there is no 'official' way to recover space from your DP-VOLs once that space has been allocated from the pool, but I have tested a process that will let you recover space after deleting files from the volume. I got the idea from the way you shrink virtual disks in VMware. Before VMware Tools was around, you could run a couple of commands to manually shrink a virtual disk. This is what I used for my Solaris and Linux guests:

  1. Run cat /dev/zero > zero.fill; sync; sleep 1; sync; rm -f zero.fill (inside the VM)
  2. Run vmware-vdiskmanager.exe -k sol10-disk-0.vmdk (on the VMware host)

Basically, in most file systems, when a file is deleted only the pointer to the file is removed; the blocks of data are not overwritten until the space is needed by new files. By creating a large file full of zeros, we overwrite all the empty blocks, allowing the vmware-vdiskmanager program to find and remove the zeroed data from the virtual disk.
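The zero-fill trick can be demonstrated in miniature on any POSIX system (capped with dd so it finishes quickly; the real procedure lets cat /dev/zero run until the filesystem fills):

```shell
# Write a filler file of zeros into a scratch directory, flush it to
# disk, then delete it. The directory is empty again, but the zeroed
# blocks remain on disk for a shrink/discard tool to find.
mkdir -p /tmp/zerofill.demo
dd if=/dev/zero of=/tmp/zerofill.demo/zero.fill bs=1024 count=1024 2>/dev/null
sync
rm -f /tmp/zerofill.demo/zero.fill
ls -A /tmp/zerofill.demo   # prints nothing: the filler is gone
```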

You can accomplish the same thing in Windows with SDELETE -c x: (where x: is the drive you want to shrink). SDELETE can be found on Microsoft's website (http://technet.microsoft.com/en-us/sysinternals/bb897443.aspx), along with usage instructions.

Now that we know how this works in VMware, how does it help us with HDP? As of firmware 60-04 for the USP-V/VM, there is an option called "discard zero data" which tells the array to scan DP-VOLs for unused space (42MB chunks of storage containing nothing but zeros). These chunks are then de-allocated from the DP-VOL and returned to the pool as free space. This was implemented to let users migrate from thick-provisioned LUNs to thin-provisioned LUNs, but we can take it a step further: write zeros over the DP-VOL's deleted data, then run "discard zero data" from Storage Navigator, accomplishing the same thing we did above in VMware.

In my case HDP had allocated 91% of my 50GB LUN, but I had deleted 15GB of data, so actual usage should have been far less than 91%. Running "discard zero data" at this point accomplishes nothing, as the deleted data has not been cleaned yet. Next I ran the sdelete command (this was a Windows 2003 host), in this case SDELETE -c e:, which wrote a big file of zeros and then deleted it. Back in Storage Navigator I checked my volume: 100% allocated! What happened? Simple: I wrote zeros to my disk, zeros are data, and HDP has to allocate storage for them. Now we perform the "discard zero data" operation again and the DP-VOL is at 68% allocated, which is what we were shooting for.

The most important thing to keep in mind if you decide to do this: writing zeros to a DP-VOL will use pool space!! I wouldn't suggest doing this to every LUN in your pool at the same time unless you have enough physical storage on the back end to support all of your DP-VOLs. While the process will help you reclaim space after deleting files, it is manual in nature. Perhaps we'll see Hitachi provide an automated mechanism in the future, but it's nice to know we can do it today with a little bit of keyboard action.


Monday, April 6, 2009

Data Domain - Appliance or Gateway?

Customers looking to purchase a Data Domain solution often ask whether or not they should buy an appliance or a gateway. We recently spoke about this with the technical folks at Data Domain and here is what we found.

From a feature and functionality standpoint the appliance and gateway models are the same; there is no difference in terms of scale or performance. Both the 690 gateway and the 690 appliance deliver up to 2.7 TB/hr of throughput and offer data protection capacities up to 1.7 PB. The lack of a performance difference is a bit surprising, since the appliance uses software-based RAID, which is of course offloaded to the external array in a gateway solution. One would expect the gateway approach to perform better, since the internal processors would be free to handle other tasks such as IO operations, but that doesn't appear to be the case.

The biggest difference then is in terms of the overall experience with the solution. In an appliance based solution Data Domain is responsible for all of the code and software updates. The appliance interacts directly with the disk and Data Domain has tweaked the drivers for optimum performance and included mechanisms to ensure reliability. In the gateway approach, the Data Domain interaction stops at the HBA. The result is that the overall user experience is now dependent on not just Data Domain but the external array. In order to ensure that the experience is positive Data Domain provides best practices for using external arrays, including a list of supported arrays, microcode levels and recommended configurations.

Once deployed, it simply means that the administrative group will have more items to consider. Compatibility will need to be verified before microcode upgrades on either the array or the Data Domain, and if the external array is shared with other hosts, care must be taken to ensure those hosts do not impact the Data Domain environment.

At the end of the day, if you already own a supported array that meets the requirements then using a gateway will likely save you some money, otherwise it probably makes sense to stick with an appliance.

Thursday, March 12, 2009

Lumenate holds first Customer Storage Forum

Lumenate held our first-ever customer storage forum on February 19th at Dave & Buster's. By all accounts it was an excellent event, and we are planning a follow-up soon. For those of you who were unable to attend, here is a little information about the customer storage forum.
The forum is an opportunity for Lumenate customers and prospects to get together to discuss data management and storage technologies. This is not a sales event: no vendors are present and we are not pushing any products or services. We simply want to hear directly from you about the challenges you face and the different ways you are handling them. We also want to give our customers the chance to interact, to find out how their peers are using different technologies and approaches to solve issues. Here are some of the questions we posed at the event and the feedback we received.

1. What is your biggest storage challenge?
a. 17% Capacity Planning
b. 50% Lowering Costs
c. 33% Managing Unstructured Data

2. What are your biggest backup challenges?
a. 67% Not Completing fast enough
b. 33% no backup challenges

3. How do you handle Disaster Recovery from a data standpoint?
a. 33% Tape Only
b. 33% Replication Only
c. 33% Combination of replication and tape

4. What impact has compliance had on your environment?
a. 50% Minor impact
b. 50% No impact

5. How much are you concerned about power, cooling, space and other environmental issues?
a. 20% Major Problem
b. 60% Moderate Concern
c. 20% No Issues

6. How many tiers of storage do you have in your environment?
a. 60% 2
b. 20% 3
c. 20% 4

7. What technologies do you believe will have the biggest impact on your storage environment in the next 12 to 18 months?
a. 67% Ethernet-based storage (iSCSI / FCoE)
b. 33% Open Storage

Wednesday, February 18, 2009

The Lumenate Storage Practice Blog – off we go!

Lumenate has been working with customers to solve data management and storage challenges since the company was founded but we did not implement a formal storage practice until a year ago. The reason we created the storage practice was so that we could be more strategic in our approach to storage solutions and thereby be of better service to our customers.

Here are some of the things we do:

Talk directly with customers about their storage and data management challenges. Everyone claims to sell solutions, but what they generally mean is they sell bundles: you put some hardware together with some software, add in services, and voila, you have a solution. That is not our approach. While we offer products and services, our engagement starts with understanding what you are trying to accomplish and ends with a solution designed specifically to meet your needs.

We research the latest industry thoughts, trends and technologies. We want to understand not just where we are today but where we are going tomorrow.

We evaluate technologies and develop our position on them. Companies looking to sign up Lumenate as a reseller are often surprised with the depth to which we go before we give a new technology the green light. Having a great market and stack of leads is fine but we want to ensure that we understand the technology at an intimate level and are comfortable positioning it in our accounts.

We take all of this research and these data points and use them to educate our technical team, the members of our sales team, and our customers.

Typically this information is shared in the form of white papers, positioning docs, or presentations, but there are a number of challenges with this approach.

1. This type of communication is one-way; we push it out, but getting feedback directly from customers and prospects is difficult.

2. Timeliness. Creating white papers and presentation materials takes time, and of course the material needs to be scrubbed and polished before it is made generally available to the public. In some cases this delays getting the information to the customers who could really use it.

3. Finally, some of the most interesting and useful things we uncover come about through conversations the technical team has internally - when we are trying to solve a problem or develop a solution. We wanted a way to capture this information and share it.

All of these challenges have led us to the idea of creating a blog. We're not sure just yet what will come of it, or just how interested people will be, but our hope is that it becomes a timely, two-way channel for sharing what we learn.