Tuesday, September 29, 2009

Problems encapsulating root on Solaris with Volume Manager 5.0 MP3

This post is as much for my own reference as for the world at large, but having spent entirely too much time working on this I just had to post it.

As part of a larger project we are installing Storage Foundation and using Veritas Volume Manager (VxVM) to encapsulate and then mirror the root disks. It's a well-known process so it was particularly frustrating to find that the systems either would not complete the encapsulation or would come up referencing rootvol for the root device, but without the configuration daemon (vxconfigd) running.

We could force the configuration daemon to start but no matter what we tried the system just would not start vxconfigd at boot. Searches on SunSolve and Symantec's support site didn't turn anything up until we started digging through the startup scripts and found that one of them was running this:

vxdctl settz

Plugging that into Google leads here, which basically says that `vxdctl settz` was added in MP3, and that if the system's TZ environment variable isn't set it dumps core and vxconfigd never starts.

Sure enough, there was no entry for TZ in /etc/TIMEZONE, and adding one fixed it.
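For reference, the check-and-fix boils down to something like this. (This sketch works against a copy of the file so it's safe to experiment with; on a real system you'd point it at /etc/TIMEZONE itself, and US/Central is just an example value -- use your actual timezone.)

```shell
# Check a TIMEZONE file for a TZ entry and append one if it's missing.
FILE=/tmp/TIMEZONE.demo                # on a real system: FILE=/etc/TIMEZONE
printf 'CMASK=022\n' > "$FILE"         # simulate a file with no TZ entry

if grep -q '^TZ=' "$FILE"; then
    echo "TZ already set: $(grep '^TZ=' "$FILE")"
else
    echo 'TZ=US/Central' >> "$FILE"    # example timezone; substitute your own
    echo "Added TZ entry"
fi
grep '^TZ=' "$FILE"
```

Once the entry is in place, running `vxdctl settz` by hand should complete without dumping core, and vxconfigd should come up at boot.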

And yes, I know that there's a much easier (and elegant) way to handle this with the combination of zfs and beadm, but maybe we'll talk about that later.


Tuesday, September 15, 2009

Because It's There

One of my favorite things about Lumenate is that we have a lab where we can test out new technology, stage for customer engagements, and perhaps most importantly indulge our inner geek impulse to try ill-advised stuff.

This story does not begin, though, with the ill-advised stuff. Instead it begins with fully supported stuff. As we bring new folks into the fold the lab is a key resource for teaching them - before they go to class they can kick the tires on zoning, mapping storage, virtualizing storage, etc.

And so the plan was to show one of the new folks how to virtualize an AMS2100 behind a USP-VM. Easy as pie, it should work like a hose, let me show you this quick virtualization demo. Except it didn't work. Instead of popping up with "HITACHI" for the Vendor, and "AMS" for the Product Name we got the dreaded "OTHER" "OTHER." That's not exactly the demo we were hoping for.

We double-checked everything and still no luck. Well, maybe it's the release of code on the AMS, let's upgrade that. Nope. Hmm, how about the code on the USP-VM. Nope. Eventually we went back and noticed that someone (who for the purposes of this blog shall remain nameless) changed the Product Id on the AMS from the default, which is "DF600F" to "AMS2100". Changing it back fixed the problem.

After basking in the success of accomplishing something that would normally take just a few minutes, we thought to ourselves, "Hmm, what else have we got that we could virtualize?" And because it's a lab, and because there are no repercussions (and because Justin loves him some ZFS) we decided to virtualize a Sun X4500, or "Thumper".

I won't cover the steps for setting up an OpenSolaris box as a storage server, since it's well documented under COMSTAR Administration on the Sun wiki. But basically we followed the documented steps and presented the LUNs up to the USP-VM. And, as you'd expect, got "OTHER" "OTHER" for our trouble.
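For the curious, the COMSTAR side amounts to something like the following. (This is a sketch, not a full procedure: the pool and volume names are made up, the GUID placeholder has to be filled in from the list-lu output, and we're waving our hands at enabling FC target mode on the HBA.)

    # On the X4500, as root (pool/volume names hypothetical)
    svcadm enable stmf                              # make sure the target framework is up
    zfs create -V 100g tank/usp_lun0                # back the LUN with a ZFS volume
    sbdadm create-lu /dev/zvol/rdsk/tank/usp_lun0   # register it with STMF
    stmfadm list-lu -v                              # note the GUID it was assigned
    stmfadm add-view <GUID>                         # export it to all hosts/ports
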

And that's where the "Bad Idea" tag comes in. You see, it's possible to load profiles for new arrays into Universal Volume Manager. We took the profile for a Sun 6780, modified it to match the COMSTAR information, and loaded it up.

After that, it virtualized without issue and we presented it up to a Solaris server to do some basic dd's against it. As far as the host knew, it was just another LUN from the USP-VM.

Of course this is just one way to do it. After a little more thought maybe you could recompile parts of OpenSolaris (like this one, for instance) and have the OpenSolaris server tell the USP-V that it's something else entirely. We'll leave that as an exercise for the reader, though.

Let me reiterate: This is a Really Bad Idea for a production array because it's not supported (or supportable).



Friday, September 4, 2009

Backing up that large Oracle database with NetBackup's Snapshot Client

A great thing happens when you marry up some large storage to a large database: you can deliver a large solution. You can also give yourself a large backup headache. Let's use Oracle as an example. One of the typical ways administrators get their large databases backed up is by creating an Oracle backup policy that sends an RMAN stream (a humongous one) over the network. The performance of this method can often be measured with a calendar. Another method involves provisioning a large locally attached disk volume and sending the RMAN job to that local disk, which later gets backed up by a standard file system policy that, you guessed it, goes over the aforementioned network.

Now, if your database resides on a storage array with some flavor of in-system replication, another way is to script your way through the problem. This involves writing a script (or scripts) that puts the database into backup mode, leverages a sync (or resync) function on your primary and mirror disk volumes, and then mounts the mirrors up on a backup server. My company has made some pretty nice scratch doing this, and it does work well. Until, that is, your storage administrator decides to change fields and go into farming (it's happened), and the database administrator is a contractor who may or may not have a firm grasp on the nuances of RMAN -- which by now may have you in a mild panic trying to figure out how to get it all working again. There are other methods of course, but by my own (unofficial) polling, these seem to be the running favorites.

Symantec's Veritas NetBackup folks have taken up the challenge of backing up very large databases (VLDBs) by leveraging some advanced storage array replication features, and the result is called Snapshot Client (formerly Advanced Client). What is it and how does it work? With the alternate client backup feature, all backup processing is offloaded to another server (or client), significantly reducing the computing overhead on the primary client. The alternate host handles the backup I/O, so the backup has little or no impact on the primary client.

A NetBackup master server is connected by a local or wide-area network to target client hosts and a media server. The primary NetBackup client contains the data to be backed up. A snapshot of that data is created on the alternate client (or media server), and the alternate client creates a backup image from the snapshot, using original path names, and streams the image to the media server. Trivia question: does Oracle have to be installed on the media server (alternate client)? The answer is no. The Snapshot Client calls RMAN using the same NetBackup wrapper script it has always used, but leverages Oracle's proxy copy RMAN option. I've included an example using the PROXY option here for you:


RUN {
  ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
  ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE';
  BACKUP
    PROXY
    SKIP INACCESSIBLE
    TAG hot_db_bk_proxy
    # recommended format
    FORMAT 'bk_%s_%p_%t'
    DATABASE;
  sql 'alter system archive log current';
  RELEASE CHANNEL ch00;
  RELEASE CHANNEL ch01;
  # back up all archive logs
  ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
  ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE';
  BACKUP
    FILESPERSET 20
    FORMAT 'al_%s_%p_%t'
    ARCHIVELOG ALL DELETE INPUT;
  RELEASE CHANNEL ch00;
  RELEASE CHANNEL ch01;
  # clean up expired backup sets in the catalog
  ALLOCATE CHANNEL FOR MAINTENANCE TYPE SBT_TAPE;
  CROSSCHECK BACKUPSET;
  DELETE NOPROMPT EXPIRED BACKUPSET;
  RELEASE CHANNEL;
}


Now, before you run off down the hall yelling "I am delivered!", I am required by professional ethics (and common sense) to mention a few caveats. This method does eliminate all the custom scripting, which invariably becomes a hindrance when you start having staff turnover or when something breaks; it replaces it with commercial off-the-shelf (COTS) software, so when it breaks, a call to support is your lifeline. The Snapshot Client does not, however, eliminate the need for staff trained in multiple disciplines (DBA, storage, backup) to make it work. Sorry, the days of "take the default, click next, click next" are not right around the corner for VLDBs.

My primary product-mix experience with the Snapshot Client is with Hitachi ShadowImage, Oracle, and NetBackup Enterprise Server, but the actual product support matrix is quite extensive, so those of you running EMC, IBM, and NetApp arrays now have some options as well.


