Monday, August 29, 2011

Working with NetBackup CTIME Timestamps in Excel

NetBackup makes extensive use of CTIME timestamps for backup job start and stop times, expiration dates, and so on.  It even appends a timestamp to the client name to create a backup ID (myserver_1303936891).  I don't know about you, but I don't natively read CTIME timestamps.
To convert a timestamp to "human readable" time, Symantec provides the bpdbm command with the -ctime flag, which translates the NetBackup timestamp into the current time zone.  For example:
bash-3.00# bpdbm -ctime 1303936891
1303936891 = Wed Apr 27 15:41:31 2011
Some of my favorite NetBackup reports output job start/stop times and other data using CTIME timestamps:
  • bpmedialist shows the expiration date for tapes
  • bpimagelist shows the start and stop times for backup images in the catalog, as well as image expiration dates
It's just not feasible to run bpdbm against every timestamp in output that may contain 10,000 lines.  Excel is a far better tool for the job.
To convert from CTIME to “Excel” time you need the following information:
  1. The NetBackup timestamp (the number of seconds since 1/1/1970, in GMT)
  2. The number of seconds per day (86,400)
  3. The Excel serial number for 1/1/1970 (25,569)
  4. Your current timezone offset in hours (for US/Central this is currently -5)
Excel date/time stamps are real numbers where the integer portion represents the number of days since 1/1/1900 (with 1/1/1900 counted as day 1).  The decimal portion represents the fraction of a single day (i.e., 0.5 is 12 hours).  Therefore, to convert from CTIME to Excel time, use the following formula:
timestamp/86400+25569+(-5/24)
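For example, assuming the timestamp 1303936891 from the bpdbm output above is sitting in cell A2 (the cell reference is just illustrative), the formula would be:

=A2/86400+25569+(-5/24)

Format the cell as a date/time and it should display 4/27/2011 3:41:31 PM, matching the bpdbm output above.  Keep in mind the -5 offset assumes US/Central daylight saving time; during standard time the offset is -6.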


Monday, August 22, 2011

How to Mount Cloned Volume Groups in AIX

Today, most SAN storage vendors provide some kind of volume or LUN cloning capability. The name and underlying mechanics for each vendor differ, but the end result is pretty much the same. They take a primary volume or LUN and create an exact copy at some point in time. NetApp's name for this technology is FlexClone.

Typically, creating a clone of a LUN and mounting the file system on the original server is a trivial process. The process becomes more complex if volume management is involved. Server-based volume management software provides many benefits, but it complicates matters where LUN clones are used. In the case of IBM's Logical Volume Manager (LVM), mounting clones on the same server results in duplicate volume group information. Luckily, AIX allows LVM to see duplicate physical volume IDs (PVIDs) for a "short period" of time without crashing the system. I'm not sure exactly what a "short period" of time equates to, but in my testing I didn't experience a crash.

The process to "import" a cloned volume group for the first time is disruptive in that the original volume group must be exported. The original volume group has to be exported so that the physical volume IDs (PVIDs) on the cloned LUNs can be regenerated. The recreatevg command is used to generate new PVIDs and to rename the logical volumes in the cloned volume group. Note that the /etc/filesystems entries need to be updated manually because recreatevg prepends /fs to the original mount point names for the clones. Once /etc/filesystems is updated, the original volume group can be re-imported with importvg.

Subsequent refreshes of previously imported clones can be accomplished without exporting the original volume group because the ODM remembers the previous PVID-to-hdisk association and does not reread the actual PVID from disk until an operation is performed against the volume group. The recreatevg command changes the PVIDs and logical volume names on the cloned volume group without affecting the source volume group.

Process for initial import of cloned volume group:


  1. Clone the LUNs comprising the volume group
    1. Make sure to clone in a consistent state
  2. Unmount and export the original volume groups
    1. Use df to associate file systems to logical volumes
    2. Unmount the file systems
    3. Use lsvg to list the volume groups
    4. Use lspv to view the PVIDs for each disk associated with the volume groups
    5. Note the volume group names and which disks belong to each VG that will be exported
    6. Use varyoffvg to offline each VG
    7. Use exportvg to export the VGs
  3. Bring in the cloned VGs
    1. Execute cfgmgr to discover the new disks
    2. Use lspv to identify the duplicate PVIDs
    3. Execute recreatevg on each cloned VG, listing all disks associated with the volume group and using the -y option to name the new VG
    4. Use lspv to verify there are no duplicate PVIDs
  4. Import the original volume groups
    1. Execute importvg with the name of one member hdisk and the -y option with the original VG name
    2. Mount the original file systems
  5. Mount the cloned file systems
    1. Make mount point directories for the cloned file systems
    2. Edit /etc/filesystems to update the mount points for the cloned VG file systems
    3. Use the mount command to mount the cloned file systems
The subsequent import of a cloned volume group differs in that only the cloned volume group needs to be unmounted and varied offline prior to the clone refresh. Note the hdisk numbers involved in each cloned volume group that is to be refreshed. Once the clone is refreshed, use exportvg to export the cloned volume group. Afterward, issue the recreatevg command, naming each hdisk associated with the volume group along with its previous VG name. The volumes and file systems are then available again. Prior to mounting, the /etc/filesystems entries need to be updated to correct the mount points.

Process to refresh cloned volume group:

  1. Unmount and vary off the cloned volume groups to be refreshed
    1. Execute umount on associated file systems
    2. Use varyoffvg to offline each target VG
  2. Refresh the clones on the storage system
  3. Bring in the refreshed clone VGs
    1. Execute cfgmgr
    2. Use lspv and notice that the ODM still shows the previous hdisk/PVID and volume group associations
    3. Use exportvg to export the cloned VGs, noting the hdisk numbers for each VG
    4. Execute recreatevg on each refreshed VG, naming all disks associated with the volume group and using the -y option to set the VG back to its original clone name
    5. Use lspv to confirm that each hdisk now has a new, unique PVID
  4. Mount the refreshed clone file systems
    1. Edit /etc/filesystems to correct the mount points for each volume
    2. Issue the mount command to mount the refreshed clones
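Putting those refresh steps together, a condensed session might look something like this; the hdisk numbers, VG name, and mount point are hypothetical and simply reuse the names from the first-time import example further below:

bash-3.00# umount /dataclone1test2
bash-3.00# varyoffvg dataclvg2

(refresh the clone on the storage array)

bash-3.00# cfgmgr
bash-3.00# lspv
bash-3.00# exportvg dataclvg2
bash-3.00# recreatevg -y dataclvg2 hdisk13 hdisk14 hdisk15 hdisk16
bash-3.00# lspv
bash-3.00# mount /dataclone1test2

The first lspv should show the stale PVID and volume group associations the ODM remembers; after recreatevg the PVIDs are unique again. As with the first-time import, check /etc/filesystems and fix the mount points before the final mount.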
See the example below of a first-time import of two cloned volume groups, logvg2 and datavg2, consisting of two and four disks respectively:



bash-3.00# df

Filesystem 512-blocks Free %Used Iused %Iused Mounted on

/dev/hd4 1048576 594456 44% 13034 17% /

/dev/hd2 20971520 5376744 75% 49070 8% /usr

/dev/hd9var 2097152 689152 68% 11373 13% /var

/dev/hd3 2097152 1919664 9% 455 1% /tmp

/dev/hd1 1048576 42032 96% 631 12% /home

/dev/hd11admin 524288 523488 1% 5 1% /admin

/proc - - - - - /proc

/dev/hd10opt 4194304 3453936 18% 9152 3% /opt

/dev/livedump 524288 523552 1% 4 1% /var/adm/ras/livedump

/dev/pocdbbacklv 626524160 578596720 8% 8 1% /proddbback

/dev/fspoclv 1254359040 1033501496 18% 2064 1% /cl3data

/dev/fspocdbloglv 206438400 193491536 7% 110 1% /cl3logs

/dev/poclv 1254359040 1033501480 18% 2064 1% /proddb

/dev/pocdbloglv 206438400 193158824 7% 115 1% /proddblog

/dev/datalv2 836239360 615477152 27% 2064 1% /datatest2

/dev/loglv2 208404480 195088848 7% 118 1% /logtest2

bash-3.00#

bash-3.00$ umount /datatest2/

bash-3.00# umount /logtest2/

bash-3.00# lsvg

rootvg

pocdbbackvg

dataclvg

logsclvg

pocvg

pocdblogvg

datavg2

logvg2

bash-3.00# varyoffvg datavg2



NOTE: Remember the hdisk names and VG names for the VGs being exported.



bash-3.00# lspv

hdisk0 00f62aa942cec382 rootvg active

hdisk1 none None

hdisk2 00f62aa997091888 pocvg active

hdisk3 00f62aa9a608de30 dataclvg active

hdisk4 00f62aa9a60970fc logsclvg active

hdisk10 00f62aa9972063c0 pocdblogvg active

hdisk11 00f62aa997435bfa pocdbbackvg active

hdisk5 00f62aa9a6798a0c datavg2

hdisk6 00f62aa9a6798acf datavg2

hdisk7 00f62aa9a6798b86 datavg2

hdisk8 00f62aa9a6798c36 datavg2

hdisk9 00f62aa9a67d6c9c logvg2 active

hdisk12 00f62aa9a67d6d51 logvg2 active

bash-3.00# varyoffvg logvg2

bash-3.00# lsvg

rootvg

pocdbbackvg

dataclvg

logsclvg

pocvg

pocdblogvg

datavg2

logvg2

bash-3.00# exportvg datavg2

bash-3.00# exportvg logvg2

bash-3.00#

bash-3.00# cfgmgr

bash-3.00# lspv

hdisk0 00f62aa942cec382 rootvg active

hdisk1 none None

hdisk2 00f62aa997091888 pocvg active

hdisk3 00f62aa9a608de30 dataclvg active

hdisk4 00f62aa9a60970fc logsclvg active

hdisk10 00f62aa9972063c0 pocdblogvg active

hdisk11 00f62aa997435bfa pocdbbackvg active

hdisk5 00f62aa9a6798a0c None

hdisk6 00f62aa9a6798acf None

hdisk7 00f62aa9a6798b86 None

hdisk8 00f62aa9a6798c36 None

hdisk13 00f62aa9a6798a0c None

hdisk14 00f62aa9a6798acf None

hdisk15 00f62aa9a6798b86 None

hdisk9 00f62aa9a67d6c9c None

hdisk12 00f62aa9a67d6d51 None

hdisk16 00f62aa9a6798c36 None

hdisk17 00f62aa9a67d6c9c None

hdisk18 00f62aa9a67d6d51 None

bash-3.00#



Notice the duplicate PVIDs. Use the recreatevg command, naming all of the new disks that belong to each newly mapped clone volume group.



bash-3.00# recreatevg -y dataclvg2 hdisk13 hdisk14 hdisk15 hdisk16

dataclvg2

bash-3.00# recreatevg -y logclvg2 hdisk17 hdisk18

logclvg2

bash-3.00# importvg -y datavg2 hdisk5

datavg2

bash-3.00# importvg -y logvg2 hdisk9

logvg2

bash-3.00# lspv

hdisk0 00f62aa942cec382 rootvg active

hdisk1 none None

hdisk2 00f62aa997091888 pocvg active

hdisk3 00f62aa9a608de30 dataclvg active

hdisk4 00f62aa9a60970fc logsclvg active

hdisk10 00f62aa9972063c0 pocdblogvg active

hdisk11 00f62aa997435bfa pocdbbackvg active

hdisk5 00f62aa9a6798a0c datavg2 active

hdisk6 00f62aa9a6798acf datavg2 active

hdisk7 00f62aa9a6798b86 datavg2 active

hdisk8 00f62aa9a6798c36 datavg2 active

hdisk13 00f62aa9c63a5ec2 dataclvg2 active

hdisk14 00f62aa9c63a5f9b dataclvg2 active

hdisk15 00f62aa9c63a6070 dataclvg2 active

hdisk9 00f62aa9a67d6c9c logvg2 active

hdisk12 00f62aa9a67d6d51 logvg2 active

hdisk16 00f62aa9c63a6150 dataclvg2 active

hdisk17 00f62aa9c63bf6b2 logclvg2 active

hdisk18 00f62aa9c63bf784 logclvg2 active

bash-3.00#



Notice the PVID numbers are all unique now.

Remount the original file systems:



bash-3.00# mount /datatest2

bash-3.00# mount /logtest2

bash-3.00#



Create the new mount points and examine /etc/filesystems:



bash-3.00# mkdir /dataclone1test2

bash-3.00# mkdir /logclone1test2

bash-3.00# cat /etc/filesystems





/fs/datatest2:
        dev             = /dev/fsdatalv2
        vfs             = jfs2
        log             = /dev/fsloglv03
        mount           = true
        check           = false
        options         = rw
        account         = false

/fs/logtest2:
        dev             = /dev/fsloglv2
        vfs             = jfs2
        log             = /dev/fsloglv04
        mount           = true
        check           = false
        options         = rw
        account         = false

/datatest2:
        dev             = /dev/datalv2
        vfs             = jfs2
        log             = /dev/loglv03
        mount           = true
        check           = false
        options         = rw
        account         = false

/logtest2:
        dev             = /dev/loglv2
        vfs             = jfs2
        log             = /dev/loglv04
        mount           = true
        check           = false
        options         = rw
        account         = false

bash-3.00#



Notice that the recreatevg command prefixed the cloned file systems' mount points with /fs. The logical volume names were also changed to prevent duplicate entries in /dev. Update /etc/filesystems so the cloned stanzas use the mount points created previously.
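For example, after the edit the stanza for the cloned data file system might look like the following; only the stanza header (the mount point) changes, while the logical volume names recreatevg assigned are left alone:

/dataclone1test2:
        dev             = /dev/fsdatalv2
        vfs             = jfs2
        log             = /dev/fsloglv03
        mount           = true
        check           = false
        options         = rw
        account         = false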



bash-3.00# mount /dataclone1test2

Replaying log for /dev/fsdatalv2.

bash-3.00# mount /logclone1test2

Replaying log for /dev/fsloglv2.

bash-3.00# df

Filesystem 512-blocks Free %Used Iused %Iused Mounted on

/dev/hd4 1048576 594248 44% 13064 17% /

/dev/hd2 20971520 5376744 75% 49070 8% /usr

/dev/hd9var 2097152 688232 68% 11373 13% /var

/dev/hd3 2097152 1919664 9% 455 1% /tmp

/dev/hd1 1048576 42032 96% 631 12% /home

/dev/hd11admin 524288 523488 1% 5 1% /admin

/proc - - - - - /proc

/dev/hd10opt 4194304 3453936 18% 9152 3% /opt

/dev/livedump 524288 523552 1% 4 1% /var/adm/ras/livedump

/dev/pocdbbacklv 626524160 578596720 8% 8 1% /proddbback

/dev/fspoclv 1254359040 1033501496 18% 2064 1% /cl3data

/dev/fspocdbloglv 206438400 193491536 7% 110 1% /cl3logs

/dev/poclv 1254359040 1033501480 18% 2064 1% /proddb

/dev/pocdbloglv 206438400 193158824 7% 115 1% /proddblog

/dev/datalv2 836239360 615477152 27% 2064 1% /datatest2

/dev/loglv2 208404480 195088848 7% 118 1% /logtest2

/dev/fsdatalv2 836239360 615477160 27% 2064 1% /dataclone1test2

/dev/fsloglv2 208404480 195744288 7% 114 1% /logclone1test2

bash-3.00#


Monday, August 15, 2011

Virtual Machine Migration Fails

After recently upgrading a customer to vSphere 4.1 Update 1, I received a call because one of their guest VMs could not be migrated to a different VMFS datastore.  The somewhat troubling "unable to access file" error message indicated a problem with the VMDK for a specific snapshot.


[Screenshot: vSphere migration error, "unable to access file"]


Through an SSH session I validated the snapshot VMDK and VMDK flat file existed in the proper directory.  The VMDK flat file had a non-zero size, suggesting to me this might be a configuration problem.


Out of curiosity I examined the snapshot's VMDK configuration file and noticed the parentFileNameHint entry contained the full pathname and used the VMFS GUID value.  Hmm.  Was that the proper GUID?  No, it wasn't.


Since the VM had other snapshots, I reviewed those configuration files as well and noticed they used relative path names for parentFileNameHint.  Could it be that simple?
I edited the snapshot VMDK configuration file and removed the full path qualification.  


Problem solved.


In my example above (which was reproduced in my lab), I changed:
parentFileNameHint="/vmfs/volumes/4ab3ebbc-46f3c941-7c14-00144fe69d58/retro-000001.vmdk"
To:
parentFileNameHint="retro-000001.vmdk"
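For context, parentFileNameHint lives in the snapshot's small text descriptor file. A trimmed-down sketch of what such a descriptor might contain is shown below; the CID values, extent size, and the retro-000002 file names are invented for illustration, and the exact fields vary by ESX/ESXi version:

# Disk DescriptorFile
version=1
CID=fb183c20
parentCID=27d4e7de
createType="vmfsSparse"
parentFileNameHint="retro-000001.vmdk"

# Extent description
RW 16777216 VMFSSPARSE "retro-000002-delta.vmdk"

Only the parentFileNameHint line needed to change in this case; the rest of the descriptor was left untouched.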

Monday, August 8, 2011

FCIP considerations for 10 GigE on Brocade FX8-24

While working with a client to architect a new FCIP solution, there were a number of considerations that needed to be addressed. With this particular implementation we are leveraging FX8-24 blades in DCX chassis and attaching the 10 GbE xge links to the network core.

With the current version of FOS (6.4.1a in this case), the 10 GbE interface is maximized by using multiple "circuits" that are combined into a single FCIP tunnel. Each circuit has a maximum bandwidth of 1 Gbps. To aggregate multiple circuits you need the Advanced Extension license. Each circuit needs an IP address on either end of the tunnel. Additionally, there are two 10 GbE xge ports on each FX8-24 blade, and they require placement in separate VLANs. Be sure to discuss and plan these requirements with your network team.

There are other considerations as well, such as using Virtual Fabrics to isolate the FCIP traffic while allowing those fabrics to merge between switches, or using the Integrated Routing feature (additional licensing) to configure the FCIP tunnels with VEX ports so the fabrics never have to merge.

Regardless of the architecture (Virtual Fabrics vs. Integrated Routing), you will need to configure the 1 Gbps circuits appropriately. Understand the maximum bandwidth your WAN link can sustain, and configure the FCIP tunnel so that it consumes just under that maximum to keep TCP congestion and sliding-window ramp-up from slowing down your overall throughput.

In our example, we want to consume about 6 Gbps of bandwidth between the two locations, so we will configure six circuits within the FCIP tunnel, each set just under the 1 Gbps maximum bandwidth.
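As a very rough sketch, the tunnel and its circuits on an FX8-24 are built with the portcfg fciptunnel and portcfg fcipcircuit commands after IP interfaces have been defined on the xge ports with portcfg ipif. The VE_Port number (8/12), IP addresses, and rates below are invented for illustration, and the exact argument order and options vary by FOS release, so treat this as pseudo-syntax and verify against the Fabric OS Command Reference for your 6.4.x code level:

portcfg fciptunnel 8/12 create 192.168.20.1 192.168.10.1 950000
portcfg fcipcircuit 8/12 create 1 192.168.20.2 192.168.10.2 950000
portcfg fcipcircuit 8/12 create 2 192.168.20.3 192.168.10.3 950000

and so on through circuit 5. With each circuit committed at roughly 950 Mbps, the six-circuit tunnel aggregates to about 5.7 Gbps, just under the 6 Gbps target described above.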


Monday, August 1, 2011

Storage Performance Concepts Entry 6

The Impact of Cache
Although the cache within an array can be used for a number of purposes, such as storing configuration information or tables for snapshots and replication, its primary purpose is to serve as a high-speed buffer between the hosts and the back-end disk. The caching algorithms used by the manufacturers are their "secret sauce," and they aren't always forthcoming about exactly how they work, but at a basic level they all provide the same types of functions.
In our previous entries on storage performance we highlighted the performance differences between RAID 10 and RAID 5. RAID 10 provides better performance than RAID 5, particularly for random IO, because the RAID write penalty is lower and, for the same amount of usable capacity, it will have more underlying RAID groups and total disk drives.
Despite the performance benefits, we deploy RAID 5 much more frequently than RAID 10. The first reason is obvious: RAID 5 is much less expensive. The second is that cache improves performance to the point that RAID 5 is acceptable.
All host write requests are satisfied by cache. Once the IO is stored in cache, the array sends an acknowledgement back to the initiating host to let it know the write was received successfully and that it can send its next IO request. In the background, the array writes, or flushes, the pending write IOs to disk.
How and when these write IOs are flushed depends on the array's cache management algorithms as well as the current cache workload. Most arrays, including the HDS AMS 2000, will attempt to hold off flushing to disk to see if they can perform a full stripe write. This is particularly important in RAID 5 configurations because it eliminates the read and modify operations associated with a partial-stripe write. Details of read-modify-write can be found at the following link.
Although the array is not always able to do full stripe writes, on the HDS arrays at least we are often able to use RAID 5 in scenarios where you would typically be looking at RAID 10. This is not to say that RAID 5 is faster than RAID 10, just that with effective cache algorithms RAID 5 may perform better than expected.
It's interesting that if you google array cache performance you will find a number of articles and blog entries that talk about how cache can negatively impact performance. This is a bit misleading. Those entries are focused on very specific use cases and typically involve a single application with a dedicated storage device rather than the shared storage environments most of us run. Let's break down how cache is used and the impact associated with various workloads.
How Cache Works
Exactly how cache operates depends on the storage array, and the particular step-by-step processes involved are outside the scope of this entry. What we are interested in is not so much the cache operations of a specific array but the general impact of cache on real-world workloads. I cannot speak for every array, but in the case of HDS enterprise systems the Theory of Operation guide goes into excruciating detail on cache. If you are looking for that level of detail, get a copy of that guide or similar documentation from your array manufacturer.
The performance benefits of cache are based on cache hits. A cache hit occurs when the IO request can be satisfied from cache rather than from disk. The array manufacturers develop algorithms to make the best use of the available cache and to increase the number of cache hits. This is done in a number of ways.
Least Recently Used – Most arrays employ a least recently used (LRU) algorithm to manage the contents of cache. In short, the most recently used data is kept in cache and the least recently used data is evicted, freeing up capacity for additional IO.
Locality of Reference – When a read request is issued, the data is read from disk and placed into cache. In addition to the specific data requested, data located nearby is also loaded into cache. This is based on the principle that data exists in clusters and that there is a high likelihood nearby data will also be accessed.
Prefetching – The array will attempt to recognize sequential read operations and prefetch data into cache.
Applying This to Workloads
As we discussed in our previous entry, cache can be thought of as very wide but not very deep: it services a lot of IOs but doesn't store a lot of data. This comes into play when you try to determine how to address a performance issue with a storage array. Here is an illustration that should help.
You have an array with 16GB of cache on which you are frequently seeing performance issues. You look at the performance monitor and determine that the cache has entered write-pending mode, meaning that 70% of the user cache is holding pending write data and the array has begun throttling back the attached hosts.
Should you add more cache? Let’s take a look.
  • During the performance degradation the array is receiving 400MBps of write requests
  • The array has 16GB of cache and we are considering upgrading it to 32GB
    • 16GB of installed cache results in:
      • 8GB mirrored
      • 70% limit = 5.734GB usable
    • 32GB of installed cache results in:
      • 16GB mirrored
      • 70% limit = 11.2GB usable
    • Writing data at 400MBps fills up the 16GB configuration in ~15 seconds
    • Writing data at 400MBps fills up the 32GB configuration in ~29 seconds
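As a quick sanity check of those fill times, a rough calculation with bc from any shell (this treats 1GB as 1,024MB and assumes cache writes are mirrored, halving the installed capacity):

bash-3.00$ echo "scale=1; (8 * 1024 * 0.7) / 400" | bc
14.3
bash-3.00$ echo "scale=1; (16 * 1024 * 0.7) / 400" | bc
28.6

which is in line with the ~15 and ~29 seconds listed above.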
Assuming you pay around $1,000 per GB of cache, it will cost you $16,000 for the array to slow to a crawl in 29 seconds rather than 15. That is probably not the outcome you or management are hoping for. What may be needed is a combination of approaches.
  1. You may still need more cache. If the bursts are not sustained, the extra cache may be enough to absorb them.
  2. You may need to partition the cache so the offending application doesn't consume all available cache and impact the other hosts sharing the array. You dedicate a portion of cache to the LUNs receiving the heavy IO, leaving the rest of the cache available for other applications. This will not speed up the application performing all of the writes, but it will at least keep the other applications from suffering.
  3. You may need to speed up the disk, the bottom of the funnel in our previous diagram. This can be done by using something like Dynamic Provisioning to distribute the IO across more disks. If Dynamic Provisioning is already in use, you may need to add more disks to the pool; each new disk provides additional IOPS.
You may need a combination of all three approaches: more cache, Dynamic Provisioning, and more disks. The answer depends on the workload. Cache is an advantage primarily when the IO profile consists of short bursts of random IO, which is most common in OLTP environments. Cache does not help, and may even hinder performance, with long sequential IO.
