Monday, October 4, 2010

Optimizing Oracle (Sun) MPxIO for sequential workloads

If you’ve been working with storage for a long time then you probably know that most vendors include optimizations for sequential I/O in their microcode. Basically the array looks for some amount of sequential access, and then changes its caching behavior.

This is necessary for two reasons. First, when you’re doing lots of sequential I/O the cache mostly just gets in the way. (Buffering the read/modify/write penalty for RAID-5 is an exception, but that’s a post for another time.) Second, and perhaps more importantly, it doesn’t take long for sustained sequential I/O to overrun cache.

As a theoretical example, assume you have an array with 512 GB of cache. This theoretical array most likely mirrors cache to avoid outages, so that’s really only 256 GB. Sounds like a lot, right?

Now consider a single host with an 8 Gbit/s HBA in a 16x PCIe slot. Assume that this host is doing D2D backups, and assume that it’s writing at 600 MB/s. How long will it take to write 256 GB of data?

The answer? About 7 minutes. Granted, data will be destaged to disk while this is going on, and if the disks can keep up you won’t fill cache. If they can’t keep up, you’re going to have a problem.
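The arithmetic behind that figure can be checked with a short sketch. It assumes binary units (1 GB = 1024 MB) and ignores destaging entirely. The function name is made up for illustration.

```python
def seconds_to_fill_cache(cache_gb, write_mb_per_s):
    """Seconds needed to fill cache_gb of cache at a sustained write rate.

    Assumes 1 GB = 1024 MB and that nothing is destaged to disk meanwhile.
    """
    return cache_gb * 1024 / write_mb_per_s


usable_cache_gb = 512 / 2  # mirrored cache halves the usable capacity
seconds = seconds_to_fill_cache(usable_cache_gb, 600)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} minutes")
```

With the numbers from the example this comes out to a little over seven minutes.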

So far, so good, right? Overly caching sequential I/O is bad, so the array recognizes sequential I/O and changes its behavior. Except – what happens when you add multipathing to the picture?

To steal a phrase from Denis Leary, “Bad things, man, bad things.” Or, to be more accurate: it depends, but by default usually bad things. Because the (formerly sequential) I/O gets spread across the different ports on the array, it no longer looks sequential on any one port, and the array doesn’t change its caching behavior.
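A toy model makes the effect easy to see. The sketch below is not how any particular array detects sequential access (that logic is vendor-specific). It simply assumes a detector that looks for back-to-back block addresses on a single port, and a plain round-robin policy over two paths. The 128 KB I/O size and the function names are assumptions for illustration.

```python
def round_robin_split(lbas, n_paths):
    """Hand out I/Os to paths in turn, as a simple round-robin policy would."""
    per_path = [[] for _ in range(n_paths)]
    for i, lba in enumerate(lbas):
        per_path[i % n_paths].append(lba)
    return per_path


def is_sequential(lbas, io_blocks):
    """True if every I/O starts exactly where the previous one ended."""
    return all(b - a == io_blocks for a, b in zip(lbas, lbas[1:]))


io_blocks = 256  # assumed I/O size: 128 KB expressed in 512-byte blocks
stream = [i * io_blocks for i in range(8)]  # one sequential stream from the host
paths = round_robin_split(stream, 2)

print(is_sequential(stream, io_blocks))              # the host's view
print([is_sequential(p, io_blocks) for p in paths])  # each port's view
```

The host issues a perfectly sequential stream, but each port only sees every other I/O, so a per-port detector never finds two adjacent requests.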

There are multiple ways to resolve this issue. For some multipathing solutions you can choose a different load-balancing algorithm; for others, you can set flags. If you are using the vendor-supplied solution with their array (e.g., EMC’s PowerPath or HDS’ HDLM), then the right thing should happen by default. With Symantec (VERITAS) Storage Foundation (Volume Manager) you can install the correct ASL (Array Support Library), use the partitionsize parameter in concert with the vxdmpadm command, or, if you want to be really old school, set dmp_pathswitch_blks_shift in /kernel/drv/vxdmp.conf.

Recently we were called in to look at a performance issue for a customer using Solaris’ built-in multipathing, MPxIO. During the analysis, we wondered if there was a way to optimize MPxIO for sequential workloads. It turns out that there is, although it’s not well documented.

In version 4.4 of the SAN kit, Sun added a new load balancing option, called logical-block. This option takes a single parameter, region-size. The region-size parameter is used as a power of 2 to logically divide the multipathed device into regions. Any I/O to a given region will always go down the same path.

As an example, the default region-size is 18. 2 raised to the 18th power is 262144. This number is in 512-byte blocks, so divide by 2 to get kbytes and then by 1024 to get megabytes and you have a region size of 128 MB. If we have two paths to this device, then any access to the first 128 MB will go down the first path, any access to the second 128 MB will go down the second path, any access to the third 128 MB will go down the first path again, and so on.
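The region arithmetic can be sketched in a few lines. The size calculation follows directly from the description above. The path-selection function is an assumption: it models the behavior as "region index modulo number of paths", which matches the description but is not taken from the driver source. Both function names are made up for illustration.

```python
BLOCK_SIZE = 512  # bytes per block


def region_size_mb(region_size):
    """Region size in megabytes for a given region-size exponent."""
    return (2 ** region_size) * BLOCK_SIZE / (1024 * 1024)


def path_for_block(lba, region_size, n_paths):
    """Index of the path that an I/O to block `lba` would use.

    Assumed model: regions are assigned to paths in rotation.
    """
    region_index = lba >> region_size  # which region the block falls in
    return region_index % n_paths


print(region_size_mb(18))  # the default region-size
print(region_size_mb(16))

# First block of the first three regions, with two paths
for region in range(3):
    lba = region << 18
    print(region, path_for_block(lba, 18, 2))
```

Under this model every I/O inside one region lands on one path, so a sequential stream stays on a single port for a full region before it moves to the next path.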

While the effect of this parameter is entirely workload dependent, in the pathological case Sun says that a performance improvement of 700% is possible.

So, how do you set this parameter? In /kernel/drv/scsi_vhci.conf, of course. Here’s a sample for an HDS array using OPEN-V devices with the host mode option set to Sun (09), as well as an AMS2000:

# Copyright 2004 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#pragma ident "@(#)scsi_vhci.conf 1.9 04/08/26 SMI"
name="scsi_vhci" class="root";
# Load balancing global configuration: setting load-balance="none" will cause
# all I/O to a given device (which supports multipath I/O) to occur via one
# path. Setting load-balance="round-robin" will cause each path to the device
# to be used in turn.
# Automatic failback configuration
# possible values are auto-failback="enable" or auto-failback="disable"
# For enabling MPxIO support for 3rd party symmetric device need an
# entry similar to following in this file. Just replace the "SUN SENA"
# part with the Vendor ID/Product ID for the device, exactly as reported by
# Inquiry cmd.
# device-type-scsi-options-list =
# "SUN SENA", "symmetric-option";
# symmetric-option = 0x1000000;
device-type-mpxio-options-list =
"device-type=HITACHI OPEN-V -SUN","load-balance-options=logical-block-options",
"device-type=HITACHI DF600F", "load-balance-options=logical-block-options";
logical-block-options="load-balance=logical-block", "region-size=16";

In this example both arrays are configured so that all I/O within a given 32 MB region (2 to the 16th power blocks of 512 bytes each) will always use the same path. You can confirm that the logical-block load balancing algorithm is in use by inspecting the output from dmesg for a line like the following:

Sep  9 08:50:50 t6320-1 genunix: [ID 834635] /scsi_vhci/ssd@g60060e80102ae5800511a1e0000000c8 (ssd23) multipath status: optimal, path /pci@0/pci@0/pci@8/SUNW,qlc@0/fp@0,0 (fp2) to target address: w50060e80102ae580,0 is online Load balancing: logical-block, region-size: 16
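If you have many LUNs to check, a small script can pull the policy out of each line. This is a sketch only: the regular expression is based on the single sample line above, so the exact message format on other releases is an assumption. The function name is made up for illustration.

```python
import re

# Pattern derived from the one sample message shown above (assumed format).
LOAD_BALANCE_RE = re.compile(
    r"Load balancing: (?P<policy>[\w-]+)(?:, region-size: (?P<region>\d+))?"
)


def parse_load_balancing(line):
    """Return (policy, region_size or None), or None if the line has no match."""
    match = LOAD_BALANCE_RE.search(line)
    if match is None:
        return None
    region = match.group("region")
    return (match.group("policy"), int(region) if region else None)


sample = (
    "Sep  9 08:50:50 t6320-1 genunix: [ID 834635] "
    "/scsi_vhci/ssd@g60060e80102ae5800511a1e0000000c8 (ssd23) multipath status: "
    "optimal, path /pci@0/pci@0/pci@8/SUNW,qlc@0/fp@0,0 (fp2) to target address: "
    "w50060e80102ae580,0 is online Load balancing: logical-block, region-size: 16"
)
print(parse_load_balancing(sample))
```

Feeding it each line of dmesg output and filtering on the policy name gives a quick list of devices that did not pick up the new setting.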

More information on this parameter is available on SunSolve; just search for logical-block.

