Monday, August 1, 2011

Storage Performance Concepts Entry 6

The Impact of Cache
Although the cache within an array can be used for a number of purposes, such as storing configuration information or tables for snapshots and replication, its primary purpose is to serve as a high-speed buffer between the hosts and the back-end disk. The caching algorithms used by the manufacturers are their “secret sauce,” and they aren’t always forthcoming about exactly how they work, but at a basic level they all provide the same types of functions.
In our previous entries on storage performance we highlighted the performance differences of RAID 10 versus RAID 5. RAID 10 provides better performance than RAID 5, particularly for random IO, because the RAID penalty is lower and, given the same amount of usable capacity, it will have more underlying RAID Groups and total disk drives.
Despite the performance benefits of RAID 10, we deploy RAID 5 much more frequently. The first reason is obvious: RAID 5 is much less expensive. The second reason is that cache greatly improves performance, to the point that RAID 5 is acceptable.
All host write requests are satisfied by cache. Once the IO is stored in cache the array sends an acknowledgement back to the host that initiated the IO to let it know that it was received successfully and that it can now send its next IO request. In the background the array will write or flush the pending write IOs to disk.
How and when these write IOs are flushed depends on the cache management algorithms of the array as well as the current cache workload. Most arrays, including the HDS AMS 2000, will attempt to hold off flushing to disk to see if they can perform a full stripe write. This is particularly important in RAID 5 configurations because a full stripe write eliminates the read and modify operations normally associated with a write IO. Details of Read-Modify-Write can be found at the following link.
Although the array is not always able to do full stripe writes, at least on the HDS arrays we are often able to use RAID 5 in scenarios where you would typically be looking at RAID 10. This is not to say that RAID 5 is faster than RAID 10, just that with effective cache algorithms RAID 5 may perform better than expected.
It’s interesting that if you google array cache performance you will find a number of articles and blog entries that talk about how cache can negatively impact performance. This is a bit misleading. These entries focus on very specific use cases and typically involve a single application with a dedicated storage device rather than the shared storage found in most environments. Let’s break down how cache is used and the impact associated with various workloads.
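To make the benefit of a full stripe write concrete, here is a minimal sketch that counts back-end disk IOs. It assumes a hypothetical 4+1 RAID 5 group and the worst case in which every chunk is flushed on its own; the function names are made up for illustration and do not describe any particular array's algorithm.

```python
def read_modify_write_ios(chunks: int) -> int:
    """Back-end IOs when each chunk is flushed on its own.

    Each small write reads the old data and old parity, then writes the
    new data and new parity: 2 reads + 2 writes = 4 IOs per chunk.
    """
    return 4 * chunks


def full_stripe_write_ios(stripes: int, data_disks: int) -> int:
    """Back-end IOs when whole stripes are written at once.

    Parity is calculated from the new data already held in cache, so
    nothing is read back: one write per data disk plus one parity write.
    """
    return stripes * (data_disks + 1)


DATA_DISKS = 4  # assumption: a 4+1 RAID 5 group
CHUNKS = 8      # assumption: two stripes' worth of new data waiting in cache

rmw = read_modify_write_ios(CHUNKS)
fsw = full_stripe_write_ios(CHUNKS // DATA_DISKS, DATA_DISKS)
print(f"read-modify-write: {rmw} back-end IOs")
print(f"full stripe write: {fsw} back-end IOs")
```

Under these assumptions the same eight chunks cost 32 back-end IOs as individual writes and 10 as two full stripe writes, which is why holding data in cache a little longer can pay off.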
How Cache Works
Exactly how cache operates is dependent on the storage array and the particular step by step processes involved are outside the scope of this entry. In any case, what we are interested in is not so much the cache operations of a specific array but rather the general impact of cache in real world workloads. I cannot speak for every array but in the case of HDS enterprise systems their Theory of Operation Guide goes into excruciating detail on cache. If you are looking for this type of detail I suggest you get a copy of this guide or similar information from your array manufacturer.
The performance benefits of cache are based on cache hits. A cache hit occurs when the IO request can be satisfied from cache rather than from disk. The array manufacturers develop algorithms to make the best use of the available cache and to increase the number of cache hits. This is done in a number of ways.
Least Recently Used – Most arrays employ a least recently used algorithm to manage the content of cache. In short the most recent data is left in cache and the least recently used data is removed, freeing up capacity for additional IO.
Locality of Reference – When a read request is issued the data is read from disk and placed into cache. In addition to the specific data requested, nearby data is also loaded into cache. This is based on the principle that data exists in clusters and that there is a high likelihood that nearby data will also be accessed.
Prefetching – The array will attempt to recognize sequential read operations and prefetch data into cache.
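The three techniques above can be sketched in a few lines. This is a toy model, not any vendor's algorithm: the class name `LRUReadCache`, the block-number addressing and the fixed `read_ahead` count are all assumptions made for illustration.

```python
from collections import OrderedDict


class LRUReadCache:
    """Toy read cache: LRU eviction plus a fixed read-ahead on a miss."""

    def __init__(self, capacity_blocks: int, read_ahead: int = 0):
        self.capacity = capacity_blocks
        self.read_ahead = read_ahead
        self.blocks = OrderedDict()  # block number -> None, oldest first
        self.hits = 0
        self.misses = 0

    def _load(self, block: int) -> None:
        """Place a block in cache as most recently used, evicting if full."""
        self.blocks[block] = None
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # drop the least recently used

    def read(self, block: int) -> bool:
        """Return True on a cache hit, False when the block came from disk."""
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)
            return True
        self.misses += 1
        # Stage the following blocks first so the requested one ends up newest.
        for neighbour in range(block + self.read_ahead, block, -1):
            self._load(neighbour)
        self._load(block)
        return False


def run_sequential(cache: LRUReadCache, block_count: int) -> float:
    """Read blocks 0..block_count-1 in order and return the hit ratio."""
    for block in range(block_count):
        cache.read(block)
    return cache.hits / (cache.hits + cache.misses)


with_ahead = LRUReadCache(capacity_blocks=8, read_ahead=3)
without_ahead = LRUReadCache(capacity_blocks=8, read_ahead=0)
print(f"read-ahead of 3: hit ratio {run_sequential(with_ahead, 16):.2f}")
print(f"no read-ahead:   hit ratio {run_sequential(without_ahead, 16):.2f}")
```

On a purely sequential read of 16 blocks, the version that stages three extra blocks on every miss gets 12 hits out of 16 reads, while the plain LRU cache gets none, because no block is ever read twice.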
Applying This to Workloads
As we discussed in our previous entry, cache can be thought of as very wide but not very deep: it provides a lot of IOs but doesn’t store a lot of data. This comes into play when you try to determine how to address a performance issue with a storage array. Here is an illustration that should help.
You have an array with 16GB of cache on which you are frequently seeing performance issues. You look at the performance monitor and determine that the cache has entered write pending mode, meaning that 70% of the user cache holds pending write data and the array has begun throttling back the attached hosts.
Should you add more cache? Let’s take a look.
· During the performance degradation the array is receiving 400MBps of write requests
· The array has 16GB of cache and we are considering upgrading it to 32GB.
o 16GB of installed cache results in
§ 8GB Mirrored
§ 70% Limit = 5.6GB Usable
o 32GB of installed cache results in
§ 16GB Mirrored
§ 70% Limit = 11.2GB Usable
o Writing data at 400MBps fills up the 16GB configuration in ~15 Seconds
o Writing data at 400MBps fills up the 32GB configuration in ~29 Seconds
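The arithmetic in the list above can be reproduced with a short sketch. It takes 1GB as 1,024MB and, like the example, assumes by default that nothing is flushed to disk while the cache fills. The `destage_mbps` parameter is an addition of mine to show the effect of the disks draining the cache; it is not a figure from the example.

```python
def usable_write_cache_mb(installed_gb: float, pending_limit: float = 0.70) -> float:
    """Write-pending capacity in MB: half the cache (mirroring) times the limit."""
    mirrored_gb = installed_gb / 2
    return mirrored_gb * pending_limit * 1024


def seconds_to_fill(installed_gb: float, write_mbps: float,
                    destage_mbps: float = 0.0) -> float:
    """Seconds until the write-pending limit is hit; inf if the disks keep up."""
    net_mbps = write_mbps - destage_mbps  # destage_mbps is a hypothetical drain rate
    if net_mbps <= 0:
        return float("inf")
    return usable_write_cache_mb(installed_gb) / net_mbps


for installed in (16, 32):
    usable = usable_write_cache_mb(installed)
    fill = seconds_to_fill(installed, write_mbps=400)
    print(f"{installed}GB installed: {usable:.0f}MB usable, full in {fill:.1f}s")
```

This gives roughly 14 and 29 seconds, in line with the rounded figures above. Passing a `destage_mbps` close to the incoming write rate stretches the fill time considerably, which hints at why the back-end disk matters as much as the amount of cache.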
Assuming you pay around $1,000 per GB of cache, it will cost you $16,000 for the array to slow down to a crawl in 29 seconds rather than 15. This is probably not the solution you or management is hoping for. What may be needed is a combination of approaches.
1. You may still need more cache. If the bursts are not sustained, the extra cache may be good enough.
2. You may need to partition the cache so that the offending application doesn’t consume all available cache and impact the other hosts sharing the array. The way this works is you dedicate a portion of cache to the LUNs that are receiving the heavy IO, leaving the rest of the cache available for other applications. This will not speed up the application that is performing all of the writes but it will at least keep all of the other applications from suffering.
3. You may need to speed up the disk, the bottom of the funnel in our previous diagram. This can be done by using something like Dynamic Provisioning to distribute the IO across more disks. If Dynamic Provisioning is already in use you may need to add more disks to the pool, since each new disk provides additional IOPS.
You may need a combination of all three approaches: more cache, cache partitioning, and faster disk through Dynamic Provisioning or additional drives. The answer depends on the workload. Cache is an advantage primarily when the IO profile consists of short bursts of random IO, which is most common in OLTP environments. Cache does not help, and may even hinder performance, in cases of long sequential IO.
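As a rough way to size the third approach, the sketch below estimates how many disks are needed to absorb a sustained random write load once cache can no longer hide it. The write penalties of 2 for RAID 10 and 4 for RAID 5 are the usual rule-of-thumb values; the 5,000 host write IOPS and 180 IOPS per disk are hypothetical numbers chosen only for illustration.

```python
import math

# Back-end IOs generated by one small random host write.
RAID_WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4}


def disks_needed(host_write_iops: int, raid_level: str, iops_per_disk: int) -> int:
    """Disks required to sustain the write load with no help from cache."""
    backend_iops = host_write_iops * RAID_WRITE_PENALTY[raid_level]
    return math.ceil(backend_iops / iops_per_disk)


HOST_WRITE_IOPS = 5000  # assumption: sustained random writes from the hosts
IOPS_PER_DISK = 180     # assumption: ballpark figure for one fast spinning disk

for level in RAID_WRITE_PENALTY:
    count = disks_needed(HOST_WRITE_IOPS, level, IOPS_PER_DISK)
    print(f"{level}: {count} disks")
```

With these assumed inputs the RAID 5 pool needs twice as many spindles as RAID 10, which is the gap that full stripe writes from cache help to narrow.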

