Solving disk latency: Disk or memory based caching?

A couple of months ago I wrote a blog post on VMware’s CBRC and Microsoft’s CSV Cache, which are both memory caching technologies. I got a couple of questions about why I didn’t write about Citrix’s IntelliCache.
I even got the attention of Jim Moyle and Andrew Wood, both working for Atlantis and both deeply experienced in technologies like disk based caching and in-memory caching. These technologies all try to solve the everlasting IOPS challenge (or better yet, the latency issues that cause the IO trouble) we face when implementing different desktop delivery solutions. That latency is the core of our troubles. While most people are under the impression that a high number of IOs is the issue, latency is the true issue. An IO profile includes throughput, block size, sequential vs random access and read/write ratio. Normally the profile itself isn’t the problem; it’s latency that leaves a bunch of IOs waiting for the moment they get processed.
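
To make the relation between latency and IOs a bit more tangible, here is a minimal sketch based on Little's Law (throughput = outstanding requests / latency). The function name and the numbers are illustrative assumptions, not measurements from a real environment.

```python
def achievable_iops(outstanding_ios: int, latency_ms: float) -> float:
    """Little's Law: throughput = concurrency / latency.

    outstanding_ios: number of IOs in flight at the same time (queue depth).
    latency_ms: average time one IO takes to complete, in milliseconds.
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return outstanding_ios * 1000.0 / latency_ms


if __name__ == "__main__":
    # Hypothetical example: the same queue depth at three latency levels.
    for latency_ms in (1.0, 5.0, 20.0):
        iops = achievable_iops(outstanding_ios=32, latency_ms=latency_ms)
        print(f"{latency_ms:5.1f} ms latency -> {iops:,.0f} IOPS")
```

With the queue depth held constant, every increase in latency directly lowers the number of IOs that complete per second, which is why the remaining IOs pile up waiting.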

This issue got a lot of attention from the market and thus traction from vendors. There’s a difference in the way latency (causing high numbers of IOs) gets handled by vendors.

Caching is one of the technologies used to prevent high latency: we create a buffer on a platform that transparently stores data so that future requests for that data can be served faster. If the requested data is in the cache, the request can be served by simply reading the cache, which improves performance and decreases load on the backend systems.
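
The idea of a transparent buffer can be sketched as a small read-through cache. This is a generic illustration, not how any of the products in this post are implemented; the class name `ReadCache`, the `backend_read` callback and the LRU eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict


class ReadCache:
    """Read-through block cache with least-recently-used eviction."""

    def __init__(self, backend_read, capacity):
        self._backend_read = backend_read  # function that reads a block from slow storage
        self._capacity = capacity          # maximum number of blocks kept in the cache
        self._blocks = OrderedDict()       # block_id -> data, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self._blocks:
            # Cache hit: serve from the buffer and mark the block as recently used.
            self._blocks.move_to_end(block_id)
            self.hits += 1
            return self._blocks[block_id]
        # Cache miss: go to the backend, then keep a copy for future requests.
        self.misses += 1
        data = self._backend_read(block_id)
        self._blocks[block_id] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block
        return data


if __name__ == "__main__":
    def slow_backend(block_id):
        # Stand-in for a SAN/NAS read.
        return f"data-{block_id}"

    cache = ReadCache(slow_backend, capacity=100)
    for block in [1, 2, 1, 1, 3, 2]:
        cache.read(block)
    print(f"hits={cache.hits} misses={cache.misses}")
```

Every hit is a request the backend never sees, which is where both the lower latency and the reduced backend load come from.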

Basically there are two ways to handle this: disk based caching and in-memory caching.

What is Disk Cache:

A disk cache is a mechanism for improving the time it takes to read from or write to a hard disk. Today, the disk cache is usually included as part of the hard disk. A disk cache can also be a specified portion of random access memory (RAM). The disk cache holds data that has recently been read and, in some cases, adjacent data areas that are likely to be accessed next. Write caching is also provided with some disk caches.

What is Memory Cache:

A more generic cache using RAM set aside as a specialized buffer storage that is continually updated; it is used to optimize data transfers between system elements with different characteristics. 

While I was looking at the possible solutions for disk based caching and in-memory caching I found a couple of them. I know this isn’t a complete list, but in my defense: this market is filled with startups and expands rapidly.

Disk based Caching solutions
  • IntelliCache (Hypervisor integrated)
  • vFRC (Hypervisor integrated)
Memory based Caching solutions
  • vSphere CBRC (Hypervisor integrated)
  • Hyper-V CSV Cache (Hypervisor integrated)
Vendor storage optimisation solutions using caching
  • GreenBytes (3rd Party)
  • Atlantis ILIO (3rd Party)
  • Liquidware FlexIO (3rd Party)
  • PernixData (3rd Party)
  • Infinio (3rd Party)

There isn’t a good or bad solution. It’s very important to realize that both methods have their pros and cons, and especially the solutions that aren’t hypervisor integrated can (or should?) bring added value. While optimizing reads is fairly easy (by adding SSD, for example), the really hard nut to crack is optimizing writes. VDI environments are write intensive workloads. While SSD drives are great for reads, they can suffer from reduced performance under intensive write workloads. Of course most (maybe all) storage vendors use NAND first (blazing fast) and SSDs after that, but when you’re experiencing intensive write workloads you could look at a more complete solution, and that’s the part where 3rd party vendors rock the stage.

Example: Citrix IntelliCache

With IntelliCache, each desktop VM is able to write its own Write Cache on the host, reducing writes to the SAN/NAS. Because the caching happens on local storage, configuring IntelliCache for a pooled desktop significantly reduces the load on the remote storage and the amount of network traffic.

The VMs cannot benefit from the Read Cache immediately since it is not fully populated. Instead, XenServer populates the Read Cache progressively each time a desktop VM requests a specific block of operating system data. When the first desktop VM is powered on and XenServer creates the Read Cache in the local SR, the cache is empty and needs to be filled. A XenServer host caches blocks of the master image in its Read Cache each time its desktop VMs read data from the master image. When subsequent desktop VMs boot, they will read the already cached blocks and will not need to access the data from shared storage.
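
The progressive population of the Read Cache can be illustrated with a toy simulation. This is a simplified model and not XenServer code: it assumes every desktop VM reads exactly the same master image blocks at boot, that the local cache never runs out of space, and the function name `simulate_boots` is made up for this example.

```python
def simulate_boots(vm_count, boot_blocks):
    """Return the number of shared-storage reads needed for each VM boot."""
    boot_blocks = list(boot_blocks)
    host_read_cache = set()      # master image blocks already cached on the local SR
    shared_reads_per_vm = []
    for _ in range(vm_count):
        shared_reads = 0
        for block in boot_blocks:
            if block not in host_read_cache:
                # First request for this block: fetch it from shared storage
                # and keep a copy in the host's Read Cache.
                shared_reads += 1
                host_read_cache.add(block)
        shared_reads_per_vm.append(shared_reads)
    return shared_reads_per_vm


if __name__ == "__main__":
    # Hypothetical boot footprint of 1000 blocks, three desktops on one host.
    print(simulate_boots(vm_count=3, boot_blocks=range(1000)))  # [1000, 0, 0]
```

In this model only the first desktop on a host pays the price of reading from shared storage; real workloads will sit somewhere in between because desktops don't read perfectly identical blocks.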

Example: Atlantis ILIO

Atlantis ILIO, for example, isn’t just another caching technology: it handles your reads and writes in a more intelligent way than your hypervisor can. It does inline deduplication of your IO while writing larger sequential blocks. Meaning: it drastically decreases the reads and writes that reach your storage and writes them as larger blocks, enhancing your performance. Pascal published a great blog post on this topic: Atlantis ILIO – RAM Based storage matched for VDI.
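
To show what inline deduplication means in general terms, here is a minimal content-hash sketch. It is not Atlantis ILIO's implementation (which isn't described here); the class `DedupStore`, the use of SHA-256 and the in-memory dictionaries are assumptions chosen to keep the example short, and it leaves out things like garbage collection of unreferenced blocks.

```python
import hashlib


class DedupStore:
    """Stores each unique block once, keyed by a hash of its content."""

    def __init__(self):
        self._blocks = {}         # content digest -> block data (stored once)
        self._block_map = {}      # logical address -> content digest
        self.backend_writes = 0   # writes that actually had to store new data

    def write(self, address, data: bytes):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blocks:
            # New content: this is the only case that costs a real write.
            self._blocks[digest] = data
            self.backend_writes += 1
        # Duplicate content only updates the address mapping.
        self._block_map[address] = digest

    def read(self, address) -> bytes:
        return self._blocks[self._block_map[address]]


if __name__ == "__main__":
    store = DedupStore()
    # Hypothetical workload: 50 desktops writing the same 4 KB block.
    for vm in range(50):
        store.write(address=("vm", vm, 0), data=b"\x00" * 4096)
    print(f"logical writes=50 backend writes={store.backend_writes}")
```

Because VDI desktops built from the same image write a lot of identical data, the number of writes that actually reach the storage layer can be far lower than the number of writes issued.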

Another reason to select a 3rd party vendor could be that you’re not bound to a certain hypervisor: a 3rd party vendor is probably going to support multiple hypervisors, leaving some room for future changes without vendor lock-in based on features only released for a certain platform.

Measurement of IOs:

While you would think measuring IOs would be fairly easy, it actually is pretty hard. Some great community guys already elaborated on that topic; my need-to-read list is:

Measurement of latency:

As stated, the measurement of IOs is hard, but latency can be measured from Windows based on PerfMon stats.
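
As a rough sketch of how that could look: the PerfMon counter `\PhysicalDisk(_Total)\Avg. Disk sec/Read` (and its `Avg. Disk sec/Write` counterpart) reports latency in seconds, and `typeperf` can log it to CSV. The Python below parses such a CSV and summarizes it in milliseconds. The sample data, host name and function names are made up for illustration, and the exact CSV layout on your system may differ.

```python
import csv
import io

# Example collection command on Windows (60 samples, one per second):
#   typeperf "\PhysicalDisk(_Total)\Avg. Disk sec/Read" -si 1 -sc 60 -f CSV -o disk.csv

# Hypothetical sample in the shape typeperf writes: timestamp, counter value in seconds.
SAMPLE_CSV = r'''"(PDH-CSV 4.0)","\\HOST\PhysicalDisk(_Total)\Avg. Disk sec/Read"
"01/01/2014 10:00:00.000","0.002"
"01/01/2014 10:00:01.000","0.004"
"01/01/2014 10:00:02.000"," "
"01/01/2014 10:00:03.000","0.030"
'''


def latencies_ms(csv_text):
    """Return the second column of a typeperf-style CSV as milliseconds."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # skip the header row
    values = []
    for row in rows:
        if len(row) < 2:
            continue
        try:
            values.append(float(row[1]) * 1000.0)  # seconds -> milliseconds
        except ValueError:
            continue  # skip samples without a numeric value
    return values


def summarize(values_ms):
    """Return sample count, average and maximum latency in milliseconds."""
    if not values_ms:
        raise ValueError("no latency samples")
    return {
        "samples": len(values_ms),
        "avg_ms": sum(values_ms) / len(values_ms),
        "max_ms": max(values_ms),
    }


if __name__ == "__main__":
    stats = summarize(latencies_ms(SAMPLE_CSV))
    print(f"samples={stats['samples']} avg={stats['avg_ms']:.1f} ms max={stats['max_ms']:.1f} ms")
```

Looking at the maximum next to the average helps, because short latency spikes are easily hidden in an average.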

Disk or memory based caching – be aware of the possibilities

My point of this article? Be aware of the possibilities. We have to solve the high latency issues on our storage platform, but choose wisely: not all solutions are the same. Make sure you get a hold of a couple of them (as most of them are software, it’s easy to try some of them in a PoC) and do a tech bake-off (thanks Jarian and Shane!) to make sure you pick the right solution. Base your decision on proper measurements; that’s where the need-to-read list comes in.


Kees Baggerman

Kees Baggerman is a Staff Solutions Architect for End User Computing at Nutanix. Kees has driven numerous functional/technical design, migration and implementation engagements for Microsoft, Citrix and RES infrastructures over the years.
