Data Reduction Deep Dive: Is Thin Provisioning Data Reduction?

Modern all-flash arrays need data efficiency technologies such as deduplication, compression, and thin provisioning to help manage down the cost of storage in a world of constant storage growth. These technologies are a boon for the customer in terms of saving money, but they add some complexity to planning storage purchases and managing growth: how much storage do I need to buy to store “x” amount of data, today and tomorrow?

To add to the confusion, vendors define data efficiency technologies differently, and often use very broad definitions, claims, and measuring techniques around data reduction in order to show “the best” data reduction. While this might be helpful for marketing, it isn’t always helpful for real capacity planning.

In this blog post we aim to explain the different forms of efficiency technologies, and discuss how best to use them to plan for your storage capacity needs, as well as compare results between vendors in PoC testing.

Data Reduction and Capacity Efficiency: What’s the difference?

Before diving deep into Thin Provisioning, it’s important to recognize that there are two types of efficiency technologies: capacity efficiency and data reduction.

Capacity efficiency technologies either avoid or free the “unused” capacity within a storage volume so that it is available for use by other volumes, thereby increasing storage efficiency. Examples of capacity efficiency technologies include Thin Provisioning, Zero Detection, and Unmap.

Data reduction technologies reduce the actual size of the data. For example, 10TB of data may be reduced to 2TB with 5-to-1 data reduction. Examples of data reduction technologies include Deduplication, Compression, Pattern Removal, and Copy Reduction (for clones and xCopy commands, not snapshots). Data reduction results in two important benefits: it allows you to address your usable capacity needs with a smaller amount of raw capacity, thereby lowering the $/GB usable, and it maximizes the lifespan of flash by reducing the write IOs to flash. Note: While data reduction technologies had some limited applicability to traditional disk storage arrays, they suffered from poor performance there and thus were targeted at Tier 2 workloads. The advent of all-flash arrays with much higher flash performance and purpose-built designs enabled data reduction to be brought into Tier 1 workloads with consistent and predictable performance.
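The arithmetic behind a data reduction ratio can be sketched in a few lines of Python. This is only an illustration: the function names, the price figure, and the simplification that ignores RAID and metadata overhead are assumptions, not part of any vendor's sizing tool.

```python
def reduced_size_tb(logical_tb: float, reduction_ratio: float) -> float:
    """Physical capacity consumed by `logical_tb` of data at an N:1 reduction ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return logical_tb / reduction_ratio


def cost_per_effective_gb(price_usd: float, capacity_tb: float, reduction_ratio: float) -> float:
    """Price divided by the amount of data the capacity can hold after reduction.

    Simplification (assumption): 1TB = 1000GB, and no RAID or metadata overhead.
    """
    effective_gb = capacity_tb * 1000 * reduction_ratio
    return price_usd / effective_gb


# The example from the text: 10TB of data at 5-to-1 data reduction.
print(reduced_size_tb(10, 5))

# Hypothetical price: the same 10TB of capacity holds 5x the data at 5:1,
# so the cost per GB of data stored drops by the same factor.
print(cost_per_effective_gb(100_000, 10, 1))
print(cost_per_effective_gb(100_000, 10, 5))
```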

Thin Provisioning explained

Over a decade ago, the only supported storage provisioning approach was Thick Provisioning. When you created a storage volume, 100% of the usable capacity that was needed to support the volume size was reserved immediately by the storage array for use by that storage volume. Creation of a 500GB volume would result in 500GB of usable capacity being reserved. Yet applications wrote to storage gradually over time, so much of the capacity reserved upfront sat unused for years.

[Figure: Thin Provisioning explained]

Complexity of storage provisioning on legacy disk storage fueled the capacity waste. Storage provisioning time was measured in weeks to months, incentivizing the application owners to add a buffer to their storage needs. Similarly, storage administrators preferred adding their own buffer to the volume size to minimize storage provisioning frequency. End result: more capacity was reserved upfront and remained unused, which resulted in a large amount of wasted capacity.

The emergence of Thin Provisioning provided a new approach to storage provisioning – one that could eliminate the wasted capacity. Thin Provisioning reserves capacity dynamically to keep slightly ahead of the written data. By doing so, Thin Provisioning gets rid of the upfront capacity reservation, thereby unlocking and freeing the capacity that would otherwise be reserved upfront and trapped within Thick Provisioned volumes. In aggregate, this led to significant efficiency gains for typical Thick Provisioned storage deployments.
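The difference between the two provisioning approaches can be modeled with a toy capacity pool. The sketch below is an assumption-laden illustration, not how any particular array implements it: the class names, the 10GB reservation chunk, and the append-only write model are all hypothetical.

```python
class Pool:
    """Shared usable capacity on the array, tracked in GB."""

    def __init__(self, usable_gb: int) -> None:
        self.usable_gb = usable_gb
        self.reserved_gb = 0

    def reserve(self, gb: int) -> None:
        if self.reserved_gb + gb > self.usable_gb:
            raise RuntimeError("pool is out of usable capacity")
        self.reserved_gb += gb


class ThickVolume:
    """Reserves the full volume size at creation time."""

    def __init__(self, pool: Pool, size_gb: int) -> None:
        self.size_gb = size_gb
        pool.reserve(size_gb)


class ThinVolume:
    """Reserves capacity in chunks, staying slightly ahead of written data."""

    CHUNK_GB = 10  # hypothetical reservation granularity

    def __init__(self, pool: Pool, size_gb: int) -> None:
        self.pool = pool
        self.size_gb = size_gb
        self.written_gb = 0
        self.reserved_gb = 0

    def write(self, gb: int) -> None:
        if self.written_gb + gb > self.size_gb:
            raise ValueError("write exceeds volume size")
        self.written_gb += gb
        # Keep one chunk ahead of the written data, capped at the volume size.
        target = min(self.size_gb, (self.written_gb // self.CHUNK_GB + 1) * self.CHUNK_GB)
        if target > self.reserved_gb:
            self.pool.reserve(target - self.reserved_gb)
            self.reserved_gb = target


thick_pool = Pool(1000)
ThickVolume(thick_pool, 500)
print(thick_pool.reserved_gb)  # 500GB reserved before any data is written

thin_pool = Pool(1000)
thin = ThinVolume(thin_pool, 500)
thin.write(100)
print(thin_pool.reserved_gb)  # 110GB reserved for 100GB written
```

In this model the remaining pool capacity stays available to other volumes until the thin volume actually grows into it.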

Thin Provisioning also meant that storage administrators could create large storage volumes freely, without the waste of upfront capacity reservation. Administrators could provision storage volumes sized for the application’s lifetime, thereby avoiding the complexity on legacy storage of growing or adding more storage volumes to support application growth.

Over the years, Thin Provisioning has become the mainstream approach to storage provisioning, and the majority of storage deployments today leverage Thin Provisioning – many being close to 100% Thin Provisioned.

Thin Provisioning: An enabler for staying thin

In addition to avoiding the upfront, static reservation of capacity, Thin Provisioning is the foundation that enables your storage environment to stay thin. Applications, file systems, and virtual machines delete data over time. Typically, the block storage array is unaware of these deletions and the associated capacity stays trapped within the volume – leaving pockets of stranded capacity. The combination of a Thin Provisioned volume and support for ‘Unmap’ capability allows this stranded capacity to be released and made available for reuse by other volumes, keeping you thin over time.
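A toy model can make the stranded-capacity problem concrete. In the sketch below, the class, its method names, and the block-level bookkeeping are hypothetical; it only illustrates the idea that the array keeps blocks allocated until the host explicitly tells it a range is no longer in use.

```python
class ThinVolume:
    """Tracks which logical blocks are currently backed by physical capacity."""

    def __init__(self) -> None:
        self._mapped = set()

    def write(self, lba_start: int, count: int) -> None:
        """Writing a range allocates backing capacity for each block."""
        self._mapped.update(range(lba_start, lba_start + count))

    def unmap(self, lba_start: int, count: int) -> int:
        """Host signals that the range holds no live data; return blocks released."""
        released = self._mapped & set(range(lba_start, lba_start + count))
        self._mapped -= released
        return len(released)

    @property
    def allocated_blocks(self) -> int:
        return len(self._mapped)


vol = ThinVolume()
vol.write(0, 1000)

# The file system deletes a file that lived in blocks 400..699. The array
# is unaware of the deletion, so the capacity stays allocated (stranded).
print(vol.allocated_blocks)  # 1000

# Once the host issues an unmap for that range, the capacity is released.
print(vol.unmap(400, 300))   # 300
print(vol.allocated_blocks)  # 700
```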

A special case of Thin Provisioning enabling your storage environment to stay thin is the volume initialization scenario. VMware’s EagerZeroedThick option for virtual machine creation initializes the VMDKs for the virtual machine by writing zeroes. A Thin Provisioned volume combined with zero detection capability on the storage array can recognize this operation, update the volume metadata such that zeroes are returned upon a read request, and avoid reserving the back-end usable capacity. Note that zero detection is the ability within a storage array to detect a Write operation that is all zeroes.
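Zero detection can be sketched as a check on the write path. The following is a minimal illustration under assumptions: the 4096-byte block size, the class name, and the dictionary used as volume metadata are hypothetical, and real arrays implement this in their own metadata structures.

```python
BLOCK_SIZE = 4096  # hypothetical block size in bytes


class ZeroDetectingVolume:
    """Thin volume that never allocates capacity for all-zero blocks."""

    def __init__(self) -> None:
        self._blocks = {}  # logical block address -> stored bytes

    def write(self, lba: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        if not any(data):
            # All zeroes: leave the block unmapped instead of storing it.
            self._blocks.pop(lba, None)
        else:
            self._blocks[lba] = data

    def read(self, lba: int) -> bytes:
        # Unmapped blocks read back as zeroes.
        return self._blocks.get(lba, bytes(BLOCK_SIZE))

    @property
    def allocated_blocks(self) -> int:
        return len(self._blocks)


vol = ZeroDetectingVolume()
for lba in range(256):  # zero-initialize a region, as eager zeroing would
    vol.write(lba, bytes(BLOCK_SIZE))

print(vol.allocated_blocks)                  # 0
print(vol.read(10) == bytes(BLOCK_SIZE))     # True
```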

Should Thin Provisioning savings be included in the data reduction ratio?

By now it should be obvious that Thin Provisioning delivers capacity efficiency, not data reduction. While the volume size can vary, the data size remains the same. Including Thin Provisioning in data reduction savings can arbitrarily increase the data reduction ratio.

Consider a simple example in which the same 100GB of data is written to volumes of different sizes.

Even though the written data remains at 100GB, changes in the volume size cause the data reduction ratio (with Thin Provisioning savings included) to vary from 5:1 to 20:1. Storage configurations sized with one of these data reduction ratios can be significantly under-sized in capacity, leading to capacity shortfalls.
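The inflation effect can be reproduced with simple arithmetic. Two things in this sketch are assumptions rather than figures from the original example: the volume sizes of 500GB, 1000GB, and 2000GB (chosen because they yield the 5:1 to 20:1 range), and the formula itself, which treats the "total" ratio as provisioned volume size divided by physical capacity stored.

```python
def total_reduction_ratio(volume_size_gb: float, written_gb: float,
                          data_reduction_ratio: float = 1.0) -> float:
    """Ratio reported when Thin Provisioning savings are counted as reduction.

    Assumed formula: provisioned size / physical capacity actually stored.
    """
    stored_gb = written_gb / data_reduction_ratio
    return volume_size_gb / stored_gb


WRITTEN_GB = 100  # the written data never changes

for size_gb in (500, 1000, 2000):  # assumed volume sizes
    ratio = total_reduction_ratio(size_gb, WRITTEN_GB)
    print(f"{size_gb}GB volume, {WRITTEN_GB}GB written -> {ratio:.0f}:1")
```

Only the provisioned size changes between the three cases, yet the reported ratio quadruples, which is why such a ratio is a poor basis for sizing.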

Pure Storage differentiates between data reduction and Thin Provisioning savings. This is why the Pure FlashReduce Ticker displays ‘Average Data Reduction’ (with deduplication and compression) and ‘Average Total Reduction’ (with Thin Provisioning) separately.

Can you count on Thin Provisioning savings to reduce the capacity that you need to purchase?

It depends on whether your storage deployment is Thick or Thin Provisioned.

Thick-to-Thin

If your storage deployment is predominantly Thick Provisioned, then you are likely to see gains (typically 2:1) in capacity efficiency. Migrating to a 100% Thin Provisioned storage environment will free the unused capacity trapped within the existing volumes, and therefore reduce the overall storage capacity required to store the same amount of usable data.

Thin-to-Thin

In contrast, if your storage deployment is predominantly Thin Provisioned already, then going from a Thin-to-Thin environment helps you maintain the Thin Provisioning savings you have already realized, but will not help you gain much net new capacity savings. You may see incremental gains if the Thin Provisioning implementation on your existing storage was built with an inefficient design.

IMPORTANT: In a Thin-to-Thin scenario, you need to be careful that the configuration for a new storage array isn’t sized using the Thin Provisioning savings assumption of a Thick-to-Thin scenario. If you apply the typical 2:1 Thin Provisioning savings from Thick-to-Thin to a truly Thin-to-Thin scenario, then your new storage configuration will almost certainly be under-sized, leading to capacity shortfalls.

Consider the scenario of Thin-to-Thin migration of a 100TB database environment.

  • Quoted usable capacity = 33.3TB (if true data reduction on this workload is 3:1)
  • Quoted usable capacity (with Thin Provisioning savings included) = 16.7TB (3:1 true data reduction × 2:1 Thin Provisioning = 6:1 data reduction)
  • Capacity shortfall = 16.7TB (since net new Thin Provisioning savings aren’t likely to be realized in an already Thin environment)
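The sizing arithmetic in this scenario can be checked with a short calculation. The function name is hypothetical, and the results are rounded to one decimal place.

```python
def quoted_usable_tb(data_tb: float, data_reduction: float,
                     thin_savings: float = 1.0) -> float:
    """Usable capacity a quote would specify for `data_tb` of data.

    `thin_savings` multiplies the data reduction ratio, as a quote that
    counts Thin Provisioning savings as data reduction would do.
    """
    return data_tb / (data_reduction * thin_savings)


DATA_TB = 100

needed = quoted_usable_tb(DATA_TB, 3.0)                    # true 3:1 reduction only
quoted = quoted_usable_tb(DATA_TB, 3.0, thin_savings=2.0)  # 3:1 x 2:1 = 6:1
shortfall = needed - quoted

print(f"needed:    {needed:.1f}TB")
print(f"quoted:    {quoted:.1f}TB")
print(f"shortfall: {shortfall:.1f}TB")
```

In an environment that is already thin, the quote sized at 6:1 delivers only half of the usable capacity that the 3:1 sizing calls for.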

Variances in data reduction marketing claims add confusion

Different vendors include different data efficiency technologies in their data reduction marketing claims. Here is a summary chart that illustrates this point across several vendors:

[Figure: summary chart of the data efficiency technologies included in each vendor’s data reduction claims]

(Updated to reflect corrections)

Given the differences, it becomes critically important for upfront sizing and ongoing capacity planning on all-flash arrays to understand the data efficiency technologies included in each vendor’s marketing claims, in order to compare apples-to-apples and avoid capacity shortfalls.

What to Look For In Thin Provisioning Claims When Shopping for an All-Flash Array

Here are the actions that you can take:

  1. Know your own environment. Determine whether your existing storage is Thick or Thin Provisioned. If Thick, consider Thin Provisioning savings (about 2-to-1 capacity savings across all-flash array vendors). If Thin, discount the use of Thin Provisioning savings in sizing the storage configuration.
  2. Determine how the storage quote is sized. Ask vendors if Thin Provisioning savings are included in their data reduction claims and whether the quoted storage configuration is sized assuming Thin Provisioning savings.
  3. Compare apples-to-apples. Either have all vendors size the configuration and quote with Thin Provisioning savings included or have all vendors remove Thin Provisioning savings. Avoid apples-to-oranges comparisons.
  4. Avoid under-sized configurations. Ensure that the storage configuration included in the vendor’s quote is sized to meet your usable capacity needs. Understand what is likely to happen if the realized storage capacity ends up being under-sized after the array is purchased and installed.

Hopefully, this post has raised your awareness about Thin Provisioning and whether it can deliver capacity savings for you.

Copyright 2015, Pure Storage, Inc. All rights reserved. Pure Storage, the “P” Logo, Forever Flash and FlashArray are trademarks or registered trademarks of Pure Storage, Inc. in the U.S. and other countries. All other trademarks are property of their respective owners.