Data Reduction Efficiency: All storage is NOT created equal!

When Pure Storage set out to create FlashArray, a primary objective was to bring the Tier-1 performance that all-flash technology offered over conventional disk-based systems to the masses, at prices comparable to conventional Tier-2 disk systems.

As an all-flash array from the very beginning, FlashArray has delivered Tier-1 all-flash performance, higher availability, power savings, weight savings, and simplicity. Multiple data reduction technologies and approaches enable that lower cost, and they do so in ways that have continued to provide additional benefits even as other all-flash arrays have entered the market.

We believe that there are clear differences in how storage products have implemented data reduction, and that these differences significantly impact cost, space, power, and complexity.

Although this blog is over 4 years old, it still provides an excellent technical foundation to explain both our objectives and implementation for data reduction in a bit more detail. But here’s the quick version:

Data Reduction at Pure is designed to be “always on”, all the time. FlashArray reduces data in five different ways:

  1. Pattern Removal identifies and removes repetitive binary patterns, including zeroes. In addition to capacity savings, pattern removal reduces the volume of data to be processed by the dedupe scanner and compression engine (a minimal conceptual sketch follows this list).
  2. High-performance, inline deduplication operates on a highly granular, 512-byte-aligned, variable block size ranging from 4KB to 32KB. Only unique blocks of data are saved on flash – removing duplicates that fixed-block architectures might miss.
  3. Inline Compression reduces data to use less capacity than the original format. Append-only write layout and variable addressing optimize compression savings by removing the wasted space that fixed-block architectures introduce.
  4. Post-Compression applies additional, heavier-weight compression algorithms post-process, increasing the efficiency savings on data that was already compressed inline.
  5. Copy Reduction. Copying data on a FlashArray only involves metadata. Leveraging the data reduction engine, Purity provides instant, pre-deduplicated copies of data for snapshots, clones, replication, and XCOPY commands. This has interesting new implications, like instant, space-saving migration from VMFS VMs to new vVols in VMware!
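
To make the pattern removal stage a bit more concrete, here is a minimal, hypothetical sketch of the idea – it is not Purity's actual implementation, and the block size, patterns checked, and function names are illustrative assumptions only. The point is that trivially repetitive blocks become tiny metadata records and never reach the dedupe scanner or compression engine as payload.

```python
# Hypothetical illustration only -- NOT Purity's actual implementation.
# Idea: recognize trivially repetitive blocks (e.g., all zeroes) up front,
# record them as metadata, and pass only "real" data on to dedupe/compression.

BLOCK_SIZE = 4096  # illustrative unit

def detect_pattern(block: bytes):
    """Return the repeating byte if the block is a single-byte repeat
    (including all zeroes), otherwise None."""
    first = block[:1]
    return first if block == first * len(block) else None

def ingest(blocks):
    """Split incoming blocks into pattern records (metadata only) and
    payload blocks that continue on to dedupe and compression."""
    pattern_records, payload = [], []
    for i, block in enumerate(blocks):
        pattern = detect_pattern(block)
        if pattern is not None:
            pattern_records.append((i, pattern))   # stored as tiny metadata
        else:
            payload.append((i, block))             # needs further reduction
    return pattern_records, payload

if __name__ == "__main__":
    sample = [bytes(BLOCK_SIZE),                       # all zeroes
              b"\xff" * BLOCK_SIZE,                    # repeating 0xFF
              b"real data".ljust(BLOCK_SIZE, b"x")]    # ordinary payload
    patterns, remaining = ingest(sample)
    print(f"{len(patterns)} pattern blocks removed, {len(remaining)} passed on")
```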

If, as a storage vendor, you believe that your product has industry-leading data reduction, you should not be afraid to share actual customer data publicly. For FlashArray, we do this in real time, continuously live and updated every three seconds for all the world to see. You can see the current status right now, here, showing what well over 10,000 arrays are achieving at the moment. The image below is what it looked like as I was typing this on Friday, July 27th, at 3:35 PM EDT.

Of course, it helps if you have a real-time, cloud-based support and management infrastructure like Pure1® that makes it easy to capture and share the data.

While “hero numbers” like these are real, it is important to note that they represent the numerical average across every FlashArray installed. The actual data reduction ratio (DRR) that you will experience can vary greatly by application, environment, and use case. Some environments may not see data reduction this good, a few may not see ANY significant data reduction at all, while others may see results well above the numbers shown here.

Here is a mapping of the data reduction FlashArray customers have historically seen from compression and deduplication alone across different use cases, with the line on the chart showing the relative number of installed customer arrays that fall into each category.

If a vendor does not share actual data reduction data publicly, you already know that it’s probably for one (or both) of two reasons:

  1. They don’t have the tools, infrastructure, or means to obtain or present the data, or
  2. They don’t like what the data shows and don’t really want you to see it.

The data above, as good as it is, does not yet fully reflect the expected impact of the newest version 5.1 of the Purity operating system, which has just recently become available. Thanks to improvements in compression, most FlashArray customers will see up to a 20% improvement in efficiency on their existing systems when they non-disruptively upgrade, although the improvements will come gradually as the array performs “garbage collection” of old data segments.

We welcome the opportunity to examine your application and data environment, so that we can give you an informed and credible estimate of what YOU can expect for data reduction from Pure, based upon your real data. We can then not only guarantee the right size for your configuration, but we can even guarantee that you will love every aspect of your Pure Storage.

We believe that each storage vendor has different degrees of technology, experience, and breadth of data reduction technologies, with the result being varying degrees of business, operational, and financial impact for you, depending upon which ones you choose.

In 2016 Dell EMC released the new 5977 version of the HYPERMAX OS for VMAX all-flash systems, which for the first time supported data reduction in the form of inline compression. This was after EMC initially started talking about it in 2009 as part of the “small” phase for FAST-VP, which was positioned to include both compression and deduplication. The average expected data reduction was positioned as about 2:1, and recently Dell has stated that their actual achieved data reduction has come very close to that goal, at 1.98:1, mentioned in this video at the 22:20 mark.

With the new PowerMax, some things have changed. The previous Compression I/O Module has been replaced with a Data Reduction I/O Module that runs a different inline compression algorithm. PowerMax also adds the ability to perform additional post-ingest compression on data that has been inactive for 30 days. And for the first time on this architecture – 8 years after EMC first mentioned it – deduplication is included. So, it is reasonable to expect that data reduction on PowerMax should be superior to data reduction on VMAX all-flash for most customers.

It is impossible for the public to know with certainty what kind of data reduction ratios customers will experience on PowerMax until there are identifiable PowerMax systems running applications in production, and customers that are willing to share their stories. Until that happens, Dell EMC has been stating what they believe customers can expect for data reduction.

From the Dell Technologies World press release: “up to 5:1 data reduction”.

The Dell EMC PowerMax Family Overview claims: “industry leading data reduction ratio of 4:1 with negligible performance impact.”

The Dell EMC PowerMax Family Data Sheet claims: “up to 5:1 data reduction (3:1 average), …inline deduplication and compression have virtually zero impact on performance”

The Dell EMC document Top Ten Reasons Why Customers Choose Dell EMC PowerMax with NVMe states: “up to 4:1 storage efficiency—guaranteed by Dell EMC. Plus dedupe and compression have virtually no performance impact…No compromises here.”

A Dell Engineering Technical White Paper claims: “an average savings of 3:1.”

Let’s examine the performance impact of using data reduction on PowerMax. As noted above, it has been described by Dell as “negligible performance impact”, “virtually no performance impact”, and “virtually zero impact on performance”.

Some questions you might want to ask Dell: If these statements are true, why is data reduction a selectable feature rather than simply “always on”? If data reduction does not affect performance, why is there a classification of “active” data to which data reduction will not be applied, even if the customer enabled it and wanted it? Since “active” data can be as much as 20% of the data, won’t that increase the amount of physical storage that must be configured on PowerMax compared to an array that has “always on” data reduction? And what about workloads with random I/O, where the active and inactive data change over time?
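
As a rough, back-of-the-envelope illustration (our own arithmetic, not a Dell figure): if 20% of a workload is classified as “active” and therefore stored with no reduction, the effective ratio across the whole data set falls well below the headline ratio.

```python
# Back-of-the-envelope arithmetic (our illustration, not a Dell figure):
# blending an unreduced "active" fraction with data reduced at the headline
# ratio lowers the effective ratio across the whole data set.

def effective_drr(headline_ratio: float, active_fraction: float) -> float:
    stored = active_fraction + (1 - active_fraction) / headline_ratio
    return 1 / stored

# A 3:1 headline ratio with 20% of the data left unreduced:
print(round(effective_drr(3.0, 0.20), 2))   # ~2.14:1 instead of 3:1
```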

This issue is important: for a system that claims to have “No compromises here,” isn’t that exactly a compromise between performance and efficiency? Do you want optimum performance or optimum efficiency? With Pure’s FlashArray, you get both. With PowerMax, hasn’t Dell already chosen for you?

But back to efficiency: how does PowerMax compare to FlashArray?

When comparing Dell’s claims about “industry leading data reduction ratio” to the proven (and public) data associated with FlashArray listed above, assuming that Dell’s claims are accurate, there’s obviously still quite a difference.

Dell’s claim of 3:1 average data reduction would compare most directly to Pure’s real-world delivered average data reduction of 5:1 (OK, it’s only 4.9577:1 as I’m writing this) as both numbers are derived from combining just compression and deduplication. Dell’s claim of up to 5:1 total data reduction, and their storage efficiency guarantee of 4:1 total data reduction, are most directly comparable to Pure’s real-world delivered total data reduction of 10:1, as this includes additional efficiency from thin provisioning. It is noteworthy that the Dell 4:1 guarantee also includes expected additional savings from snapshots.

So why does Dell believe PowerMax will deliver lower data reduction than the average FlashArray is already delivering today?

When comparing the deduplication implementations, for example, there are technical differences between FlashArray and PowerMax that affect efficiency. From this Dell PowerMax document: “What is the granularity of dedupe? Fixed block, 128KB (track) level, which is the same as compression.” With a 128KB fixed block size for deduplication, PowerMax will miss a large number of dedupe candidates that Pure will catch.

With FlashArray, the lookup data unit size is 4KB and the lookup alignment is 512B. For data comparison, we use the matched 4KB data unit as an anchor point, then extend the comparison before and after that anchor in 512B increments until we reach unique data.
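
Here is a simplified, conceptual sketch of that anchor-and-extend idea. It is not Purity’s code – the hashing, data structures, and names are illustrative assumptions – but it shows how a 4KB match found at 512B alignment can grow, 512B at a time, into a much larger deduplicated region.

```python
# Conceptual sketch of 512B-aligned, anchor-and-extend matching.
# Illustrative only -- NOT Purity's actual implementation.

import hashlib
import random

SECTOR = 512     # lookup alignment
ANCHOR = 4096    # lookup data unit size

def fingerprint(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()

def build_index(stored: bytes) -> dict:
    """Index every 512B-aligned 4KB window of already-stored data."""
    return {fingerprint(stored[off:off + ANCHOR]): off
            for off in range(0, len(stored) - ANCHOR + 1, SECTOR)}

def find_match(incoming: bytes, in_off: int, stored: bytes, index: dict):
    """If the 4KB window at in_off matches stored data, extend the match
    backward and forward in 512B steps; return (stored_off, start, end)."""
    key = fingerprint(incoming[in_off:in_off + ANCHOR])
    if key not in index:
        return None
    s_off, start, end = index[key], in_off, in_off + ANCHOR
    # extend backward while the preceding 512B sectors still match
    while (start >= SECTOR and s_off >= SECTOR and
           incoming[start - SECTOR:start] == stored[s_off - SECTOR:s_off]):
        start -= SECTOR
        s_off -= SECTOR
    # extend forward while the following 512B sectors still match
    s_end = s_off + (end - start)
    while (end + SECTOR <= len(incoming) and s_end + SECTOR <= len(stored) and
           incoming[end:end + SECTOR] == stored[s_end:s_end + SECTOR]):
        end += SECTOR
        s_end += SECTOR
    return s_off, start, end

if __name__ == "__main__":
    rng = random.Random(0)
    stored = bytes(rng.getrandbits(8) for _ in range(16 * 1024))  # "already on flash"
    # New write: 8KB copied from stored data, surrounded by unrelated bytes
    incoming = b"\x00" * 1024 + stored[2048:10240] + b"\xff" * 1024
    match = find_match(incoming, 1024, stored, build_index(stored))
    print(match)   # the 4KB anchor grows into an 8KB deduplicated region
```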

So compared to PowerMax:

  • The FlashArray lookup size is 4KB vs. 128KB. FlashArray should find more matches.
  • The FlashArray lookup alignment is 512B vs. 128KB. FlashArray is adaptive to various real-world workloads.
  • The FlashArray incremental match granularity is 512B vs. 128KB fixed-match. FlashArray should be able to deduplicate more data.

For example, Oracle databases write 8K blocks by default, and that’s what most Oracle customers use. Block size can actually be defined in Oracle, ranging from 2K to 32K. But regardless of Oracle block size, Oracle writes a unique header in each block. Pure can dedupe Oracle databases easily with its 512-byte granularity for block comparison, while PowerMax with its 128KB granularity will not be able to dedupe any of it. Dell acknowledges this in this blog: “Keep in mind that when Oracle creates its data files, each database block (typically 8 KB in size) has a unique header, which make dedupe within a single database impossible (whether the block is empty or full). Especially given PowerMax track granularity of 128 KB.”
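
To see why the granularity matters, here is a simplified model – not a measurement of either product, and using plain fixed-size chunking at 512B as a rough stand-in for FlashArray’s finer-grained matching. Every synthetic 8KB “Oracle” block gets a unique header followed by identical row data; the 128KB chunks then never repeat, while most 512B chunks do.

```python
# Simplified model (not a measurement of either product): every 8KB block
# carries a unique header followed by identical payload, mimicking Oracle
# data file blocks. Compare duplicate chunks at 128KB vs. 512B granularity.

import hashlib

ORACLE_BLOCK = 8192
NUM_BLOCKS = 256            # 2MB synthetic data file

def make_datafile() -> bytes:
    payload = b"A" * (ORACLE_BLOCK - 64)                           # identical row data
    blocks = [dba.to_bytes(8, "big").ljust(64, b"\0") + payload    # unique 64B header
              for dba in range(NUM_BLOCKS)]
    return b"".join(blocks)

def duplicate_fraction(data: bytes, chunk: int) -> float:
    """Fraction of fixed-size chunks whose content was already seen."""
    seen, dups, total = set(), 0, 0
    for off in range(0, len(data), chunk):
        digest = hashlib.sha256(data[off:off + chunk]).digest()
        total += 1
        if digest in seen:
            dups += 1
        else:
            seen.add(digest)
    return dups / total

data = make_datafile()
print(f"128KB chunks found duplicate: {duplicate_fraction(data, 128 * 1024):.0%}")  # 0%
print(f"512B  chunks found duplicate: {duplicate_fraction(data, 512):.0%}")         # ~94%
```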

The net result is that you would need to install much more physical storage with PowerMax than with FlashArray to store the same amount of Oracle user data.

Here is a visual representation of what it takes on both the PowerMax 8000 and FlashArray //X90 to achieve 3PBe of storage, assuming that PowerMax can achieve 3:1 data reduction, and knowing that FlashArray averages close to 5:1 data reduction.
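
The raw-capacity arithmetic behind that picture, using the ratios stated above, is straightforward:

```python
# Raw flash required to present 3PB effective, at the stated reduction ratios.
EFFECTIVE_PB = 3.0

powermax_physical = EFFECTIVE_PB / 3.0     # assumes PowerMax achieves 3:1
flasharray_physical = EFFECTIVE_PB / 5.0   # FlashArray fleet average ~5:1

print(f"PowerMax physical:   {powermax_physical:.2f} PB")    # 1.00 PB
print(f"FlashArray physical: {flasharray_physical:.2f} PB")  # 0.60 PB
print(f"FlashArray needs {1 - flasharray_physical / powermax_physical:.0%} less raw flash")
```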

While PowerMax is expected to be more efficient for space, power, and cooling than previous generations of VMAX, FlashArray should be even more efficient for each and every one of those criteria. The rack-unit physical size comparison above very clearly demonstrates this.

The implication for you is that with FlashArray you will require far less physical storage, and thus less power and cooling, with the added benefits of lower complexity and superior TCO.

At Pure, we would welcome the opportunity to show how we can improve your TCO, and we would also be glad to put you in touch with other customers who have already achieved these benefits and can share their stories and experience. Or even better, run a real-world Proof of Concept (POC) and see the actual difference for yourself!