Part 1: Deduplication and Compression with SQL Server Databases

Gain insights into our approach to deduplication and compression in Pure arrays as we clarify some of the uncertainty and doubt that such technologies generate.


Can Deduplication and Compression Cause Data Loss?

Let me address this concern directly: our deduplication (or dedupe, for short) and compression algorithms are lossless, which means that the original data can be perfectly reconstructed, every single time, from its reduced form. What SQL Server (through Windows) writes to the FlashArray is exactly what SQL Server will read from the FlashArray. There is no compromise in data integrity from having dedupe and compression enabled in our arrays. You know how row compression, page compression, and columnstore indexes in SQL Server compress your data without compromising its integrity? Yeah, it’s kinda like that.
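To make "lossless" concrete, here is a minimal Python sketch of a toy block store that dedupes and compresses everything written to it, then hands back exactly the same bytes. This is an illustration only, not how Purity is implemented: the class name `ReducingStore`, the 8 KB block size, the SHA-256 fingerprint, and the use of zlib are all assumptions made for the example.

```python
import hashlib
import zlib

BLOCK_SIZE = 8192  # assumed block size for the example (same size as a SQL Server page)


class ReducingStore:
    """Toy block store: dedupes by content hash and compresses each unique block."""

    def __init__(self):
        self.blocks = {}  # SHA-256 digest -> compressed block, stored once
        self.volume = []  # ordered digests that make up the logical volume

    def write(self, data: bytes) -> None:
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            # Simplification: trusts the hash; a real system may also compare bytes.
            if digest not in self.blocks:
                self.blocks[digest] = zlib.compress(block)
            self.volume.append(digest)

    def read(self) -> bytes:
        # Decompress each referenced block and stitch the volume back together.
        return b"".join(zlib.decompress(self.blocks[d]) for d in self.volume)


page_a = b"A" * BLOCK_SIZE
page_b = bytes(range(256)) * 32      # 8192 bytes of different content
original = page_a + page_b + page_a  # page_a is written twice

store = ReducingStore()
store.write(original)

assert store.read() == original  # every byte comes back unchanged
assert len(store.volume) == 3    # three logical blocks were written
assert len(store.blocks) == 2    # only two unique blocks were stored
```

The duplicate block is stored once and the reader never notices: the logical volume still reads back byte for byte.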

Furthermore, when you issue a write to the array, we won’t acknowledge that write back to the OS until it has been written to two separate super-fast NVRAM devices. We will then do more things with that data, like moving it to the SSDs, but at that point the data has been made safely redundant.
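The acknowledgement rule described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the array's actual code: `NvramDevice` and `write_with_ack` are hypothetical names, and the "devices" are just in-memory lists.

```python
class NvramDevice:
    """Hypothetical stand-in for a single NVRAM device."""

    def __init__(self, healthy: bool = True):
        self.healthy = healthy
        self.log = []

    def persist(self, data: bytes) -> None:
        if not self.healthy:
            raise IOError("device unavailable")
        self.log.append(data)


def write_with_ack(data: bytes, devices: list) -> bool:
    """Return True (the acknowledgement) only if every device persisted the data."""
    try:
        for device in devices:
            device.persist(data)
    except IOError:
        # One copy may already exist, but the write is NOT acknowledged.
        return False
    return True


nvram_a, nvram_b = NvramDevice(), NvramDevice()
acked = write_with_ack(b"8K page", [nvram_a, nvram_b])
assert acked and nvram_a.log == nvram_b.log == [b"8K page"]

# If one device cannot take the write, the host never sees an acknowledgement.
not_acked = write_with_ack(b"8K page", [NvramDevice(), NvramDevice(healthy=False)])
assert not_acked is False
```

The point of the design is that by the time the OS hears "done", two independent copies already exist.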

Can I Turn This Stuff Off?

If you’re wondering whether you can turn off dedupe and compression at all, the answer is a most resounding: No, you cannot turn them off.

We do this for several reasons. One of them is to minimize write amplification on our SSDs and extend the life of your investment in Pure.
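A rough way to see why data reduction is kind to flash: fewer bytes reaching the SSDs means fewer program/erase cycles consumed. The Python sketch below compares the bytes that would land on flash with and without reduction. The function name `bytes_written_to_flash`, the block contents, and the choice of SHA-256 and zlib are assumptions for the example, not the array's actual algorithms.

```python
import hashlib
import zlib


def bytes_written_to_flash(blocks, reduce=True):
    """Count the bytes that would be written, with or without dedupe and compression."""
    if not reduce:
        return sum(len(block) for block in blocks)
    seen = set()
    total = 0
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            continue  # duplicate block: nothing new is written
        seen.add(digest)
        total += len(zlib.compress(block))  # unique block: written compressed
    return total


# Ten identical, highly compressible 4 KB blocks (a deliberately favorable case).
workload = [b"x" * 4096] * 10
raw = bytes_written_to_flash(workload, reduce=False)
reduced = bytes_written_to_flash(workload, reduce=True)

assert raw == 40960
assert reduced < raw
```

Real workloads will not reduce this dramatically; the workload here is chosen only to make the effect obvious.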

How Does it Work?

Magic. That is the kind of word my fellow database engineers at my previous job used to describe how the FlashArray works after our initial meetings with the Pure Storage folks.

Well, let me remind you of Arthur C. Clarke’s famous quote: “Any sufficiently advanced technology is indistinguishable from magic.”

That is how I felt about the array once I saw it perform. Inline dedupe and compression. Terabytes of data reduced by a factor of 3.5:1. Already compressed data being compressed even further. Handling an 11TB OLTP database with ease. Yep, magic!

Today there are many hundreds of SQL Server instances and thousands of databases running happily on Pure. DBAs who sleep better because they don’t have to worry about storage. Businesses that can do more because they have freed themselves from the chains of slow IO. And many, many, happy, happy customers.

And if you don’t think Pure can handle your mission-critical SQL Server databases, think again: that’s exactly what it did at my previous job. And trust me on this one, I wouldn’t have joined this company if I didn’t believe in its product.

Flash-enabled Magic

You probably think that it’s not a very good idea to host your SQL Server databases on a disk-based device that performs dedupe and compression on 100% of its writes. And I would agree with you on that. What allows us to do this in our arrays is flash. Flash that can perform operations in microseconds, and even in parallel. Our Purity Operating Environment architecture enables us to leverage that extremely low latency and parallelism to handle incoming and outgoing data extremely fast, and with high redundancy.

Warning! Highly Geeky Stuff Follows: How it Actually Works

If you feel like going down the rabbit hole and learning more about our FlashArrays and the Purity Operating Environment at a much deeper level, I can point you to this Storage Field Day 6 whiteboard session with one of our Principal Architects, Neil Vachharajani. It is a 56-minute video, packed with tons of details on how we handle data inside our arrays.

Read the next blog post to see how and why dedupe and compression in our FlashArrays can compress your already compressed heaps and indexes (i.e., tables) on your SQL Server databases.

-A