Today EMC announced the long-expected GA of XtremIO. As we’ve discussed earlier, overall we view this as a great move for the industry. We’re in a rapid transformation from legacy disk architectures to all-flash architectures for performance storage, and the 800-lb gorilla in the storage industry endorsing and accelerating this transformation will only help move the entire industry forward to the bright flash future. Hugely gratifying for those of us who have been pioneering this transformation.
I’ll also give EMC props for toning down their usual launch fanfare to provide a clean, clear, straightforward presentation of what XtremIO is all about, one that made clear both what they perceive to be their technical advantages and their “1.0” product limitations (many of which they were surprisingly open about in the live chat correspondence at the event).
In case you missed it, we had a little fun with a pre-game show, inviting 49ers all-star and SuperBowl champion Harris Barton to try and guess with us what the launch was all about. Take a quick watch, and then you can see how accurate our pre-game guesses were! (and check back on Monday, when we’ll publish our post-game show featuring 49ers great Ronnie Lott!).
In a nutshell, if I had to attempt to summarize what I took away from the event, EMC tried to make the following core points about XtremIO and the competition:
1) XtremIO is GA and ready for “prime time” production deployments
2) XtremIO has a unique architecture, which differentiates it in the market:
- It is a scale-out array where performance and capacity scale linearly
- It uses an in-memory metadata model vs. persisting metadata on flash
- It does 100% inline deduplication with no post-process reduction
- It delivers consistent performance with no performance penalty for garbage collection
Let’s look at these claims one-by-one and provide the Pure perspective.
XtremIO is now GA – but is it ready?
With any product, certain compromises have to be made to ship the first GA, and it is clear after today’s launch that XtremIO is no different. Let’s just look at what these compromises entailed (presented in EMC’s words, from the launch chat window).
XtremIO GA includes:
- 10TB xBricks
- 1, 2, or 4 xBrick configurations
- FC and iSCSI support
- Inline 4K deduplication
Features marketed in the launch, but not in GA:
(These feature limitations were discussed by EMC representatives in the live chat window during the launch event. Copyright law prevents me from reproducing the actual quotes, but the launch event is available for replay if you’d like to get more context/details.)
- Dual-drive-loss RAID recovery: not available at GA; only single-drive-failure recovery is supported
- Snapshots: not available at GA
- Scale-out cluster expansion: not available at GA. XtremIO is available in 1, 2, or 4-brick configurations, but online expansion between them is not supported.
Finally, there were certain “GA-readiness” questions that were left unanswered by the release, including:
- HA strategy. While EMC has certainly marketed XtremIO as highly available, for such a core array feature precious few details were presented. What does the HA process look like, from a software/architecture POV? What kind of I/O interruption does it cause, and for how long? What are the performance implications of controller loss? What are the implications of complete power loss, and what does the recovery scenario look like? We’ll be looking for a clearer presentation of the HA strategy and architecture in future events. In the meantime, my only advice to customers is test, test, test.
- Non-Disruptive Upgrade / Maintenance strategy. EMC says NDU is implemented, but again, details on how it works and what performance impact it may cause are light. Chad Sakac’s blog indicates that the NDU code is included, but customers won’t be able to use it for the first upgrade. How are non-disruptive code upgrades executed? How is capacity expansion executed? How is controller replacement executed? Given XtremIO’s in-memory metadata approach, these will be challenging features to get right.
- Cost. EMC was completely silent on cost, both on a stand-alone basis, and in relation to VNX and VMAX. How will price compare to the all-flash VNX? What will the software licensing strategy be? Since features like snapshots are licensed on other EMC array platforms, will they have additional license cost on XtremIO? Will professional services be required for install and/or maintenance events?
OK, on to the architectural stuff….
Analyzing XtremIO’s Architectural Claims
EMC made four primary claims, which we’ll discuss one by one:
1) XtremIO does 100% inline deduplication with no post-process reduction.
I believe this is true: XtremIO implements a simple 4K, SHA-1 content-hash-based deduplication scheme, where all deduplication is done inline (in fact it has to be done inline, as the hash is how the array determines where to store data). In stressing this point, EMC is poking at competitors who don’t do deduplication, and at Pure Storage, which leverages a variety of data reduction algorithms, most of which are inline and some of which add additional value post-process.
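For readers unfamiliar with content-addressed deduplication, here is a minimal toy sketch of the fixed-block scheme described above (the class and names are mine, not XtremIO’s): the SHA-1 hash of each 4K block serves as both the dedupe key and the placement key, which is exactly why the dedupe must happen inline.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed 4K blocks, as in the scheme described above

class ContentStore:
    """Toy content-addressed store: the hash of a block decides where it
    lives, so deduplication is necessarily inline -- identical blocks hash
    to the same key and share one physical copy."""
    def __init__(self):
        self.blocks = {}   # content hash -> block data (physical store)
        self.volume = {}   # logical block address -> content hash (metadata)

    def write(self, lba, block):
        assert len(block) == BLOCK_SIZE
        h = hashlib.sha1(block).hexdigest()
        self.blocks.setdefault(h, block)   # store only if not already present
        self.volume[lba] = h               # metadata maps LBA to content hash

    def read(self, lba):
        return self.blocks[self.volume[lba]]

store = ContentStore()
zeros = bytes(BLOCK_SIZE)
store.write(0, zeros)
store.write(1, zeros)          # duplicate block: no new physical copy stored
assert len(store.blocks) == 1  # one physical block backs two logical blocks
```

Note that with a fixed 4K geometry, two blocks that differ by a single byte (or are merely shifted by 512 bytes) hash differently and dedupe not at all, which is the limitation discussed next.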
From my POV, what EMC highlights as a strength is one of their biggest weaknesses: the product only does 4K fixed-block deduplication, and this is likely deeply tied to its 4K in-memory metadata model.
Let’s contrast this with Pure Storage. Pure implements highly adaptive data reduction: we use five independent data reduction technologies within our array, and we continually add to and improve them to deliver better data reduction over time.
Our data reduction technologies are:
- Inline pattern removal: we remove blocks consisting of repeated patterns (such as all zeros) and store them very simply in a highly efficient metadata structure
- Inline deduplication at a variable geometry: we use a flexible deduplication geometry that detects dedupe segments from 512-bytes to 32K in size, and stores them with a single metadata entry. These segments can occur at any 512-byte geometry (no 4K alignment issues).
- Inline compression: we compress data that remains, on average getting >2-to-1 reduction just from compression alone (great for databases!)
- Clone / copy deduplication: we have a special-purpose deduplication that makes data snapshots and copy operations (like xCopy) extremely fast and metadata efficient
- Post-process data reduction: we have a constantly running (and CPU-budgeted) background process that re-analyzes data after it has been on the array for a while, applying deeper, more CPU-expensive compression to deliver additional reduction for long-lived data.
As far as what is inline vs. post-process: #1-#4 are always on and inline. The relative aggression and priority of #2 and #3 can be turned up and down (automatically by the array) to maintain consistent performance. In practice, turning these down is rare, but it can happen under the heaviest write workloads to protect the performance of the array. And if any potential data reduction is missed, the array achieves full data reduction via #5, which happens within hours.
But in our minds, the real story here is the lack of compression. We’ve never shared this data before, but Pure averaged data reduction rates across our entire customer base of hundreds of arrays and found that, on average, compression delivers MORE data reduction than deduplication. This is especially true for database and mixed VM workloads (don’t forget, there are applications in those VMs!). You’ll also see via the bell curve where most of the arrays fall in the population of data reduction delivered, and compression dominates in that sweet spot. Here’s the data:
Let’s say that more simply: if you don’t deliver compression, you lose out on more than 50% of the potential data reduction in mixed workloads, thus making a solution 2x more expensive on a like-for-like hardware basis.
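To see where the 2x figure comes from, here is the back-of-envelope arithmetic, using illustrative (not measured) 2-to-1 ratios for each technique:

```python
# Hypothetical workload: the numbers are illustrative, not customer data.
logical_tb = 100.0
dedupe_ratio = 2.0        # deduplication alone: 2-to-1
compression_ratio = 2.0   # compression alone: >2-to-1 per the text above

# Physical flash required to hold the same logical data:
with_both = logical_tb / (dedupe_ratio * compression_ratio)  # dedupe + compression
dedupe_only = logical_tb / dedupe_ratio                      # dedupe, no compression

print(with_both)                 # 25.0 TB of physical flash
print(dedupe_only)               # 50.0 TB of physical flash
print(dedupe_only / with_both)   # 2.0 -- twice the hardware for the same data
```

When compression contributes as much reduction as deduplication, dropping it doubles the physical flash (and therefore the hardware cost) needed to store the same logical capacity.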
So in the end, the “100% inline” debate is a sideshow; the real story is the level of data reduction delivered and the use cases that enables, which we believe Pure wins hands-down.
2) In-Memory Metadata Model
EMC spent a good deal of time talking about XtremIO’s in-memory metadata model, where metadata is persisted and lives in the DRAM of the controllers vs. being persisted to flash. EMC is right in highlighting metadata – it is the lifeblood of advanced services. Deduplication, compression, thin provisioning – they are all powered by metadata, and in fact I’m guessing that XtremIO’s simple 4K metadata model is a big reason the platform doesn’t deliver compression or fine-grained, variable-chunk deduplication (see above).
By way of comparison, Pure Storage also makes heavy use of metadata, but our model for it is very different. Pure commits and persists 100% of metadata to flash itself (it is actually stored and protected in the same data structures as user data), and then caches the most heavily-used metadata in DRAM to speed operations.
Ultimately, when analyzing metadata models, there are three things that are important:
- What does the metadata enable? This seems to be a clear win for Pure – our richer, more complex metadata scheme enables finer-grained, variable-size deduplication, and compression (remember, compression makes 4K blocks smaller and requires keeping track of how variable-sized smaller pieces fit into slots).
- How is metadata protected? Since metadata is the lifeblood of any modern array, protecting it is critical. It appears that XtremIO commits metadata to memory, and then relies on UPS devices to de-stage memory to spare SSDs in the controllers in the event of a full power loss. I’ll leave it to the reader to judge the resiliency of that approach, and will just contrast it with the Pure approach: before an I/O is ever acknowledged back to the host, both it and its metadata have been written to at least two redundant locations on solid-state memory in the array.
- What size and performance implications does it have? How much metadata there is, and how fast it is accessed, will ultimately have a large impact on the performance of any array. Going deep into this is beyond the scope of this article (metadata structures can get very complex), but I’ll say that Pure Storage metadata is highly optimized to ensure that flash accesses to read metadata for I/O operations are minimal (the most important metadata is cached in DRAM), while enabling larger metadata structures on flash is key to enabling our rich services.
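To make the ack-after-persist ordering described above concrete, here is a minimal sketch (all names and structures are hypothetical, not Pure’s actual implementation) of a write path that acknowledges an I/O only after the data and its metadata are committed to two independent persistent locations:

```python
# Toy write path: an I/O is acknowledged to the host only after the data
# *and* its metadata have been committed to two independent persistent
# targets -- never from DRAM alone.

def write_io(data, metadata, persistent_targets):
    """Commit data and metadata redundantly, then acknowledge."""
    assert len(persistent_targets) >= 2, "need redundant persistence before ack"
    for target in persistent_targets[:2]:
        target.append(("data", data))      # commit the data durably
        target.append(("meta", metadata))  # metadata persists alongside it
    return "ACK"  # only now does the host see the write complete

nvram_a, nvram_b = [], []  # two redundant solid-state locations
assert write_io(b"block", {"lba": 7}, [nvram_a, nvram_b]) == "ACK"
assert nvram_a == nvram_b  # both copies hold the data and its metadata
```

The ordering is the point: a power loss before the "ACK" line loses nothing the host believes was written, because nothing was acknowledged from volatile memory.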
3) Consistent Performance with no Performance Penalty for Garbage Collection
This is one of the odder claims from EMC, and honestly I’m not sure who it is directed at. What EMC has claimed here is that:
- XtremIO doesn’t do garbage collection at the system level
- Other arrays do, and this impacts their performance
- This performance impact is usually felt around 80% full
To respond in kind to each of these: EMC leverages expensive eMLC SSDs, which do garbage collection inside the drive. We’re not privy to exactly which eMLC drive EMC uses, but typically these devices contain 30-50% extra flash to be able to manage garbage collection within the drive. Let me repeat that: the eMLC SSDs contain about 30-50% more flash than advertised, and they use this space to do garbage collection. Folks, there is no free lunch in flash garbage collection – it’s a property of the medium, and it has to be done.
Pure Storage uses consumer-grade MLC SSDs, with typically 5-7% over-provisioning, and instead we reserve (and hide from the user) 20% of the raw flash to manage garbage collection. We also reserve dedicated CPU time for this garbage collection, which is an ongoing background process. What does this mean? Garbage collection has no impact on the performance of the array, and Pure Storage guarantees full read and write performance up to 100% full. If you are running a Pure Storage array and see performance loss at any percent full, open a support ticket and we’ll treat it as a bug.
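To put rough numbers on the “no free lunch” point, here is a back-of-envelope sketch comparing where the garbage-collection headroom lives in the two approaches. The percentages come from the paragraphs above; the calculation itself is my simplification, not either vendor’s sizing math.

```python
# Back-of-envelope: raw flash needed to present a given capacity, depending
# on whether GC headroom hides inside the drive or at the array level.

def raw_flash_needed(presented_tb, drive_op, system_reserve=0.0):
    """Raw flash required to present `presented_tb` to the user, given
    drive-level over-provisioning (`drive_op`, hidden inside the SSD) and
    any array-level reserve hidden from the user (`system_reserve`)."""
    drive_capacity = presented_tb / (1.0 - system_reserve)  # array-level GC reserve
    return drive_capacity * (1.0 + drive_op)                # drive-internal extra flash

# eMLC approach: ~40% extra flash hidden inside each drive, no array reserve
emlc = raw_flash_needed(10.0, drive_op=0.40)

# Consumer MLC approach: ~6% in-drive OP, plus a 20% array-level GC reserve
mlc = raw_flash_needed(10.0, drive_op=0.06, system_reserve=0.20)

print(emlc)  # 14.0 TB raw for 10 TB presented
print(mlc)   # 13.25 TB raw for 10 TB presented
```

Either way, presenting 10 TB consumes roughly 13-14 TB of raw flash; the headroom for garbage collection exists in both designs, and the argument is about where it sits and who controls (and pays for) it.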
Finally, it’s worth noting that implementing compression and variable-size deduplication likely requires garbage collection (a compressed or variable-size overwrite won’t fit neatly back into a fixed 4K back-end slot). Any vendor serious about shipping these features will have to deliver garbage collection without performance impact, as Pure has.
4) XtremIO is a scale-out array where performance and capacity scale linearly
Much has been said in other forums about the scale-out vs. scale-up debate, so I’m not going to go very deep into that one today….but a few things to understand:
- Scale-out’s core advantage, and its Achilles’ heel, is that performance and capacity scale linearly. Want more performance? Add a node. Want more capacity? Add a node. Scale-out is a great architecture for adding performance, but a very expensive one for adding capacity. Looking at Pure Storage customer support data, I’d submit that most customers are in much greater need of expanding capacity than performance. A few hundred thousand IOPS is enough for almost all workloads in the datacenter (and according to the XtremIO specs, can be delivered by one xBrick), but there are many workloads which need more than one brick’s worth of capacity. So for all those customers, extra controllers, extra switches, and extra UPSes all have to be purchased just to add a few more SSDs to the system.
- Scale-out also has some very interesting implications from an availability and performance perspective through failure. Scale-out architectures lose “1/n” of their performance on a single controller failure, where n is the number of controllers. In a 4-brick (8-controller) cluster, losing 1/8th of your performance may not be too painful, but in a 1-brick (2-controller) deployment, 1/n is 50%. The other important thing to understand is that in scale-out models, all LUNs in the array are spread across the entire array, meaning that if you lose a complete node, you can lose access to the entire array.
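The “1/n” arithmetic above is simple enough to state directly:

```python
# Fraction of cluster performance retained after losing one controller
# in an n-controller scale-out cluster (assuming work is spread evenly).

def performance_after_failure(n_controllers):
    return 1.0 - 1.0 / n_controllers

print(performance_after_failure(2))  # 0.5   -- 1-brick system: half the performance gone
print(performance_after_failure(8))  # 0.875 -- 4-brick system: only 1/8th gone
```

The same math is why a failure hurts small scale-out deployments the most: the fewer controllers you have, the larger the slice a single failure takes.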
From the Pure Storage point-of-view, we’ve designed our architecture from day 1 to support scale-out, when we need it (which is why we chose InfiniBand as our cluster interconnect as well). What we’ve seen in practice though, is that we’re meeting the vast majority of customer use case needs via our current approach, which is to first focus on maximizing scale-up. Via scale-up we’ve been able to deliver capacity expansion (customers can deploy FlashArrays which range from 10TBs to >150TBs usable, with much more to come next year) from a single set of controllers, and every year via Moore’s Law we deliver faster controllers which are easy, non-disruptive, and low-cost upgrades to existing arrays. Customers can choose to expand capacity and performance independently, and both types of expansion are non-disruptive and at lower cost than the scale-out model.
Do we eventually plan on supporting scale-out? Absolutely…when (and if) the market asks us for it. But for now, our customers are pushing us harder around more aggressive feature, manageability, and ecosystem integration support (which is an area where we are already market-leading in the all-flash array space). We’re also, as a generalization, more focused on reducing the effective cost of flash than breaking the 1M IOPS barrier. We’re delivering full-featured, no compromise solutions today at $3-4/GB usable and many hundreds of thousands of IOPS with consistent <1ms latency…and if I had to push harder on our engineering team around one dimension of improvement, higher performance via scale-out wouldn’t be it.
Closing Thoughts: Trying is Believing
In closing, this has been a great week for the free flash world. With EMC XtremIO finally wading into the ring, more and more customers will come to the realization that the time is right to get serious about replacing spinning disk for performance workloads with all-flash solutions – enterprise flash has indeed become mainstream.
What preceded was 2,500+ words of analysis…but what I urge you to do is raise your head above the technical arm wrestling and judge for yourself. Get out of the land of theory, bring both solutions into your datacenter, and try them out; we believe the difference between the technologies will become obvious. Pure Storage is dedicated to delivering a better storage experience using flash as a catalyst, we’re here to stay, and we’re going to win in the market one happy customer at a time.