If the media reports are true, Dell-EMC is planning to announce new versions of both VMAX and XtremIO next week at Dell-EMC World. It’s been an interesting few years as EMC’s (now Dell’s) all-flash strategy has wavered back and forth from VMAX to XtremIO and (maybe?) back again.
If you are a VMAX or an XtremIO customer – there has never been a better time to hit the pause button, reflect on the past four years of EMC’s flash strategy, and decide if you are ready to sign up for another four years, and if so, which product you’re going to trust to run your business.
This blog recalls the history of EMC’s flash evolution from Pure’s vantage point. It’s a long read – but a history we believe is important to understand, because storage is fundamentally about building trust and consistency. We use quotes from EMC spokespeople to tell this story, recounting:
- EMC’s acquisition and then introduction of XtremIO in 2013-2014, where bold statements were made about how legacy architectures weren’t meant for all-flash, XtremIO was purpose-built for the future, and XtremIO was ready for enterprise reliability (6x9s);
- Continued aggressive focus on XtremIO through 2015, with VMAX being so under-prioritized that some wondered if it was being retired;
- Challenges in reliability and in delivering on the XtremIO roadmap that started to crop up throughout 2014 and 2015, as destructive upgrades and stories of customer issues surfaced widely;
- The abrupt reintroduction of VMAX in 2016 with an all-flash version, and the EMC sales motion pivoting strongly back to VMAX. This pivot was so extreme that, according to IDC, sales of XtremIO actually dropped 27.7% in 2016 (while EMC enjoyed strong overall flash sales)!
So in 2017, EMC customers are left with a confusing choice: two overlapping products in XtremIO and VMAX, both getting refreshes for 2017, with EMC setting no clear direction for the future. VMAX is heralded as the pillar of enterprise reliability, while XtremIO is pitched as both purpose-built for flash and designed natively for data efficiency (deduplication and compression). Should one really have to choose between reliability and efficiency in 2017?
As you go to EMC World next week and hear pitches on the updated versions of both these products – ask the tough questions:
- Why is EMC continuing development of both? Why can’t they choose?
- Which is planned to be the long-term Tier 1 platform?
- What are the real trade-offs between the two?
- What does the data show about the availability difference between VMAX and XtremIO? How does availability improve/reduce as you add Vbricks / Xbricks?
- What are the plans for both platforms to migrate fully to NVMe?
- If I invested in the X1 generation of XtremIO or the first-generation VMAX-F, how do I migrate to these new releases? Do I get trade-in credit for my old system? Is that migration non-disruptive?
And don’t forget that every forklift upgrade is an opportunity to see what else is available on the market. At Pure – we just introduced FlashArray//X – the new 100% NVMe version of FlashArray. It brings performance, reliability, and efficiency – no compromises! We’d be happy to let you take one for a spin – compare it head to head with the latest generation VMAX or XtremIO!
And for those interested in going deeper, read on for the full history…
2013: The End of the Tiering Half-Decade, The Beginning of XtremIO
From about 2008 to 2013, EMC (and most storage vendors) had their flash strategies squarely focused on tiering. For most of 2013, EMC didn’t push all-flash, as XtremIO was still too new, in its DA (Directed Availability) phase. But in late 2013, XtremIO shipped GA, and so began the pivot to being squarely “on” the all-flash bandwagon. So too began the difficult and interesting challenge of positioning XtremIO (the new all-flash stallion) vs. VMAX (the quintessential disk-era cash cow).
Some early positioning on why EMC purchased XtremIO, even though it already had two hybrid arrays, VMAX and VNX:
- “Hybrids would be pretty heavily impacted – mostly because their IO paths and code was not designed for ultra low latency (messes with caching), mature RAID would cause write amplification (bad with flash), and perhaps most fundamentally, the architectures were not designed for hundreds of thousands to millions of IOps being commonplace” [link]
- “For the right workloads (which will expand over time), AFA designs that are built ground up for flash will indeed cannibalize other architectures to some degree” [link]
Meanwhile, the attempts to build up trust in XtremIO’s reliability started early:
- “The scenario of ‘total loss of power’ or ‘isolation of an X-brick’ is covered through batteries to destage. As you can see – this is a very robust model with no SPOF.” [link]
- “XtremIO…has not 99.999%, but 99.9999% availability” [link]
- “We’ve carefully considered all the various ways that storage systems can fail, and architected XtremIO to provide the performance and reliability EMC customers have come to expect.” [link]
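For a sense of scale on the “nines” in these claims, here is a minimal sketch of the arithmetic, assuming a 365-day year. The function name is ours and purely illustrative. It says nothing about how any vendor actually measures availability; it only converts a nines figure into the downtime budget it implies.

```python
# Illustrative arithmetic only: converts "N nines" of availability into
# the yearly downtime budget it implies, assuming a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year for N nines (e.g. 5 means 99.999%)."""
    unavailability = 10 ** -nines  # fraction of the year the system may be down
    return unavailability * SECONDS_PER_YEAR


if __name__ == "__main__":
    for nines in (5, 6):
        seconds = downtime_seconds_per_year(nines)
        print(f"{nines} nines: {seconds:.1f} s/year ({seconds / 60:.2f} min)")
```

Under that assumption, 99.999% allows a little over five minutes of downtime a year, while 99.9999% allows roughly half a minute. That gap is why the 6x9s claim is such a high bar.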
2014: The Pivot to XtremIO
With XtremIO now GA, EMC started to use the might of their sales force and market muscle to their advantage, and they strongly sold XtremIO everywhere they could. If you attended EMC World in May 2014, it was all about XtremIO…to the extent that many in the industry were wondering if VMAX was officially dead.
- “You’re simply missing out if you ‘stick with’ things like EMC VMAX…” [link]
- “If you find yourself telling your customer why they SHOULDN’T be moving OLTP workloads that need predictably low write latency to an AFA, or VDI at scale shouldn’t be on an AFA, or Virtual Server infrastructure that has a lot of commonality shouldn’t be on an AFA… Well, you might be doing your customer a disservice. And if you need an AFA – XtremIO is very, VERY compelling.” [link]
Later, in July of 2014, EMC quietly announced VMAX3, with no launch event or big song-and-dance. Think of that – the company’s (once?) flagship product gets a major generational update, but almost no fanfare whatsoever. Odd, odd indeed.
Meanwhile, EMC used VMworld 2014 to announce the XtremIO 3.0 release, an important upgrade that brought compression and snapshots. But the funny thing about that 3.0 release is that it perfectly illustrated XtremIO’s metadata-constrained architecture. In order to fit the metadata required for compression, the fundamental XtremIO block size (and dedupe granularity) had to be moved from 4K to 8K, meaning that to add compression they had to coarsen the granularity of deduplication. This block size change had the side effect of improving performance (fewer metadata transactions per large I/O), which they also marketed heavily, despite the impact it had on efficiency:
- “We were able to do the impossible, give you more performance on everything using the SAME hardware, things that are heavily defendant [sic] on performance like DB’s (IOPS and bandwidth), now works [sic] even faster.” [link]
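To see why dedupe granularity matters for efficiency, here is a minimal sketch using deliberately contrived data. Everything in it is an assumption for illustration: the function names are hypothetical, the fixed-size SHA-256 fingerprinting is a generic textbook approach rather than XtremIO’s (or anyone’s) actual implementation, and the synthetic volume is constructed so that duplicates line up on 4K boundaries but never on 8K boundaries.

```python
import hashlib


def dedupe_ratio(data: bytes, block_size: int) -> float:
    """Logical blocks divided by unique blocks, using fixed-size blocks."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique)


def synthetic_volume(pairs: int, chunk: int = 4096) -> bytes:
    """Hypothetical data: one repeated 4K chunk interleaved with unique 4K chunks."""
    repeated = bytes([0]) * chunk
    # Each pair is (repeated chunk, unique chunk), so every 8K span is distinct.
    return b"".join(repeated + bytes([i]) * chunk for i in range(1, pairs + 1))


if __name__ == "__main__":
    volume = synthetic_volume(pairs=8)
    print(f"4K blocks: {dedupe_ratio(volume, 4096):.2f}:1")
    print(f"8K blocks: {dedupe_ratio(volume, 8192):.2f}:1")
```

On this worst-case data the 4K pass collapses the repeated chunk while the 8K pass finds nothing to share. Real workloads will land somewhere in between, but the direction of the effect is the point: larger blocks can only match when a larger span is identical.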
EMC memorably issued a $1M guarantee to focus attention on the fact that their data reduction was always on (attempting to shift focus from it being dramatically less effective than Pure’s). We were honored that Pure warranted the time and attention of the EMC CEO, and we dutifully informed customers how they could easily collect a $1M prize from EMC.
And…then the bomb dropped. It turned out upgrades from XIO 2.x → 3.0 were going to be not only disruptive but actually destructive, meaning that data would have to be moved off the cluster, the cluster re-initialized completely, and the data moved back. Mic drop.
Customers and analysts were shocked:
- “My opinion, that in 2014, if we need any disruption to update/expand a production storage array, we’re doing it wrong.” — Andrew Dauncey, EMC customer
- And the brilliant diatribe “XtremIO Craps on the EMC Badge” from former customer and industry educator Nigel Poulton, which I won’t try and summarize but is worth your time to read completely.
Meanwhile, while the drama was unfolding, the EMC website still stated…
- “XtremIO eliminates the need for planned downtime by providing non-disruptive software and firmware upgrades to ensure 7×24 continuous operations.”
And EMC replied, never quite admitting to “destructive,” but attempting to calm the fury with the notion that this isn’t so abnormal for them:
- “First of all, yes, it is accurate to call the XtremIO 2.4->3.0 upgrade a disruptive operation.”
- “When you change the underlying “layout structure” – there’s no nice way around it. Getting the new goodies means pulling the data off, updating, then putting the data back on.”
- “EMC has a history around upgrades that goes back a long way. Some good, some bad.”
- “Further out, we know we’ll need a new hardware platform for three reasons. We don’t anticipate this will require a disruptive upgrade.” [link]
While disruptive upgrades are bad (and something we’ve never had to do for a product at Pure once it’s gone GA), no one, including us, could believe that a destructive upgrade for a GA product was even a thought that a Tier 1 storage company would entertain, let alone ship. I’ll also say that we’ve changed our Purity Operating Environment data layout multiple times and in fact have completely re-written our dedupe format and engine within Purity over the past 18 months – all non-disruptively.
2015: Accelerating Focus on XtremIO
As we plowed into 2015, EMC’s field teams continued to rotate towards selling more and more XtremIO. EMC World came in May, and again it was all XtremIO with nary a mention of the recently-released VMAX3. XtremIO gained share, and EMC’s Engineering team seemed hard at work on resiliency and scale, indeed finally shipping the promised non-disruptive upgrade (NDU) to XIOS 4.0, including online cluster expansion (as long as your cluster was expanded with the same size bricks). EMC boasted about delivering on their promise of online scaling:
- “Start with 5TB, and non-disruptively scale all the way up to 320TB (8x40TB X-bricks).” [link]
Oddly, this statement has yet to come true. While clusters can now be expanded non-disruptively, if you start with a given brick size, say 10TB, you can only expand by adding similar 10TB bricks. If you want to change brick size – you must disruptively start a new cluster and move data. This will be particularly interesting to watch as X2 bricks are introduced – what’s the upgrade model from X1 bricks? And of course, EMC continued boasting of XtremIO’s success and reliability, and reiterating that their disk platforms weren’t purpose-built for flash:
- “There is NO SPOF in XtremIO – period.” [link]
- “[VMAX and VNX] were never designed with NAND as the persistence layer in mind” [link]
- “We’re gaining market share because XtremIO is …. enabling the IT application and storage opportunities for the next decade, not the last one.” [link]
2016: The Pivot Back to VMAX
Despite all this externally-apparent positivity, we at Pure were observing an opposite reality in the field, as we engaged with customers and partners on a daily basis. We heard many customer stories of upgrade challenges, downtime, and basic resiliency issues. One of these stories eventually made it into the public domain; most did not. Throughout 2015, we heard from more and more customers and VARs who were losing faith in XtremIO, and one would have to imagine there was a similar impact on the EMC field team’s interest in selling XtremIO.
And so in February 2016, EMC hosted a stand-alone flash launch event unleashing VMAX All-Flash to the market, as well as taking the opportunity to talk up now-defunct DSSD. Oddly, there was nary a mention of XtremIO. The pivot in positioning was breathtaking:
- “There are enterprise workloads that count on 99.9999% available infrastructure – because they aren’t architected for application-level resilience.” [link]
- “We’re disrupting ourselves – and we deeply believe that in 2016, to not have an all-flash design for transactional workloads – well, if you don’t, it’s just not a modern datacenter!” [link]
This was, well, pretty odd. First off, in 2014 (linked just above) EMC executives were telling folks that XtremIO was the AFA for transactional OLTP workloads…what changed? They had also claimed XtremIO was 6x9s – the pinnacle of availability. Yet the new positioning cast VMAX-F as the AFA for when you really wanted reliability, an array designed for transactional workloads. (Pure, incidentally, achieved 6x9s in our first year of FlashArray//M shipments, and we’re coming up on our second!)
But, for all the customers that EMC had sold XtremIO to for the past few years, moving back to VMAX-F to get reliability was a bit of a compromise. For one, it didn’t have compression or deduplication (it still doesn’t have dedupe, though an add-on compression card was introduced). It’s comparatively, well, huge. And while EMC made some efforts to simplify it, it’s like managing a VMAX because it is a VMAX.
But perhaps most interesting was the complete silence on the pivot back to retrofit. All those statements over the years of the value of a purpose-built architecture, of optimizing for flash, of having efficiency built right into the core of the array…did they no longer matter?
At face value, it seemed that customers were being asked to swallow a tough choice:
- Pick VMAX if you care about reliability, but can live without efficiency or simplicity
- Pick XtremIO if you care about efficiency and simplicity, but are less focused on reliability
Hmm…tough choice. And interestingly, public statements on XtremIO seemed to soften a bit in terms of that confidence in reliability. You be the judge:
- “We are spending a lot of time and effort to continue hardening the core XtremIO platform. XtremIO is doing very well statistically when it comes to availability relative to our storage stacks that have more run-time (and as you can see from the stats above – with 7000 X-bricks deployed, is at the scale where stats matter). However, we know we really need to double down on hardening the platform, improving updates, cluster expansions and other use cases. We’re now on release 4.0.5 – and the focus has been squarely on platform hardening. As customers press XtremIO into their most mission critical use cases, they want a rock, and we must always deliver.” [link]
But from our vantage point in the field, the EMC sales team seemed to have given up and pivoted, strongly favoring selling VMAX. Not only was it more reliable and more familiar, VMAX’s inefficiency enabled them to charge the customer more for the necessary capacity – a somewhat win-win for EMC, depending on your perspective.
And it turns out the numbers recently came out to illustrate this pivot. After break-neck growth in 2014 and 2015, XtremIO revenue actually declined by 27.7% in 2016 according to IDC (while EMC still grew all-flash revenues solidly, now riding VMAX-F)!
Into 2017 – Which Horse Wins?
As we enter 2017, the Dell-EMC merger itself has closed, but the majority of the post-merger consolidation phase is still ahead. Dell must aggressively pay down their debt, and one obvious way to do that would be to eliminate overlapping products. And at a high level, EMC / Dell have heavily overlapping all-flash array portfolios:
- Tier 0: DSSD
- Tier 1: VMAX-F and XtremIO
- Tier 2: EMC VNX/Unity and Dell Compellent (plus Dell Nutanix OEM and VMware vSAN)
The first axe fell on DSSD, which is now defunct. The second axe allegedly hit the XtremIO-F file integration project. The XtremIO team had been busy talking up file integration as the major new feature in the next XtremIO version, going so far as featuring a preview of it at Dell World in November 2016:
- “During 2017, Dell EMC will release an XtremIO-based unified block and file array. By delivering XtremIO NAS and block protocol support on a unified all-flash array, we plan to deliver the transformative power and agility of flash in modern applications to NAS.” [link]
Then this happened: an EMC spokesperson confirmed that this key XtremIO roadmap item has been halted. Taken together, this is pretty surprising news. Michael Dell himself talked up DSSD at Dell World just a few months ago as “a game changer”, and the XtremIO team has been blogging and tweeting about file integration.
But it does at least appear that the long-awaited second hardware generation of XtremIO, “X2,” is indeed being announced at EMC World next week. If I were an XtremIO customer, I’d be wanting to know some pretty obvious details:
- Will I be able to add X2 bricks to my existing X1 cluster?
- Will this X2 brick addition be non-disruptive?
- Do X2 bricks support NVMe flash instead of SAS for higher performance and future-proofing? If not, what is the path to NVMe? Will I be able to mix future NVMe bricks with my current X1 and X2 bricks?
- If I want to upgrade to X2 bricks, can I trade in my X1 bricks for a full credit, or do I have to re-buy my bricks to take advantage of the technology upgrade?
- X2 bricks apparently leverage a new HA stack, removing the problematic X1 battery-backup approach…how hardened is this new resiliency stack?
As you can probably guess, Pure customers don’t have to ask any of these questions.
So welcome to 2017. We hope you go armed into EMC World ready to listen, and understand where Dell is going with XtremIO X2 and VMAX-F. And in case you haven’t heard – there’s a new (more orange) //X in town. It’s 100% NVMe, and if you are considering any all-flash upgrade, you should give it a look!
Note: this is a well-researched post where we’ve painstakingly tried to be accurate. If we’ve made mistakes – point them out in the comments and we’ll get them fixed!