Today EMC did their #Speed2Lead MegaLaunch, mostly focused around announcing their flagship all-flash array….

…it can do 1M IOPS at <1ms latency…

…it can scale to 600TB of flash…

…it offers block-level deduplication…

…it was (re)designed from the ground-up for flash….

….it’s called….wait for it….VNX2!

Wait a minute…VNX2?

EMC’s theme for today’s launch was Speed2Lead, implying that with VNX2 they are now both the performance leader and innovation leader in the market. But does EMC really have the #Speed2Lead, or were they #Slow2Follow? Let’s dive deeper into what they announced, and analyze five key dimensions of the story behind the story.

#1 – Another course-change for XtremIO and EMC’s flash strategy.

First off, the most interesting thing about this launch was what it didn’t include, most notably XtremIO, which EMC continues to reiterate will finally go GA this year. EMC’s flash strategy has been a winding road for the last few years. First it was all about FAST tiering with the legacy VNX/VMAX platforms. Then it was delivering host flash via Lightning (now XtremSF) and networked flash via Project Thunder. Then EMC likely came to the conclusion that the legacy architectures (VNX/VMAX) weren’t scaling for flash, and they acquired technology startup XtremIO with much fanfare. A few months later, they reiterated their commitment to XtremIO with aggressive pre-announcements, and by killing Project Thunder. And now, a few months later again, we see an aggressive launch of VNX2 and a new flagship all-flash offering in the VNX7600. While I can only speculate about what happened as an EMC outsider, it seems likely that XtremIO simply isn’t ready, and EMC had to pick a more mature horse to ride, at least for now. As an EMC customer, are you confused about where to place your flash bet? Choose wisely, as there are multiple internal EMC bets, and it’s not clear which will prevail.

#2 – EMC finally ships primary block-level deduplication.

Deduplication has proven to be one of the most difficult modern array features to back-port to legacy disk-based arrays, for two reasons: 1) it requires rich, flexible, fine-grained metadata and virtualization within the array, and 2) it randomizes data layout, which is particularly performance-problematic for spinning disk. NetApp first introduced primary deduplication into the market in 2006 (then called A-SIS), and most of the storage industry still hasn’t closed that gap for primary storage. But deduplication has evolved a lot since those early days. Early deduplication was slow, post-process, had large chunk sizes, and was limited in scope to only volumes or parts of the array. Flash has allowed the world to re-think what is possible with primary storage data reduction, and at Pure we believe we have the most advanced form of primary storage deduplication in the market today, bar none. Let’s do a simple comparison (and this doesn’t even consider our compression advances):

  • NetApp deduplication: post-process, 4K fixed chunk sizes, limited to a Volume, performance challenges
  • EMC VNX2 deduplication: post-process, 8K fixed chunk sizes, limited to a Pool, unknown performance implications
  • Pure Storage deduplication: inline, variable chunk sizes down to 512 bytes (8-16X smaller!), global, and always-on without performance impact

With flash, doing deduplication inline is particularly important: inline avoids writes, while post-process adds more writes. So you be the judge on this one, and if you still aren’t convinced, watch this EMC VNX2 video explaining exactly how complicated deduplication can be. Want a visual sense of the impact 512-byte chunk sizes can have vs. 8K chunks or even 256MB FAST tiering pages?

Dedupe and Tiering Chunk Sizes Compared
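To make the chunk-size point concrete, here is a minimal sketch of fixed-size chunk deduplication in Python. It is a toy model and not any vendor's implementation: the function names, the SHA-256 fingerprinting, and the synthetic workload in `make_sample` are all assumptions made for illustration, and it uses fixed chunks throughout rather than the variable chunking described above. The sample data repeats at 512-byte granularity but never lines up on 4K or 8K boundaries, which is exactly the case where coarse chunks miss duplicates.

```python
import hashlib


def dedup_stored_bytes(data: bytes, chunk_size: int) -> int:
    """Return the bytes that must be stored after fixed-size chunk dedup."""
    seen = set()
    stored = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).digest()
        if fingerprint not in seen:
            # First time this content is seen, so it has to be stored.
            seen.add(fingerprint)
            stored += len(chunk)
    return stored


def make_sample(sectors: int = 1024, distinct: int = 37) -> bytes:
    """Hypothetical workload (an assumption, not real array data).

    Each 512-byte sector holds one of a small set of contents, arranged
    so that the pattern does not repeat on 4K or 8K boundaries.
    """
    out = bytearray()
    for i in range(sectors):
        tag = (i * i) % distinct
        out += tag.to_bytes(2, "big") * 256  # 2 bytes x 256 = 512 bytes
    return bytes(out)


if __name__ == "__main__":
    data = make_sample()
    for size in (8192, 4096, 512):
        stored = dedup_stored_bytes(data, size)
        print(f"{size:>5}-byte chunks: store {stored} of {len(data)} bytes")
```

Under the same toy model, an inline scheme would write only the `dedup_stored_bytes(...)` result to flash, whereas a post-process scheme first writes all `len(data)` bytes and reclaims the duplicates later. Real workloads will of course behave differently from this synthetic pattern.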

#3 – (re)Designed for Flash (and Virtualization, and Multi-Core)?

The VNX2 certainly represents a major software and hardware re-architecture for VNX, and one that was sorely needed. VNX’s architecture was so vintage it was originally built around Embedded Windows XP, and the loose combining of Celerra and CLARiiON to form VNX in the last generation didn’t yield architectural purity. So re-designing to leverage multi-core processors, re-designing for virtualization (i.e. to expect and better handle the random IO that is a by-product of virtualization), and re-designing for flash all make sense; any storage product built in the last 5 years has these at the core of its design. But despite the marketing claims of a complete re-design, EMC publicly stated in the launch that only 10% of the 20M lines of code inside VNX had been “touched” as part of this re-architecture. So does re-writing 10% of the code really yield a wholly-reborn architecture, embracing the radically-different design opportunities that flash, multi-core, and virtualization present? And why on earth does a modern storage array require 20M lines of code?

Pure Storage had the luxury of starting fresh, designing for the all-flash future, and shedding the baggage of 20+ years of storage history (disk IO optimization, multiple RAID levels, tiering algorithms, caching algorithms, etc.), and doing it all in fewer than 500K lines of code. There’s a reason we named it the Purity Operating Environment: less baggage, less code, fewer bugs, less maintenance = faster innovation and higher quality.

#4 – A New EMC Anti Scale-Out Religion?

There was an interesting and different performance/scale mantra espoused by EMC at this event: scaling cores is cheaper and more efficient than scaling out controllers/nodes. We’ve talked at length about how leveraging an open x86 architecture allows Pure to take rapid advantage of the advances that Intel delivers via Moore’s Law in raw processor and core upgrades, improving performance and scale dramatically every year. But this message is a totally new one for EMC, which has been beating the scale-out controller drum for years. First VMAX leveraged scale-out between its engines. Then Isilon’s scale-out became EMC’s white knight for competing with NetApp’s filer dominance. Then EMC acquired XtremIO technology and began espousing scale-out as the only architecture that could possibly deliver on the promise of flash. Then came VNX2, and a new positioning that leveraging annual processor upgrades (faster, more cores) would enable scaling from a two-node architecture both faster and more economically than scale-out. Interesting, different, and totally opposite to the religion espoused by EMC’s now secondary flash group: XtremIO.

For Pure’s part, we don’t have a scale-up vs. scale-out religion. We believe that both are necessary, but for now we find ourselves largely agreeing with EMC’s new positioning: scaling processor power/cores is indeed the least expensive and fastest way of improving performance, and that should be maximized first, with controller scale-out providing a second dimension for increased scale when necessary. Our Pure Storage FlashArray architecture was designed with exactly this in mind. It has completely modular, stateless controllers, which allow existing arrays to be upgraded year after year with faster processors and more cores to scale vertically in-place (and without downtime, mind you), and we’ve designed around an InfiniBand cluster interconnect for our controllers and a software architecture that will allow scale-out to >2 controllers when necessary. We’re aggressively upgrading processors on an annual (or better) pace and keeping up with the fastest flash products on the market today, and we retain the ability to grow via controller scale-out when the market needs more performance from us.

#5 – Are performance tiering arrays still price-viable in 2013?

Finally, I found it pretty interesting to see how EMC positioned VNX2 from a pricing perspective. When we started Pure Storage in 2009, we set a target of driving the cost of all-flash storage down to less than $5/usable GB, with the belief that at about this price threshold moving from disk to flash for mainstream workloads would be an obvious choice (keep in mind that in 2009 all-flash solutions were literally $20-50+/GB). We hit our $5/GB mark in 2012 with the introduction of the FlashArray, and we’ve been improving those economics ever since. In this release EMC is positioning a $2/GB raw price point for “typical configurations” including flash and disk (likely SATA disk). By the time you account for provisioning losses, RAID overhead, and expensive software add-ons, that price will float substantially upwards on a usable basis, and that is for typical configurations, not performance configurations or even the all-flash VNX7600. So let’s be clear about this one: if your workload is served well by a mid-range array filled with SATA disk, you should probably buy that, as it will be the lowest-cost solution on the market. But if you find yourself configuring your VNX2 (or any mid-range array) with flash, FAST-cache, 10/15K SAS spindles, or other performance improvements, Pure Storage can likely meet or beat your pricing with an all-flash Pure Storage FlashArray, delivering dramatically better all-flash performance and a dramatically simpler management experience. Give us a shot; in the worst case you’ll put a scare into your EMC rep and get a few more bucks knocked off that VNX2.
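The raw-versus-usable arithmetic is easy to run for your own quote. The sketch below is a back-of-the-envelope model, not EMC’s or anyone’s actual pricing: only the $2/GB raw figure comes from the launch, while the function name and the RAID efficiency, provisioning loss, and software uplift values are hypothetical placeholders you should replace with numbers from a real configuration.

```python
def usable_price_per_gb(raw_price_per_gb: float,
                        raid_efficiency: float,
                        provisioning_loss: float,
                        software_uplift: float) -> float:
    """Convert a raw $/GB price into an effective usable $/GB price.

    raid_efficiency:   fraction of raw capacity left after RAID (0-1]
    provisioning_loss: fraction of post-RAID capacity lost to spares,
                       reserves, and formatting [0-1)
    software_uplift:   add-on software cost as a fraction of hardware cost
    """
    if not 0 < raid_efficiency <= 1:
        raise ValueError("raid_efficiency must be in (0, 1]")
    if not 0 <= provisioning_loss < 1:
        raise ValueError("provisioning_loss must be in [0, 1)")
    total_cost = raw_price_per_gb * (1 + software_uplift)
    usable_fraction = raid_efficiency * (1 - provisioning_loss)
    return total_cost / usable_fraction


if __name__ == "__main__":
    # All three overhead values below are illustrative assumptions.
    price = usable_price_per_gb(
        raw_price_per_gb=2.00,   # from the launch positioning
        raid_efficiency=0.80,    # assumed, e.g. a 4+1 parity layout
        provisioning_loss=0.10,  # assumed
        software_uplift=0.20,    # assumed
    )
    print(f"Effective usable price: ${price:.2f}/GB")
```

Plug in the overheads from an actual quote and compare the result against an all-flash usable $/GB figure on the same basis.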

So now that the #Slow2Follow MegaLaunch is over, it’s back to business here at Pure Storage. With last week’s new $150M pre-IPO financing we’re both committed and well-funded to build the next great storage company, and we’re going to focus our innovation squarely on delivering all-flash systems that make the world wonder why the storage industry ever bothered with this complex, unreliable, performance-variable, and ultimately expensive disk tiering stuff. If you’d like to see the difference all-flash storage can make in your environment, drop us a line.