
In the all-flash storage market, things are unfolding largely as the team at Pure Storage anticipated:

It’s all about the software.

Of course, software has already eaten disk-centric storage—market leaders EMC and NetApp are predominantly software companies. Storage vendors all get access to the same or comparable processors, memory, hard drives, interconnects, and the associated firmware, and then differentiate based on the capabilities of the software controller that melds that hardware into a single system. Hardware also has a far shorter life expectancy than the software: NetApp won in the filer business because its software was easy and reliable, not because the early generations of its hardware were superior to those from Auspex, et al.

Flash-centric storage will be no different.

As we have remarked before, flash memory is so radically different from hard drives that it requires a wholly new software controller architecture. The vendors that get the flash controller software right will be the long-term winners. Any differentiation via hardware (proprietary flash DIMMs rather than commodity SSDs) or firmware (proprietary alternatives to the firmware provided by the flash manufacturer) will prove ephemeral at best. Focusing on the system software also enables more rapid adoption of industry-standard hardware advances, embracing the latest CPUs, SSDs, and interconnects without painful rework.

For long-term differentiation, instead look to the reliability, performance, features, and simplicity of the all-flash array software. Below are three litmus tests that you should consider in evaluating the quality of a candidate all-flash array:

True Enterprise High-availability (HA)

By enterprise HA we mean that the storage system has been hardened to support automatic, rapid isolation and recovery from underlying hardware and software faults—that it is self-healing. HA cannot be left as a homework exercise: requiring a customer to buy two machines and then figure out how to configure failover, resync, and failback policies is not what we mean by HA. Enterprise HA is crucial for compatibility with existing enterprise workloads like VMware, Oracle, Microsoft, et al., so for many enterprise deployments it is simply non-negotiable.

HA is also the most efficient way to deliver flash media without a single point of failure. In a typical HA configuration, each SSD is redundantly connected to at least two controllers in case one of them has a problem. Without HA, you have to resort to naïvely mirroring your data across flash appliances. With mirroring, you need 2X the storage media, and that only protects you from a single failure. In contrast, a RAID-6-style scheme on flash can provide dual-parity protection (up to two overlapping failures) for <25% overhead and better overall performance (fewer writes). So with flash media generally the most expensive component of all-flash storage, HA makes financial sense too.
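
The capacity math behind that claim is straightforward. The sketch below is a back-of-the-envelope comparison only, not a model of any particular array: it assumes an illustrative 24 x 1 TB configuration and a hypothetical stripe width of 10 drives with 2 parity drives per stripe.

```python
# Back-of-the-envelope comparison of usable capacity for mirroring versus a
# RAID-6-style dual-parity layout. All numbers are illustrative assumptions.

def mirrored_usable(total_drives: int, drive_tb: float) -> float:
    """Two full copies of every byte: usable capacity is half the raw capacity."""
    return total_drives * drive_tb / 2

def dual_parity_usable(total_drives: int, drive_tb: float, stripe_width: int = 10) -> float:
    """Dual parity: each stripe of `stripe_width` drives spends 2 drives on parity."""
    data_fraction = (stripe_width - 2) / stripe_width
    return total_drives * drive_tb * data_fraction

raw = 24 * 1.0  # 24 x 1 TB SSDs
print(f"Raw capacity:       {raw:.1f} TB")
print(f"Mirrored usable:    {mirrored_usable(24, 1.0):.1f} TB  (100% overhead, survives 1 failure)")
print(f"Dual-parity usable: {dual_parity_usable(24, 1.0):.1f} TB  (20% overhead, survives 2 failures)")
```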

But HA generally takes person-decades of engineering to get right—requiring deep knowledge of operating system internals. And it’s very difficult to test properly: hardening against all the possible fault perturbations requires an investment in both hardware and automated fault injection that is beyond the reach of most start-ups. So which all-flash arrays get it right? If a vendor claims HA, simply ask to conduct drive pulls, cable pulls, and component shutdowns while the box is under load, and see what happens. Enterprise HA vendors welcome such invitations to prove out their technology, but there are fewer than a handful of HA arrays in the all-flash ranks.
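
If you want to quantify “see what happens” during such a pull test, one rough approach is to watch I/O latency from a host while the faults are injected. The snippet below is a minimal, hypothetical probe (the mount path and file name are placeholders); a real evaluation would drive the array with a proper load generator such as fio using direct I/O, since buffered reads can be served from the host’s page cache.

```python
import os
import statistics
import time

# Minimal latency probe to run from a host during drive pulls, cable pulls, and
# controller shutdowns: issue small reads against a file on the array under test
# and report tail latency, so a failover blip (or an outage) shows up in the log.
# Note: without direct I/O these reads may hit the host page cache, so treat this
# as a rough sanity check alongside a proper load generator.

PATH = "/mnt/array-under-test/probe.dat"   # hypothetical: a pre-created file on the array
BLOCK = 4096

with open(PATH, "rb", buffering=0) as f:
    size = os.fstat(f.fileno()).st_size
    latencies = []
    for i in range(10_000):
        offset = (i * BLOCK) % max(size - BLOCK, BLOCK)
        t0 = time.perf_counter()
        f.seek(offset)
        f.read(BLOCK)
        latencies.append((time.perf_counter() - t0) * 1000.0)   # milliseconds
        if (i + 1) % 1000 == 0:
            p99 = statistics.quantiles(latencies, n=100)[98]
            print(f"{i + 1} reads, p99 latency {p99:.2f} ms")
        time.sleep(0.001)   # pace the probe; the heavy load comes from the load generator
```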

Inline, sub-millisecond deduplication and compression

For virtualization & database workloads, by far the most significant cost savings comes from data reduction: With deduplication and compression, Pure Storage is able to shrink such workloads five to ten-fold, and that’s not counting the savings from thin provisioning or deduping snapshots! (Comparable data reduction on hard drives is simply not feasible—deduplication is too random I/O intensive for performance workloads on disk.)

Inline deduplication also saves write cycles on the flash—since flash degrades slightly with each write, it makes no sense to write the same data over and over, and then dedupe it later. (Post-process deduplication is not only slow, but actually increases writes to flash.)
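
To make the mechanism concrete, here is a toy sketch of the general idea behind inline deduplication: fingerprint each incoming block and only write content that has not been seen before, so duplicates never consume flash capacity or write cycles. This illustrates the technique in general, not any vendor’s implementation (real arrays dedupe at finer or variable granularity and guard against hash collisions); the class and names are invented for the example.

```python
import hashlib

class InlineDedupStore:
    def __init__(self) -> None:
        self.blocks = {}          # fingerprint -> stored block (stands in for flash)
        self.logical_map = {}     # logical block address -> fingerprint
        self.physical_writes = 0

    def write(self, lba: int, data: bytes) -> None:
        # Fingerprint the block *before* it ever reaches the media.
        fp = hashlib.sha256(data).hexdigest()
        if fp not in self.blocks:          # new content: one physical write
            self.blocks[fp] = data
            self.physical_writes += 1
        self.logical_map[lba] = fp         # duplicates only update metadata

    def read(self, lba: int) -> bytes:
        return self.blocks[self.logical_map[lba]]

store = InlineDedupStore()
for lba in range(100):                     # e.g. 100 clones of the same VM image block
    store.write(lba, b"golden-image-block")
print(f"logical writes: 100, physical writes: {store.physical_writes}")   # -> 1
```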

To make comparison easy, Pure provides a freely downloadable software tool that will measure your dataset’s reducibility. Other vendors serious about data reduction will provide such tools so you can compare the effectiveness of the algorithms without procuring a box. But also be sure to eventually do real-world testing of your vendor’s deduplication and compression algorithms under performance workloads. Otherwise, you may find that the implementation is only datasheet-worthy and doesn’t offer the sub-millisecond performance necessary for production deployment.

Today, you wouldn’t buy a disk-based backup appliance without data reduction. Going forward, you won’t buy an all-flash array without it either. Yet relatively few all-flash arrays on the market offer effective inline data reduction.

Efficient snapshots

Snapshots are critical to backing up enterprise data consistently and without impacting performance. As such, snapshots are a bellwether for the quality of the overall data management services of the storage software.

Disk-centric snapshot implementations tend to fall into two buckets: copy-on-write variants that are space efficient, but which can incur significant performance overhead as you intermix snapshotting and updating the dataset; and split-mirror approaches that have excellent downstream performance, but are not at all space efficient and carry an additional up-front performance cost.

With flash and the right algorithms, you can have your cake and eat it too: snapshots that have near zero overhead in both space and performance! This means no complex space or performance planning, and no painful tradeoffs between protecting your data and cost (having to buy more space and IO capacity just to accommodate snapshots). The litmus test for near zero overhead snapshots is to crank up a performance workload, and then start snapshotting and measure the performance and space impact.
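
One way to picture how snapshots can be nearly free on a flash-optimized, metadata-driven design is a redirect-on-write scheme: taking a snapshot copies only the logical-to-physical map, and later overwrites simply land in new locations. The sketch below is a toy illustration of that general approach, not any particular array’s implementation; the class and method names are invented for the example.

```python
import copy
from typing import Optional

class Volume:
    def __init__(self) -> None:
        self.extents = {}      # extent id -> data; stands in for blocks already on flash
        self.lmap = {}         # logical block address -> extent id
        self.next_id = 0
        self.snapshots = {}    # snapshot name -> frozen copy of the logical map

    def write(self, lba: int, data: bytes) -> None:
        # Redirect-on-write: new data always goes to a fresh extent, so blocks
        # referenced by snapshots are never overwritten in place.
        self.extents[self.next_id] = data
        self.lmap[lba] = self.next_id
        self.next_id += 1

    def snapshot(self, name: str) -> None:
        # Snapshot creation copies metadata only; no data is read or rewritten.
        self.snapshots[name] = copy.copy(self.lmap)

    def read(self, lba: int, snap: Optional[str] = None) -> bytes:
        lmap = self.snapshots[snap] if snap else self.lmap
        return self.extents[lmap[lba]]

vol = Volume()
vol.write(0, b"v1")
vol.snapshot("before-upgrade")
vol.write(0, b"v2")                          # overwrite after the snapshot
print(vol.read(0))                           # b'v2': the live volume sees new data
print(vol.read(0, snap="before-upgrade"))    # b'v1': the snapshot is unchanged
```

Because creating the snapshot touches no data blocks on such a design, the space and performance impact measured in the litmus test above should be close to zero.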

Our three litmus tests are essential, but by no means sufficient, for determining whether the all-flash array you are considering will be a long-term winner. As we mentioned above, there is no substitute for trying out the product in your data center on your workloads. Still, the relative dearth of vendors getting these essential features right will help you narrow your focus to all-flash arrays that better meet the needs of your business and that will be safe bets—long-term winners in the marketplace.

And finally, let’s close on our opening point: in the end, it’s all about the software. With the recent deluge of flash announcements, it’s easy to get caught up in future product promises of millions of IOPS and exotic flash hardware optimizations that might someday result in enterprise-grade flash products. But in reality, enterprise computing and storage have always been about the software, and as flash hardware continues to commoditize, this will be even more true for all-flash arrays.

At Pure Storage we shipped our second-generation product earlier this year, we have announced a broad spectrum of production reference customers, and we are back again today just a few short months later, announcing a major software release. That’s innovation at the pace of software, and that’s what you should expect from the ultimate all-flash array winners.