The news this week is that EMC launched an all-flash storage array as well as a flash-specific business unit. While EMC’s initial flash products will leverage their existing software/hardware controller architecture, the implication is that EMC anticipates purpose-built flash solutions down the road: “… there’s also an interesting opportunity to build entirely new classes of storage arrays when you don’t have to consider spinning disks anymore …”
It’s the purpose-built qualifier that we find particularly compelling. Let’s look at a recent example from a parallel market: When the Data Domain team embarked on the mission to replace the tape library with a disk-based appliance, they didn’t start with the controller software from an existing tape farm! Instead, Data Domain pursued a radical rethink of backup technology in order to deliver a solution that was purpose-built for backup to disk. With hindsight, it sounds preposterous to even consider starting with the software of a physical tape library in order to build a disk-based backup appliance.
And yet this is precisely what some in the storage industry have embarked upon: with the notable exception of a handful of startups, almost all of the flash storage in the data center has been slotted into existing disk-centric storage arrays. If it didn’t make sense for Data Domain to treat disk just like tape, why are we now treating flash just like disk?
We remain convinced that flash warrants a holistic redesign of the storage array (software and hardware) that manages it. To understand why flash requires such a rethink, this seems an apropos time to continue the top ten list we started a couple of months ago (see Why Flash Changes Everything, Part 1):
(5) Forget about misalignment – All existing disk arrays implement a virtualization scheme whose sector/block geometry is tied to an architecture born out of rotating disks. Virtualized disk-based storage pays a substantial performance penalty if the file system, LUNs, and physical disks are not optimally aligned: what could have been a single read or write is split into several because it crosses an underlying boundary, leading to performance degradations of 10-30%. With flash, the additional I/O capacity and the potential for much thinner stripe sizes (the amount of data read or written per disk access) make the administrative work of avoiding misalignment unnecessary.
(4) Fine-grain virtualization and forget the in-memory map – With disk, stripe sizes tend to be large so that the entire virtualization map from logical data to physical location fits in the array controller’s memory. The reason is simple: you do not want to pay the latency of an additional disk read just to find out where the data you are looking for lives. With flash, metadata reads are extremely low latency (tens to hundreds of microseconds), so you can afford very thin stripe sizes (i.e., virtualizing your storage into trillions of chunks) and still guarantee sub-millisecond reads.
(3) Managing garbage collection, the TRIM command, and write amplification – Flash is necessarily erased in blocks all at once rather than rewritten incrementally like disk. Thus the controller software managing updates to flash must be careful to avoid write amplification, wherein a single update cascades to other updates because of the need to move and update both the data and metadata. This problem is exacerbated when the underlying flash is unaware that the data being moved to enable a rewrite may already have been deleted (see the TRIM command). The result is that entirely different algorithms are necessary to optimize the writing of flash versus the writing of disk.
(2) Managing data retention and recovering from failures – Unlike disk, flash exhibits data fade, so an optimal storage controller needs to keep track of when data is written and how often it is read to ensure that it is refreshed (rewritten) periodically (and that this is done carefully so as to avoid write amplification). And unlike disk, there is no penalty for non-sequential writes, so the restoration of parity after losing an SSD can be efficiently amortized across all available flash rather than trying to rebuild a single hard drive in its entirety after a disk failure.
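To put rough numbers on items (5), (4), and (3), here is a minimal Python sketch of the arithmetic behind each one. Everything in it is our own illustration rather than a description of any particular array: the function names, the 8-byte map entry, the chunk sizes, the 100 TiB capacity, and the simple "reclaim one erase block" model of write amplification are all assumptions chosen to keep the example small.

```python
# Back-of-the-envelope models for items (5), (4), and (3).
# All names and numbers are illustrative assumptions, not vendor figures.


def chunks_touched(offset: int, length: int, chunk: int) -> int:
    """Item (5): count the fixed-size chunks that the byte range
    [offset, offset + length) overlaps. An aligned request of one chunk
    touches one chunk; a misaligned one spills into the next."""
    if length <= 0 or chunk <= 0 or offset < 0:
        raise ValueError("offset must be >= 0, length and chunk must be > 0")
    first = offset // chunk
    last = (offset + length - 1) // chunk
    return last - first + 1


def map_bytes(capacity: int, chunk: int, entry_bytes: int = 8) -> int:
    """Item (4): size of a flat logical-to-physical map that keeps one
    entry per chunk. Smaller chunks mean a proportionally larger map."""
    if capacity <= 0 or chunk <= 0 or entry_bytes <= 0:
        raise ValueError("all arguments must be > 0")
    return (capacity // chunk) * entry_bytes


def write_amplification(pages_per_block: int, valid_pages: int) -> float:
    """Item (3): physical page writes per host page write when an erase
    block holding `valid_pages` live pages is reclaimed. The live pages
    must be copied elsewhere first, and only the remaining pages become
    room for new host data, so WA = N / (N - valid)."""
    if not 0 <= valid_pages < pages_per_block:
        raise ValueError("need 0 <= valid_pages < pages_per_block")
    return pages_per_block / (pages_per_block - valid_pages)


if __name__ == "__main__":
    KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40

    # (5) A 64 KiB request on 64 KiB chunks, aligned vs. shifted by
    # 63 sectors of 512 bytes (an assumed, commonly seen partition offset).
    print("aligned chunks:   ", chunks_touched(0, 64 * KiB, 64 * KiB))
    print("misaligned chunks:", chunks_touched(63 * 512, 64 * KiB, 64 * KiB))

    # (4) Map size for an assumed 100 TiB array at two chunk sizes.
    print("map @ 1 MiB chunks:", map_bytes(100 * TiB, 1 * MiB) / MiB, "MiB")
    print("map @ 4 KiB chunks:", map_bytes(100 * TiB, 4 * KiB) / GiB, "GiB")

    # (3) Reclaiming a 128-page block. Without TRIM the device believes
    # 96 pages are live; if TRIM reveals that 64 of them were deleted,
    # only 32 need to be copied.
    print("WA without TRIM:", write_amplification(128, 96))
    print("WA with TRIM:   ", write_amplification(128, 32))
```

Under these assumptions the misaligned request touches two chunks instead of one, shrinking the chunk from 1 MiB to 4 KiB grows the map from 800 MiB to 200 GiB (which is why a fine-grained map cannot simply live in controller memory), and letting the flash know which pages are already deleted cuts the modeled write amplification from 4.0 to about 1.33.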
I’m afraid we are going to save #1 for a final (and best?) installment of Why Flash Changes Everything. But we hope that even without #1 you’re convinced that plugging flash into a legacy array architecture designed around the idiosyncrasies of disk simply cannot deliver all of the potential benefits of flash. Instead, flash deserves the same consideration that disk has enjoyed for decades: controller software and hardware that are deeply optimized for flash’s unique potential and idiosyncrasies. The benefits to the end user will be not just greater efficiency, reliability, and performance, but also the elimination of complexities inherent in managing disk.