Barry Burke (EMC) and Hu Yoshida (HDS) have recently been engaged in a blog war, both informative and entertaining, about the efficiency and performance of their respective flash auto-tiering solutions: EMC FAST and Hitachi Dynamic Tiering. You can see both sides of the debate here and here.

Sometimes in the fog of (blog) war, one loses perspective and fails to see the big picture for what it is. So while Barry and Hu debate which solution wastes more or less flash, uses more or less cache and CPU, and impacts overall array performance more or less, here are the big takeaways that both sides seem to agree on:

  • Automated tiering is a resource-intensive process, requiring dedicated metadata, CPU-driven analysis of data usage, and movement of large chunks of data. This stuff isn’t easy, and to do it properly you either add dedicated cache and processor power to the array to manage it, or the tiering engine steals from the resources the array already has.
  • First-gen automated tiering solutions move large chunklets of data between flash and disk. You can debate whether 42MB is better or worse than 7.5MB, but frankly both are enormous when you consider that a single 4K block of active data could pin an entire multi-MB chunklet in flash. At least for isolated 4K-active blocks, the debate is ultimately quibbling over 0.05% versus 0.01% flash efficiency (see the quick math after this list). Flash is expensive, and both of these solutions can waste it hugely. Given the multi-MB sizes, we should probably call them hunklets, not chunklets.
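To make that efficiency point concrete, here is a quick back-of-the-envelope sketch (Python, purely for illustration) using the 4K hot block and the 7.5MB and 42MB chunklet sizes from the debate:

```python
# Back-of-the-envelope: how much of a promoted chunklet is actually hot data
# when a single 4 KB active block pins the whole chunklet in flash?

KB = 1024  # bytes per kilobyte

def pinned_efficiency(hot_bytes, chunklet_bytes):
    """Fraction of the chunklet occupying flash that is genuinely hot."""
    return hot_bytes / chunklet_bytes

hot_block = 4 * KB  # one isolated 4K active block

for chunklet_mb in (7.5, 42):
    chunklet = chunklet_mb * KB * KB
    pct = pinned_efficiency(hot_block, chunklet) * 100
    print(f"{chunklet_mb} MB chunklet: {pct:.3f}% of the pinned flash is hot data")

# Prints roughly:
#   7.5 MB chunklet: 0.052% of the pinned flash is hot data
#   42 MB chunklet: 0.009% of the pinned flash is hot data
```

Put another way, in this worst case more than 99.9% of the flash behind that hot block is carrying cold data along for the ride, which is the real argument for managing placement at a much finer granularity.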

Finally, there’s a third point which is just as obvious: neither one of these arrays was really built for this purpose. In both cases, automated flash tiering was a feature retrofitted onto an existing disk-centric architecture, and just looking at the high-level details of both implementations makes that pretty clear (MB-sized hunklets, infrequent hunklet promotion/demotion, complex management).

It is instructive here to look at the last major architectural shock to Tier 1 arrays, which followed an analogous progression. When 3Par invented and introduced Thin Provisioning, it was hard to argue that it wasn’t a clear innovation. Others rushed in to copy it, but thin provisioning required a degree of whole-array virtualization that legacy disk arrays just weren’t built for (they were designed around aggregating whole disk drives into simple RAID sets). 3Par was built on whole-array virtualization of individual storage segments (chunklets), which enabled thin provisioning and allocation, and it took years for competitors to retrofit their own solutions, which in every case were a bit less efficient, a bit harder to manage, and just not quite as elegant as the array purpose-built for thin provisioning.

What I see in the automated flash tiering game is the exact same challenge: tiering is hard, resource-intensive work, and unless you design an array for it from the ground up, it will always feel like a retrofit. Automated tiering is just one of many features showing how flash stresses the legacy disk-centric array architecture. We’ll explore more of these dimensions in upcoming blog posts.