Hopefully, you’ve already read about FlashArray//X – the first mainstream 100% NVMe all-flash array from Pure Storage. One of the key innovations in //X is the new DirectFlash™ Module and DirectFlash Software. I know what some of you are probably thinking – “why the heck did Pure build their own SSD?” We didn’t – that would have been a huge waste of time. We built an entirely new architecture for enabling software to communicate directly with flash, both unlocking performance and enabling FlashArray//X to ship 100% NVMe at an affordable price point. In this blog post, we’ll cover everything about DirectFlash: why we did it, how we did it, and what it enables in //X.
And if you missed our other posts, this is post 3 in a 5-part series on the //X launch:
Part 3: DirectFlash – Enabling Software and Flash to Speak Directly [this blog]
Big SSDs – How Flash Is Turning Back into Disk
The density improvements in flash SSDs have been amazing – you can now buy a 60TB SSD for example. But these big SSDs come at a real cost – actually two real costs. Let me explain.
The first cost is performance density. Why did disk die? Every year it got bigger but not faster, so it got slower on a performance/TB basis. The same thing is now happening to SSDs: today you can buy a 1TB, 15.3TB, or 60TB SSD, and they all have the same fundamental performance.
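The performance-density argument can be made concrete with a little arithmetic. The sketch below assumes, as the paragraph above claims, that every SSD delivers the same performance regardless of capacity; the IOPS figure itself is a made-up placeholder, not a measured number.

```python
# Illustrative only: the per-SSD IOPS value is an assumption, not a benchmark.
ASSUMED_IOPS_PER_SSD = 100_000


def iops_per_tb(capacity_tb: float, iops: float = ASSUMED_IOPS_PER_SSD) -> float:
    """Performance density: IOPS available per TB of capacity."""
    return iops / capacity_tb


# Capacity points mentioned in the text.
capacities_tb = [1, 15.3, 60]
densities = {c: iops_per_tb(c) for c in capacities_tb}

for capacity, density in densities.items():
    print(f"{capacity:>5} TB SSD -> {density:>9,.0f} IOPS/TB")
```

Whatever IOPS value is plugged in, the 60TB drive ends up with one sixtieth of the performance per TB of the 1TB drive, which is the same curve that made disk progressively slower per TB.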
To make things worse, the industry typically plugs these big SSDs into SAS pipes, which are small and have only a single queue. Imagine building a huge soccer stadium, building only one entrance, and making everyone enter in a single-file line.
Incidentally, today FlashArray//M has an advantage in this regard: instead of using a single SAS SSD, we pack two consumer-grade SATA SSDs into each Flash Module, giving us double the parallelism for a given capacity point. (SATA SSDs are also less expensive than SAS SSDs, which is nice.) This dual-drive approach helps, but doesn’t solve the looming bottleneck problem of big flash / small pipe / single queue.
The second cost is performance unpredictability. Within each SSD is an increasingly complex set of software that runs inside the SSD’s controller chip. This software does the ungratifying task of making the flash pretend to be a hard drive, as well as all the flash management (allocation, wear leveling, garbage collection, error detection and correction, etc.).
The problem is, as SSDs get bigger, this software has to get more and more complex – which leads to more risk of failure. It also leads to performance unpredictability.
When reading or writing to SSDs, you usually get great performance – about 100µs latency. That’s when the SSD’s software is idle. But the problem is that you also often get really bad performance: 10, 50, even 200ms of latency. And as anyone who has studied tiering knows, a few really slow IOs can spoil the entire apple bin. This was the main reason hybrid flash arrays didn’t work out, leading to the rapid rise of AFAs.
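To see why a few slow IOs matter so much, here is a rough back-of-the-envelope sketch. The 100µs and 50ms latencies come from the text above; the 1% slow-IO rate and the fan-out of 10 SSDs per request are assumptions chosen purely for illustration.

```python
FAST_US = 100                 # from the text: about 100µs when the SSD is idle
SLOW_US = 50_000              # from the text: one of the bad cases, 50ms
ASSUMED_SLOW_FRACTION = 0.01  # assumption: 1 in 100 IOs hits a slow path
ASSUMED_FANOUT = 10           # assumption: one request touches 10 SSDs


def mean_latency_us(fast_us: float, slow_us: float, slow_fraction: float) -> float:
    """Average latency when a fraction of IOs take the slow path."""
    return (1 - slow_fraction) * fast_us + slow_fraction * slow_us


def prob_any_slow(slow_fraction: float, fanout: int) -> float:
    """Chance that at least one of `fanout` independent IOs is slow."""
    return 1 - (1 - slow_fraction) ** fanout


avg_us = mean_latency_us(FAST_US, SLOW_US, ASSUMED_SLOW_FRACTION)
p_any_slow = prob_any_slow(ASSUMED_SLOW_FRACTION, ASSUMED_FANOUT)

print(f"mean latency: {avg_us:.0f} µs")
print(f"requests delayed by a slow IO: {p_any_slow:.1%}")
```

Under these assumptions, a 1% slow-IO rate pushes the average to roughly six times the idle latency, and a request that has to wait on 10 SSDs is stalled by at least one slow IO almost 10% of the time.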
This variable performance has long been an issue within SSDs – in fact this is why Pure developed our FlashCare technology within Purity Operating Environment from the very early days. We rigorously test each SSD to determine how to carefully send data to them to get the best and most consistent performance we can. For years folks would walk around the Pure office kicking the carpet saying “why can’t we just talk directly to the flash!” Indeed.
Your AFA Has a Little Secret
In addition to the Big SSD problems, your AFA likely has a little secret it doesn’t like to admit – there’s still a lot of disk inside! Smart array software is forced to talk to flash by speaking disk protocols like SCSI over disk interfaces like SAS and SATA, and when it finally talks to the flash, it has to pretend the flash is a hard drive and navigate layers of complex software inside the SSD. Why?!!?!
In about 2012, Pure engineers had one of those “stop the madness!!!” moments, and asked ourselves why we put ourselves through this torture and inefficiency. And so (with a whole bunch of other innovation), FlashBlade™ was born with a new architectural approach that we call DirectFlash:
DirectFlash implements an elegant (but technically difficult) approach: instead of using SSDs, it takes raw NAND flash, wires it up with fast networking (an enhanced NVMe over PCIe in FlashBlade’s case), and enables the flash to talk directly to our smart storage software. The results have been amazing, but that’s a story for another blog post.
So today – we’re happy to announce that we’re bringing the proven FlashBlade DirectFlash architecture to FlashArray with the new DirectFlash Module!
Let’s take a closer look at DirectFlash in this video:
DirectFlash: The World’s First Software-Defined Flash Module
As we said at the outset – DirectFlash isn’t an SSD – it’s something new entirely. And the first step to understanding how DirectFlash is different is understanding its software-defined architecture.
A typical AFA, like the left above, may have 100 SSDs in the system, each with its own flash controller chip which is implementing all the flash management software. But unfortunately, each SSD is completely unaware of the other 99 SSDs in the system, leading to an inability to optimize across the entire flash pool.
The DirectFlash Module is a very simple piece of hardware, whose only job is to connect a large pool of flash over massively-parallel NVMe pipes to the FlashArray. From that point – all the magic’s in the software – the DirectFlash Software, which implements all the intelligence for flash management that used to live in the SSD globally across the entire flash pool.
In many ways, DirectFlash Software is the next evolution of our FlashCare software that’s lived in Purity since Day 1. Our engineers have gained significant flash management expertise over the past few years. With FlashCare, we delivered global flash management, allocation, and garbage collection at the system level, though its granularity was limited because the SSD’s FTL stood between our software and the flash. Note that this flash management software has been unique to Pure – it’s what allows Pure to use consumer-grade SSDs, while most of our competitors haven’t developed this software expertise and are forced to use enterprise SSDs at a higher cost (some even boasted about taking this path). Now with DirectFlash, our software has massively expanded visibility and can see down to and manage each individual flash die.
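The difference between per-SSD and pool-wide flash management can be sketched with a toy allocator. Everything here is hypothetical: the `Die` record, the erase counts, and the least-worn-first policy are generic illustrations of wear leveling, not a description of how DirectFlash Software actually works.

```python
from dataclasses import dataclass


@dataclass
class Die:
    module: int       # which module (or SSD) the die lives in
    index: int        # die number within that module
    erase_count: int  # hypothetical wear indicator


def pick_local(dies: list[Die], module: int) -> Die:
    """Per-SSD view: the controller only sees the dies in its own module."""
    return min((d for d in dies if d.module == module), key=lambda d: d.erase_count)


def pick_global(dies: list[Die]) -> Die:
    """Pool-wide view: software sees every die and picks the least worn."""
    return min(dies, key=lambda d: d.erase_count)


# Made-up pool: module 0 is heavily worn, module 1 is nearly fresh.
pool = [Die(0, 0, 120), Die(0, 1, 95), Die(1, 0, 30), Die(1, 1, 45)]

local_choice = pick_local(pool, module=0)
global_choice = pick_global(pool)
print("local choice :", local_choice)
print("global choice:", global_choice)
```

A controller confined to module 0 has to write to a die that has already been erased 95 times, while the pool-wide view steers the write to a die in module 1 with far less wear.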
At a high level, the DirectFlash Software performs three functions:
DirectFlash: Fast, Direct, Transparent
So with that baseline understanding – we’re excited to announce the DirectFlash Module – the world’s first software-defined flash module!
In addition to being software-defined, the DirectFlash Module is 100% NVMe-connected, leveraging the massive parallelism of the NVMe protocol.
The NVMe protocol itself can implement up to roughly 64,000 parallel queues to the flash device, each with up to roughly 64,000 outstanding IOs! In our practical usage, we enable the DirectFlash Module to process 256 parallel IO commands. The architectural advantage here is that the parallelism enables each core of each processor in our controllers to have a dedicated queue for each DirectFlash Module. For comparison, in our traditional SSD-based flash modules we enable a queue depth of 8, so DirectFlash Modules provide a 32X improvement in parallelism. Beyond parallelism, IO to the DFM is deterministic – it’s bit addressable and there’s consistent access time to each flash block – eliminating the flash latency guessing game.
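As a quick illustration of the numbers above, the sketch below works out the 32X figure and models the idea of one dedicated queue per core per module. The core and module counts are arbitrary assumptions, and the dictionary of lists is only a stand-in for real NVMe queue pairs.

```python
DFM_PARALLEL_COMMANDS = 256  # from the text
SSD_QUEUE_DEPTH = 8          # from the text

improvement = DFM_PARALLEL_COMMANDS // SSD_QUEUE_DEPTH
print(f"parallelism improvement: {improvement}X")

# Assumed sizes, for illustration only.
ASSUMED_CORES = 4
ASSUMED_MODULES = 3

# One dedicated queue per (core, module) pair, so cores never contend
# with each other for a shared queue to the same module.
queues = {
    (core, module): []
    for core in range(ASSUMED_CORES)
    for module in range(ASSUMED_MODULES)
}


def submit(core: int, module: int, command: str) -> None:
    """Place a command on the queue owned by this core for this module."""
    queues[(core, module)].append(command)


submit(0, 1, "read")
submit(2, 1, "write")
print(f"{len(queues)} dedicated queues")
```

Contrast this with the single-queue SAS case described earlier, where every core would have to funnel its commands through the same line.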
Furthermore, the DFM is 100% provisioned, meaning that Purity and the DirectFlash Software can “see” 100% of the flash in the system. A traditional consumer SSD has ~8% over-provisioning, and performance enterprise SAS and NVMe SSDs can have up to 50% over-provisioning, or flash that’s simply hidden from the system. By combining this 100% provisioning with the DirectFlash Software’s more efficient global flash management, DFM delivers between 14-36% more effective capacity from the same raw flash:
Here’s an example of what happens when you plug various SSDs and Flash Modules into a FlashArray. It turns out that both the 9.1 TB DFM and Pure’s 7.6 TB SAS Flash Module contain 9.1 TB of raw flash – the 7.6 TB SSD FM just hides some of it from the system because of over-provisioning. The 7.6 TB SSD FM contains consumer-grade SSDs, but a similar enterprise-grade SSD hides even more flash for over-provisioning and only returns 6.4 TB to the system. The orange bars above represent the usable flash for data once Purity implements its own RAID-HA, HA, metadata, and other overheads. Net net: DirectFlash returns dramatically more flash to the user, which is just one of the ways we drive down the cost of all-NVMe storage to a mainstream price point.
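The per-module figures in this example can be turned into a quick calculation of how much raw flash each option hides. This sketch uses only the three capacities quoted above and looks at flash exposed to the system before Purity’s own overheads, so its results are not expected to line up with the 14-36% effective-capacity range, which is measured after those overheads.

```python
RAW_TB = 9.1  # raw flash in each module, per the text

# Flash each option exposes to the system, per the text.
exposed_tb = {
    "DirectFlash Module": 9.1,
    "consumer SSD flash module": 7.6,
    "enterprise SSD": 6.4,
}


def hidden_fraction(raw_tb: float, exposed: float) -> float:
    """Share of raw flash held back for over-provisioning."""
    return (raw_tb - exposed) / raw_tb


def gain_over(baseline_tb: float, raw_tb: float = RAW_TB) -> float:
    """Extra exposed flash from 100% provisioning, relative to a baseline."""
    return raw_tb / baseline_tb - 1


hidden = {name: hidden_fraction(RAW_TB, tb) for name, tb in exposed_tb.items()}
for name, frac in hidden.items():
    print(f"{name:<28} hides {frac:.1%} of raw flash")

gain_vs_consumer = gain_over(exposed_tb["consumer SSD flash module"])
gain_vs_enterprise = gain_over(exposed_tb["enterprise SSD"])
print(f"exposed-flash gain vs consumer SSD FM: {gain_vs_consumer:.1%}")
print(f"exposed-flash gain vs enterprise SSD:  {gain_vs_enterprise:.1%}")
```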
DirectFlash – Only from Pure Storage
DirectFlash just makes sense. Think about it – AFAs have largely used a consumer form factor to address an enterprise-scale problem. How long was that going to make sense? Prior to NVMe, AFAs used the only feasible and cost-effective form factor (SSDs) with excellent results, but it was an architecture designed for consumer products, like personal laptops, where it made sense to manage small-scale flash locally. The future will demand far more, and NVMe enables a new flash array architecture at the scale necessary for the cloud era.
So there you have it – a radical new approach to co-designing hardware and software yields higher performance, higher performance density, and better efficiency, and is key to driving down the cost of NVMe flash to make it mainstream. Available in FlashBlade, and now available in FlashArray//X.
Oh – and did I mention it has a beautiful orange heatsink?