In traditional storage, arrays shutting down the storage array is kind of a big, scary, fingers-crossed kind of deal (believe it or not, some vendors make you engage their support/professional services so they can shut it down, it’s so risky). We thought this was ridiculous, so we challenged ourselves to completely change the paradigm. But how and why would you design an array without a big shutdown procedure? Read and find out…
The dreaded “double failure”
If you look at legacy storage architectures and analyze why they occasionally are subject to data loss or corruption, it turns out that most of those data loss events are the result of double failures. Something relatively minor fails (maybe a drive fails, maybe a controller, maybe an internal switch…), which triggers one of the software resiliency “features” of the array, and this software kicks-in to save the array and work around the problem. But here’s where the fun starts…often that resiliency code is a very under-exercised code path. It was written years ago to protect against some arcane failure case, tested well then, and then started to age in the code base. It’s a fail safe, so almost no one uses it, and thus, its a magnet for software bugs…both when it is written, and as the code base ages and evolves around it. So lo and behold, one day you need to exercise that code path because of a failure, and you find that this resiliency code is less reliable than the code it is supposedly protecting, and you suffer a second failure….
Pure’s philosophy: No un-exercised code
When we started designing the HA and resiliency features of the FlashArray, this mantra of no un-exercised code was a minor religion within the Pure team, and you can see that religion manifest itself in several areas of the code:
- Parity re-builds: we felt RAID re-build code should be just as reliable and performant as the normal read/write path…so we designed an array that constantly reads from parity as part of normal operations…about 15% of the read I/O that comes from the FlashArray comes from parity by design…it’s how we isolate drives for writes and make our IO path non-blocking.
- Stateless HA architecture: we built the FlashArray so that controller failure/fail-overs were nothing to be afraid of. Controllers are stateless (no persistent data in them, including in-flight writes), and HA events are designed to be a non-event – pull the power to any Pure controller anytime, you won’t see a performance hit. Better yet, upgrade the software during middle-of-the-day production, no worries.
- No shutdown procedure: the FlashArray has to be able to handle a full power loss with ease…full power loss code is some of the least exercised code in the industry. Our insight? Let’s make turning the array off and pulling the power one and the same. We have to be so confident in our ability to manage power loss, that we might as well make it our standard procedure for shut-down.
So how do you turn this thing off?
By now you have the answer: you just pull the power cords. In full disclosure, since we use standard off-the-shelf hardware components there actually are legacy physical power buttons on the shelves and controllers, but their use is entirely optional, and in fact not encouraged. There is no shutdown button on the GUI or command in the CLI that initiates a shutdown procedure….if you want to turn it off, you just pull the power.
The FlashArray’s design is that an IO is never committed back to the host until it is stored in four locations: two copies in redundant NV-RAM devices (housed in the array’s storage shelves), and a working copy in the DRAM of both controllers. Compared to competitive architectures, there’s no need to try and franticly de-stage persistent data and metadata from DRAM in controllers on power failure, and there’s no reliance on a fragile UPS architecture to keep the array up while de-staging happens. So, if you are evaluating Pure vs. EMC XtremIO or others, I’d suggest a few common-sense steps:
- Ask your vendors about full power-loss scenarios. How does the code work, what levels of protection are there, are there any caveats, and how long does recovery take?
- After you get your answer, test it! Make full power loss testing a standard part of any PoC. Fire-up a large load, pull the power to the rack, restore power, and see how long it takes (and if) the array recovers. In the case of Pure Storage, that recovery time is about 3 minutes, essentially the time it takes the controllers to boot.
Yes, we’re a little different here at Pure Storage. Take the time to understand these differences, they matter.