In another blog post about Non-Disruptive Upgrades, I outlined how our NDU process works and how we're able to transition customers to newer, more powerful models of the Pure Storage FlashArray without the need to perform long, complex data migrations. You can see a new demo of non-disruptive upgrades in action. This video is Chapter 1 of our 3-part deep dive Non-Disruptive Everything video series.
But what about the storage media? As newer, faster, and denser storage media become available, how do you keep the storage current without consuming more and more rack space for each new expansion shelf?
Non-Disruptive Storage Removal
In Purity Operating Environment version 4.7.0, we've introduced the ability to evacuate data from existing storage shelves and then physically remove those shelves and replace them with newer, denser storage, all non-disruptively.
Non-disruptive storage removal supports a variety of use cases that enable customers to increase storage density in place.
How does storage evacuation work?
Because our storage array uses content addressing metadata to abstract the underlying media and location of data from the storage operating system, we can use a background process to move content out of a storage shelf while adjusting the metadata in the system to point to the content’s new location.
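To make the idea concrete, here is a minimal sketch of evacuation in a content-addressed layout. It is a toy model, not Purity's implementation: the `ToyArray` class, its dictionaries, and the use of SHA-256 as the content address are all assumptions made for illustration. The point it shows is that volumes reference content by hash, so moving a block only requires updating the hash-to-location metadata.

```python
import hashlib
from collections import defaultdict


class ToyArray:
    """Hypothetical model: volumes point at content hashes, hashes point at a shelf."""

    def __init__(self):
        self.volume_map = {}               # (volume, offset) -> content hash
        self.location = {}                 # content hash -> shelf name
        self.shelves = defaultdict(dict)   # shelf name -> {content hash: data}

    def write(self, volume, offset, data, shelf):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.location:    # store each unique block once
            self.shelves[shelf][digest] = data
            self.location[digest] = shelf
        self.volume_map[(volume, offset)] = digest

    def read(self, volume, offset):
        digest = self.volume_map[(volume, offset)]
        return self.shelves[self.location[digest]][digest]

    def evacuate(self, source, target):
        """Move every block off `source`; volume_map is never touched."""
        for digest in list(self.shelves[source]):
            self.shelves[target][digest] = self.shelves[source][digest]  # copy first
            self.location[digest] = target                               # repoint metadata
            del self.shelves[source][digest]                             # then free the old copy
        return len(self.shelves[source]) == 0


array = ToyArray()
array.write("vol1", 0, b"hello", "shelf0")
array.evacuate("shelf0", "shelf1")
print(array.read("vol1", 0))  # the read still works after the move
```

Because reads always go through the metadata, a block is reachable at every point during the move: the copy lands in the new location before the pointer changes, and the old copy is freed only afterwards.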
The process is performed remotely by Pure Storage Support, and because it's a low-priority background process, there's no effect on front-end storage performance.
Once the process is started, Purity works in the background to move data from the evacuating shelf to other media in the system. There's no need to specify where to move the data; Purity decides where best to migrate it based on system utilization. For example, if a new expansion shelf has just been installed, most of the data will be moved to that newer, emptier shelf.
While Pure Storage Support performs and monitors the shelf evacuation remotely, customers can monitor the progress of the evacuation in the UI. This can be done by hovering the mouse pointer over the drives in the web GUI, where the evacuating drives are displayed in grey and the mouse-over indicates the current evacuation progress of that flash module. Alternatively, the puredrive list command can be used to display the evacuation status of each flash module.
Once the data has been evacuated, the storage can be removed and replaced with denser storage media.
When Purity completes the evacuation of data from the storage media, the system automatically secure erases all evacuated SSDs. This removes all data from the SSDs and erases their encryption keys. The media is now completely empty, and if it were re-inserted into another system, it would simply be recognized as new, empty media. The secure erase functionality is the same method used by Pure Storage to remove data from PoC (Proof of Concept) systems before they are returned from customer sites.
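As an illustration of utilization-based placement, the sketch below sends each unit of evacuated data to whichever remaining shelf is currently the least full. This is an assumed greedy policy, not Purity's actual placement logic; the function `plan_evacuation`, the shelf dictionary layout, and the `chunk` parameter are hypothetical names chosen for the example.

```python
def plan_evacuation(shelves, evacuating, chunk=1):
    """Return {target shelf: units to receive} for emptying `evacuating`.

    shelves maps a shelf name to {"used": int, "capacity": int}.
    Each chunk goes to the target with the lowest used/capacity ratio.
    """
    targets = {name: dict(info) for name, info in shelves.items() if name != evacuating}
    plan = {name: 0 for name in targets}
    remaining = shelves[evacuating]["used"]
    while remaining > 0:
        step = min(chunk, remaining)
        candidates = [n for n, t in targets.items() if t["used"] + step <= t["capacity"]]
        if not candidates:
            raise RuntimeError("not enough free space to evacuate " + evacuating)
        best = min(candidates, key=lambda n: targets[n]["used"] / targets[n]["capacity"])
        targets[best]["used"] += step
        plan[best] += step
        remaining -= step
    return plan


shelves = {
    "old": {"used": 10, "capacity": 20},   # shelf being evacuated
    "mid": {"used": 8, "capacity": 20},    # existing, partly full shelf
    "new": {"used": 0, "capacity": 20},    # freshly installed shelf
}
print(plan_evacuation(shelves, "old"))
```

With these made-up numbers, nearly all of the data lands on the newly installed shelf, which matches the behavior described above: the emptier the shelf, the more of the evacuated data it receives.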
How do we non-disruptively remove the shelf?
You may be wondering how we physically get the evacuated shelf disconnected and removed from the array. We rely on the same tried and true methods and capabilities that have allowed us to perform both software and hardware upgrades non-disruptively for years now. Pure Storage has performed more than 100 non-disruptive upgrades per week in our customer base, and it is this core capability of our storage that forms the technical basis for our Evergreen™ Storage model. How we do upgrades non-disruptively is described in detail here. Below is a simple explanation of how this capability enables non-disruptive physical removal of a storage shelf.
External storage shelves are attached via SAS (Serial Attached SCSI) chains of up to two shelves per SAS chain. Removing the storage shelf at the end of a SAS chain is extremely simple and straightforward: you evacuate all the data from the shelf, then detach it by removing the cables that connect it to the other shelf. What's more interesting is how we remove a storage shelf from the middle of a SAS chain, that is, the shelf that sits between the storage controllers and the shelf at the end of the chain.
All storage connections in a FlashArray are redundant. Each controller has two connections, each on a different shelf SAS module, to the first shelf in the SAS chain. Those connections continue through the SAS modules in the first shelf redundantly to the SAS modules in the 2nd shelf.
The first step in the process of non-disruptive shelf removal is to shut down controller 0 (ct0), disconnect its cables from the shelf being removed, and connect ct0 to the shelf at the end of the SAS chain (there are free ports available on that shelf to do this). As demonstrated in this non-disruptive upgrade post, because of our architecture we're able to maintain full performance through the entire process.
Once the shelf cabling has been adjusted, ct0 can be restarted, and then ct1 can be shut down to repeat the re-cabling process for the other controller.
After ct1 is back online and, along with ct0, accessing the 2nd shelf directly, the evacuated shelf can simply be unplugged and removed from the system. The newer, denser storage shelf can be attached in its place, becoming the last shelf in the SAS chain.
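The ordering of these steps matters, and a small simulation can show why. The sketch below is a toy model of the sequence described above, not a Pure Storage tool: the names `shelf_mid` and `shelf_end`, the `reachable` helper, and the dictionary-based state are assumptions for illustration. It checks after every step that at least one online controller can still reach the shelf that holds the data.

```python
def reachable(ctrl, chain_intact):
    """Can this controller reach the end shelf, where the data now lives?"""
    if not ctrl["online"]:
        return False
    if ctrl["attached_to"] == "shelf_end":
        return True
    # Attached to the middle shelf: the path continues only while it is still cabled in.
    return ctrl["attached_to"] == "shelf_mid" and chain_intact


def remove_middle_shelf():
    controllers = {
        "ct0": {"online": True, "attached_to": "shelf_mid"},
        "ct1": {"online": True, "attached_to": "shelf_mid"},
    }
    log = []

    def check(step, chain_intact):
        serving = [n for n, c in controllers.items() if reachable(c, chain_intact)]
        if not serving:
            raise RuntimeError("outage at step: " + step)
        log.append((step, serving))

    for name in ("ct0", "ct1"):
        controllers[name]["online"] = False             # shut the controller down
        check(name + " shut down", True)
        controllers[name]["attached_to"] = "shelf_end"  # move its cables to the end shelf
        controllers[name]["online"] = True              # restart it
        check(name + " recabled and restarted", True)

    check("evacuated shelf unplugged", False)           # middle shelf leaves the chain
    return log


for step, serving in remove_middle_shelf():
    print(step, "->", serving)
```

In this model the surviving controller carries I/O while its partner is re-cabled, and by the time the middle shelf is unplugged neither controller depends on it. Unplugging the shelf before both controllers are re-cabled would leave a controller with no path to the data, which is why the removal comes last.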
Further improving non-disruptive upgrades with Pure Storage
With the introduction of non-disruptive storage removal, our customers can now keep modernizing their storage array in place as their capacity needs evolve. We have also combined this technology innovation with business model innovation to expand our Evergreen Storage model even further, through our new Capacity Consolidation program.