Flash for backup? You may think that’s crazy. And to be honest, at first we did too when we heard about customers who were buying FlashBlade™, our high performance data hub, and using it for backup. Backup, one of the least strategic parts in the IT budget, didn’t seem to fit with the other analytics workloads – data warehouse, data lake, streaming analytics, and AI clusters – that customers were also running on FlashBlade.
But when we talked to them, not only did they dispel our notion that flash for backup was crazy, they also convinced us that the backup market is at an inflection point where flash and cloud are playing a transformative role. Backup is no longer only about minimizing the cost of keeping a good copy – it is about making data available. Recover became create!
So, why did our customers start using flash for backup? Did they really need the performance of flash? And isn’t flash too expensive for backup? It turns out these customers weren’t really using flash for backup per se – they were using it for recovery. And they were using flash to repurpose backup data for DR, test/development and re-use in the cloud.
Over the past decade these customers have been caught in a perfect data protection storm. There has been a Cambrian explosion of data, with datasets growing from terabytes to petabytes and beyond. Simultaneously, the introduction of flash has set new performance expectations for data centers. And finally, data has become more important – a foundation of many modern businesses. Together, these forces made aggressive RTOs, which previously had been limited to the most critical workloads, the new standard for most production workloads.
But as a result, they weren’t meeting their backup and, more importantly, recovery SLAs. It turns out they aren’t alone – backup success rates today are between 75 and 85 percent, and even when data is successfully backed up, 20% of recoveries don’t meet the business RTO. The disk-to-disk-to-tape backup architectures our customers, and many others, were using could no longer keep up with the ever-growing, constant flow of data they are tasked with protecting today. Simply scaling these old models may give a temporary reprieve, but it doesn’t solve the underlying challenges.
The backup architecture that most customers deploy includes both disk and tape. In this disk-to-disk-to-tape (or D2D2T) strategy, a copy of the data is stored first on a disk-based backup appliance and then also saved to tape. The disk copy provides better restore performance than tape alone can deliver. Because disk is more expensive than tape, backup appliances leverage deduplication to provide a relatively cost-effective disk-based backup tier. Even deduplicated disk is still not as cheap as tape, but it comes close enough that the cost difference is more than justified by the management improvements. This disk-to-disk-to-tape approach provided quicker restores from the backup appliance while leveraging tape for long-term retention.
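The core idea behind appliance deduplication – store each unique block once, and keep a recipe to rebuild every backup – can be illustrated with a minimal sketch. This uses fixed-size chunks hashed with SHA-256 purely for illustration; real appliances typically use variable-length, content-defined chunking and far more sophisticated indexing.

```python
import hashlib

def dedupe(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and store each unique chunk once."""
    store = {}   # chunk digest -> chunk bytes (the shared dedupe pool)
    recipe = []  # ordered list of digests needed to rebuild this backup
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy is kept
        recipe.append(digest)
    return store, recipe

def restore(store, recipe) -> bytes:
    """Reassemble the original data from the chunk store and recipe."""
    return b"".join(store[d] for d in recipe)

# A backup whose chunks repeat dedupes well: 3 chunks, only 2 unique.
backup = b"A" * 8192 + b"B" * 4096
store, recipe = dedupe(backup)
assert restore(store, recipe) == backup
print(len(recipe), "chunks referenced,", len(store), "stored")  # 3 2
```

Successive nightly backups of mostly unchanged data share almost all of their chunks, which is why deduplicated disk approaches tape economics.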
D2D2T helped solve some of the management challenges of tape – you no longer have to search for tapes to do a restore or worry about having a complete set. And because backup appliances used disk to store backups, they also improved restore times – allowing terabyte-sized datasets to be recovered in hours.
Disk-to-disk-to-tape significantly modernized backup compared to previous methods, but it introduced new challenges in managing backup appliances – they don’t scale well. When you fill up an appliance you have to buy another one, and when that fills up, another one. Each appliance you add is a new deduplication and management zone, which creates inefficiencies.
And most importantly, while they are faster than tape, most backup appliances are inefficient at restoring data. They are designed to ingest backup data as quickly as possible; restore performance is secondary. To restore as quickly as possible, the appliance should be able to serve data as fast as the primary storage can consume it. But as the appliance’s disks fill up, restore speed can get even slower, making it hard to run the system efficiently.
As an aside, nostalgia and promotion in films like Guardians of the Galaxy have driven a recent surge in audio tape sales – up 136% in 2017 – but this likely doesn’t foreshadow a wave of skinny-jean-wearing hipster IT admins drinking flat whites while they wait for their data to be slowly restored from tape.
Flash offers order-of-magnitude performance increases over spinning magnetic disk. High-performance flash backups and restores can match the speed of all-flash production systems, restoring data as fast as those systems can consume it. Flash backups can also support more simultaneous server backups, providing better utilization at scale. And by coupling flash with data reduction, we get great economics AND great restore performance. If you are suffering from missed backup windows or restore SLAs, the solution is flash-to-flash.
Pure Storage® FlashBlade™ is a next-gen flash platform architected for bandwidth, delivering unprecedented performance for a wide range of workloads, including backup and rapid restore. Unlike competitors, FlashBlade’s restore performance exceeds its backup performance. A 75-blade FlashBlade delivers peak backup performance of 90 TB/hr and restore performance that is 3x higher at 270 TB/hr (75 GB/s) – all in just 20 rack units.
FlashArray™ snapshots can be backed up directly to FlashBlade, where they can be persisted and used for rapid recovery. This leverages a new snapshot type called a Portable Snapshot, in which snapshot metadata is encapsulated within the snapshot itself – allowing it to live anywhere.
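Portable Snapshot’s actual on-disk format isn’t public; the sketch below only illustrates the general idea of a self-describing snapshot – bundling the metadata alongside the data so no external catalog is needed to interpret it, wherever it lands. All names and fields here are hypothetical.

```python
import hashlib
import io
import json
import tarfile
import time

def make_portable_snapshot(name: str, data: bytes) -> bytes:
    """Bundle snapshot data with self-describing metadata in one archive,
    so the snapshot can be moved and restored without an external catalog."""
    meta = {
        "name": name,
        "created": time.time(),
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # integrity check
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for fname, payload in (("metadata.json", json.dumps(meta).encode()),
                               ("data", data)):
            info = tarfile.TarInfo(fname)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

blob = make_portable_snapshot("db-snap-001", b"example volume contents")
```

Because the archive carries its own description, any target – another array, an NFS share, or an object store – can verify and restore it independently.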
After discovering how and why our customers were using FlashBlade for data protection, we have created Rapid Restore solutions to support all of the key databases, as well as solutions with both the traditional and newer data protection vendors. Rapid Restore isn’t “A” solution – it is a collection of over 10 different backup/recovery use cases and the list keeps growing.
The Rapid Restore solutions for databases are fast. REALLY fast. A single FlashBlade can support a 15 TB/hr backup rate and almost a 50 TB/hr restore rate. With nearly 3:1 data reduction on an Oracle RMAN backup to FlashBlade, DBAs can complete their database restores in minutes or hours instead of days.
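A back-of-the-envelope calculation shows why restores shrink from days to hours at these rates. The dataset size and the 1 TB/hr tape comparison below are hypothetical; the FlashBlade throughput figures are the ones quoted above.

```python
# Restore-time estimate using the rates quoted above.
backup_rate_tb_hr = 15    # single FlashBlade backup rate (from text)
restore_rate_tb_hr = 50   # approximate FlashBlade restore rate (from text)
tape_rate_tb_hr = 1       # hypothetical effective tape restore rate
dataset_tb = 100          # hypothetical 100 TB database

backup_hours = dataset_tb / backup_rate_tb_hr
restore_hours = dataset_tb / restore_rate_tb_hr
tape_hours = dataset_tb / tape_rate_tb_hr

print(f"backup: {backup_hours:.1f} h")    # backup: 6.7 h
print(f"restore: {restore_hours:.1f} h")  # restore: 2.0 h
print(f"tape: {tape_hours / 24:.1f} days")  # tape: 4.2 days
```

The asymmetry matters: an RTO is set by restore speed, so a restore rate higher than the backup rate is exactly what a recovery-focused design wants.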
We also have solutions for leveraging FlashBlade as a target for traditional data protection software, delivering improved performance in a smaller footprint. By complementing FlashBlade’s built-in compression with our data protection partners’ features – like deduplication, replication, and cloud tiering – users can optimize their backup infrastructure for performance, capacity, and resiliency.
But there’s another part of the backup problem. Even though our customers were effectively using FlashBlade to deliver Rapid Restore capabilities, they still needed to store large amounts of data offsite for retention and compliance…which they had continued to do with tape. As previously mentioned, tape is complex and slow…but the real failure of tape is that your data is locked offline somewhere – providing no value for your company.
The answer is to finally replace tape with low-cost cloud object storage, like Amazon S3. This is the new flash-to-flash-to-cloud (F2F2C) backup paradigm. To help our customers accelerate their journey to this new paradigm, Pure Storage provides customers with a few different solutions. CloudSnap is built into Pure Storage FlashArray and provides portable snapshot capability to other FlashArrays, to FlashBlade or other NFS devices, or directly to the cloud. Snapshot technology is great, but what about data protection with backup, you ask?
Today, we’re announcing the industry’s first F2F2C platform, Pure Storage ObjectEngine. It’s born from flash and cloud to modernize data protection. Unlike legacy disk-to-disk-to-tape architectures, ObjectEngine delivers rapid recovery, saves money with cloud economics, and enables data reuse for workloads like GDPR compliance, analytics, and AI.
The ObjectEngine platform consists of two products, ObjectEngine//A and ObjectEngine Cloud. ObjectEngine//A is an on-premises system that sits seamlessly between applications and a backend object store, transparently performing inline data deduplication and encryption to reduce storage and data transmission costs by up to 97%. When coupled with a FlashBlade scale-out storage system, which already enables rapid restore from its all-flash object storage, recovery time objectives (RTOs) can be further reduced, enabling rapid recovery of data in the event of a disaster. ObjectEngine//A will be generally available in 1Q2019.
The ObjectEngine//A base cluster can deliver up to 25 TB/hr of backup and up to 15 TB/hr of restore performance using its cloud-native scale-out architecture. ObjectEngine can manage hundreds of petabytes (PBs) of data in a single namespace across on-premises object storage, cloud, and hybrid cloud – eliminating data silos, reducing administration, and improving time-to-innovation for value-creating workloads.
ObjectEngine Cloud is a cloud-native software offering that delivers scale-out capabilities in the cloud. With ObjectEngine Cloud, enterprises can take advantage of opportunities for data to be reused for web services. It is expected to become generally available in 2H2019.
If you can get your backup data to the cloud, then you can start to think about how to reuse your data for migration, dev/test, analytics, etc. Since ObjectEngine Cloud is built to run in the cloud, you can leverage the cloud edition of your backup software to recover wholly within the cloud. In the future, you will also be able to restore your data to Pure’s Cloud Block Store or to your favorite Amazon data service – turning what used to be cold data sitting on tape in a vault somewhere into business value through a wide variety of web services.
Check out how Pure Storage ObjectEngine can help you adopt a modern flash-to-flash-to-cloud backup strategy for cloud-economics-driven data protection with faster recovery and minimal management overhead. Also, check out the Forbes article by Steve McDowell to get his latest thoughts on Pure’s ObjectEngine.