Almost three years ago, a group of Puritans were discussing the state of the All-Flash Array market. We were, of course, bullish on all-flash, but if you think back to 2013, the market was much more “uncertain” on the entire concept – with most large storage vendors still very focused on making hybrid flash/disk arrays work. But at the time – two things were clear to us:
So, as a relatively small, fast-growing startup at the time, we made a huge bet: to create a startup-within-a-startup that we called “Iridium” (the original code name for Pure Storage was “Osmium” – one element lower on the periodic table). We asked Pure co-founder and Chief Architect John Hayes to go off (well, down the street anyway), recruit an entirely new team, and build a new company to go after this big idea. We wanted to re-create all the urgency, excitement, and passion of a true ground-floor startup, and do it in a way that allowed Pure to continue our 110 percent focus on ramping and innovating FlashArray, while in parallel taking on a completely new product that was equally or even more ambitious. And so FlashBlade was born.
The Need for FlashBlade
Much has been written about the strategic nature of data, so I won’t repeat it here. Suffice it to say, we saw a huge revolution over the past five years in the “Big Data” movement, in which organizations of all sizes and from varied industries were learning to take better advantage of data in high volume, velocity, and variety. But there seemed to be two opposing forces at work. First, big data was just getting bigger and bigger. Businesses everywhere were going digital, and new movements like the Internet of Things and born-digital science, engineering, and media were pushing the data growth curve exponentially, not to mention the advancing digital security threat landscape, which requires real-time detection and correlation.
Meanwhile, the majority of folks looking at how to capture, store, and analyze this bigger and bigger data were doing so with large-scale commodity spinning disk and slow, batch-style analytics. These forces seemed at odds to us: there was a non-stop thirst for faster answers, real-time analytics, richer queries, and interactive simulations, yet because of the sheer data size everyone had concluded that it had to be done on disk, or after the fact using batch-oriented jobs.
And so we had our (LED) light bulb moment: what if we could build an all-flash platform that could tackle the growing data challenges of the next decade? What if we could fundamentally change what was possible with data: how much could be stored, how fast it was accessed, how it was queried, and what types of insights could be gleaned, even in real time? And what if we could do it affordably enough that it could even cost less than many of the disk-based systems in use today?
The FlashBlade Design Principles
Given the above mission, the team got to work. We gave the team complete freedom: they were free to take FlashArray, and all of its Purity Operating Environment code, or they were free to take none of it. All that mattered was that they created a purpose-built product for this space, and were true to that design spec. The team prototyped many approaches, but it became clear to them that simply going “all-flash” was only a small part of the answer. Building this platform was not only about IO performance, but also metadata performance, massive scale, and economics. It became clear that they had to re-think everything.
The FlashBlade design thus centers around a few core principles:
1 – Elastic Scale-out. The design was built to scale effortlessly and linearly from small deployments to very big deployments, with great care to ensure that every dimension of the system scaled elegantly and linearly. IO performance, bandwidth, metadata performance, NV-RAM, protocols, user connections…everything had to scale with the system as it grew.
2 – High Performance, Low Cost. The team needed to design a system that optimized for two seemingly opposing goals – game-changing performance coupled with a cost profile that would make all-flash affordable for even very big workloads. This led the team down the path of designing unique custom hardware, with a particular focus on minimalism. Three things mattered in the hardware: Intel CPUs, flash, and Ethernet. The goal was to have as much of these three elements and as little of anything else as possible – to drive down cost and drive up performance and simplicity.
3 – Natively multi-protocol. The team realized that we were on the cusp of some very big changes in the unstructured and semi-structured data storage space – and that legacy file access protocols were giving way to newer object protocols, and even newer application-specific protocols. Unlike legacy systems where protocols were often “stacked” upon one another, the team designed a system with a core object store at the center and the ability to easily add scale-out protocols on top of it. The system will start with both NFS and Object/S3 access, and will expand as we work with customers to understand the next-generation protocols that matter to them.
4 – Pure Simplicity. As we learned with our first product, FlashArray, simplicity can be transformative, and is perhaps our biggest driver of repeat purchase. Customers in large-scale file deployments were just drowning in volumes, cluster-pairs, aggregates, flash caches and other such nonsense, and at the high-end of the market, were sucked into open source science projects. We wanted to design a product so simple that anyone could manage it, at any scale, whether you were a developer, data scientist, engineer or storage admin.
Or to put it more simply – we wanted to build something that was Big, Fast, and Simple.
And so, after three years of development, we’re proud to introduce you to Pure’s second act: FlashBlade.
FlashBlade is an ambitious product, built from the ground up to meet the needs of tomorrow’s big + fast unstructured data universe. FlashBlade is really four products in one: it’s a scale-out compute and flash system; it’s a huge, massively parallel SSD based upon raw flash, with all flash-management software written by Pure; it’s a scale-out file system and object store; and it’s a software-defined network. Let’s look closer at FlashBlade’s components:
FlashBlade’s scaling unit is the Blade. Each blade marries raw NAND flash (either 8 TB or 52 TB in the first generation) with an Intel Xeon system-on-a-chip processor, a programmable processor with integrated ARM cores, DRAM and integrated NV-RAM, all connected on the blade via PCIe. The design of the blade is sheer minimalism: we worked hard to remove as much unnecessary componentry as possible, to increase density, lower cost, and drive simplicity in the architecture. Blades run the Elasticity™ software, which is distributed across both processors on every blade in a system, and communicate with one another over the Elastic Fabric – more on both below.
The core of any good all-flash array is software, and FlashBlade is no different. The Elasticity software spans all the way from file system to flash, and implements many layers of functionality that would be separate code in other systems. Elasticity includes scale-out file/object system software, a core clustered storage system with both advanced data and resiliency features, and all the software you’d find inside a typical SSD, but optimized globally instead of running individually in thousands of SSDs in a large system.
Elasticity takes a good amount of code and inspiration from Purity (FlashArray’s software), but Elasticity was implemented from scratch for the unique scale-out characteristics of FlashBlade. Elasticity is built on the foundation of the Elastic Core, a base object store that implements the basic CRUD primitives (Create, Read, Update, Delete) and sophisticated data services, including always-on data reduction and encryption (with snapshots and replication coming in follow-on releases). It also has a complete resiliency layer: N+2 erasure coding to protect against blade and flash chip loss, high availability, and LDPC error coding to correct flash bit errors.
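To make the N+2 idea concrete, here is a minimal sketch of the capacity math behind an erasure-coded stripe. The stripe widths below are hypothetical and chosen only for illustration; FlashBlade’s actual stripe geometry is not stated in this post. The sketch also assumes the usual property of such codes, that two parity shards let a stripe survive the loss of any two shards.

```python
def usable_fraction(data_shards: int, parity_shards: int = 2) -> float:
    """Fraction of raw capacity that holds user data in an N+parity stripe."""
    if data_shards < 1 or parity_shards < 0:
        raise ValueError("need at least one data shard and non-negative parity")
    return data_shards / (data_shards + parity_shards)


def tolerated_losses(parity_shards: int = 2) -> int:
    """Assumption: the code survives the loss of any `parity_shards` shards."""
    return parity_shards


# Hypothetical stripe widths, picked only to show the trend.
for n in (4, 8, 13):
    print(f"{n}+2: {usable_fraction(n):.1%} of raw capacity usable, "
          f"survives {tolerated_losses()} shard losses")
```

The trend is the point: the wider the stripe, the smaller the share of raw flash spent on parity, while the number of tolerated losses stays at two.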
Scale-out protocols are built on top of the Elastic Core, and the initial GA version will include NFS v3 and Object/S3. The system is built to enable additional traditional file and object protocols to be added quickly, including future application-specific protocols as they are invented. Unlike competitive systems, all these layers have common metadata and garbage collection for global efficiency.
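The layering described above can be sketched in a few lines. This is a toy, in-memory illustration and not Elasticity code: the class names (`CoreObjectStore`, `FileFrontEnd`, `ObjectFrontEnd`), the method names, and the key prefixes are all hypothetical. It only shows the shape of the idea, with one core store exposing CRUD primitives and thin protocol adapters sitting side by side on top of it rather than stacked on one another.

```python
from typing import Dict


class CoreObjectStore:
    """Toy stand-in for a core object store with CRUD primitives (hypothetical)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def exists(self, key: str) -> bool:
        return key in self._objects

    def create(self, key: str, data: bytes) -> None:
        if key in self._objects:
            raise KeyError(f"{key!r} already exists")
        self._objects[key] = data

    def read(self, key: str) -> bytes:
        return self._objects[key]

    def update(self, key: str, data: bytes) -> None:
        if key not in self._objects:
            raise KeyError(f"{key!r} does not exist")
        self._objects[key] = data

    def delete(self, key: str) -> None:
        del self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


def _upsert(core: CoreObjectStore, key: str, data: bytes) -> None:
    """Create the object if it is new, otherwise update it in place."""
    if core.exists(key):
        core.update(key, data)
    else:
        core.create(key, data)


class FileFrontEnd:
    """Hypothetical file-style adapter: paths become keys in the shared core."""

    def __init__(self, core: CoreObjectStore) -> None:
        self._core = core

    def write_file(self, path: str, data: bytes) -> None:
        _upsert(self._core, f"file:{path}", data)

    def read_file(self, path: str) -> bytes:
        return self._core.read(f"file:{path}")


class ObjectFrontEnd:
    """Hypothetical S3-style adapter: (bucket, key) pairs become core keys."""

    def __init__(self, core: CoreObjectStore) -> None:
        self._core = core

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        _upsert(self._core, f"object:{bucket}/{key}", body)

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._core.read(f"object:{bucket}/{key}")


core = CoreObjectStore()
files = FileFrontEnd(core)
objects = ObjectFrontEnd(core)
files.write_file("/logs/app.log", b"hello")
objects.put_object("images", "cat.png", b"\x89PNG")
print(len(core))  # prints 2: both protocols land in the same core store
```

Because both adapters write into one store, anything implemented at the core level (in this toy, just the object count) is shared by every protocol, and adding a protocol means adding an adapter.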
Metadata is a first-class citizen in FlashBlade. The Elastic Map is the extensible, variable-block metadata engine that is at the core of FlashBlade. It spans from file system to flash, and metadata performance scales out as the system grows, just like every component of the system.
Finally – FlashBlade is from Pure, so it is managed from Pure1™. Existing Pure customers will feel right at home, with a compatible GUI, CLI, and REST interface, and new customers won’t believe how simple managing storage at petabyte-scale can be.
FlashBlade is built upon an embedded 40 Gb/s software-defined switch fabric, the Elastic Fabric. This low-latency switching fabric connects Blades, Chassis, and 10,000s of clients together on one converged fabric, and different classes of traffic are separated and managed automatically by QoS.
Unlike other systems that went down the path of exotic NVMe networking, FlashBlade is betting on the performance and ubiquity of Ethernet. The Elastic Fabric is implemented in a set of redundant Fabric Modules which slide into the back of the 4U chassis, and thus can be upgraded to faster Ethernet speeds over time (the chassis mid-plane is provisioned with 10X the bandwidth the system currently uses, leaving years of room for expansion).
It’s also worth noting that the Elastic Fabric runs TCP/IP connections to hosts, but internal communication is done via proprietary protocols which deliver low-latency communication between blades. In fact – the control processors of each blade can communicate directly with the flash on every other blade to make data and metadata updates as necessary.
Adding it All Up
FlashBlade comes together to deliver some amazing specs. A word of caution: FlashBlade is currently in its early access phase, so specs won’t be finalized until GA, but the summary below shows what we’re currently observing in the Beta. We expect these specs to expand after GA as we qualify larger and larger systems in follow-on releases.
FlashBlade is BIG. It offers 1.6 PB of usable storage (at an assumed 3-to-1 data reduction rate and including all overhead for resiliency, flash management, and metadata). Existing Pure customers might wonder why we’re quoting 3-to-1 data reduction instead of Pure’s normal average of 5.5-to-1. We believe the unstructured data use cases FlashBlade targets are a bit less reducible, so we’re being cautious. We also anticipate widespread use of FlashBlade even for non-reducible workloads, given FlashBlade’s affordability. FlashBlade is also efficient, consuming about 1,300 watts per PB, about the same as a home hair dryer. All this at effective costs of well under $1/GB usable.
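For readers who want to see how these figures relate, here is a back-of-the-envelope sketch. It rests on several assumptions: decimal units (1 PB = 1,000,000 GB), the power figure applied per usable PB, and $1/GB treated as a ceiling, since the text only says “well under.” The function names are illustrative.

```python
USABLE_PB = 1.6             # usable capacity quoted in the text
ASSUMED_REDUCTION = 3.0     # 3-to-1 data reduction assumed in the text
WATTS_PER_PB = 1300.0       # power figure quoted in the text
PRICE_CEILING_PER_GB = 1.0  # "well under $1/GB", used here as an upper bound
GB_PER_PB = 1_000_000       # decimal units (an assumption)


def stored_pb(usable_pb: float, reduction: float) -> float:
    """Space the data occupies on flash after data reduction."""
    return usable_pb / reduction


def usable_at(stored: float, reduction: float) -> float:
    """Usable capacity the same flash would offer at a different reduction ratio."""
    return stored * reduction


on_flash = stored_pb(USABLE_PB, ASSUMED_REDUCTION)
print(f"Data on flash after 3-to-1 reduction: {on_flash:.2f} PB")
print(f"Same flash at 5.5-to-1: {usable_at(on_flash, 5.5):.2f} PB usable")
print(f"Same flash with no reduction: {usable_at(on_flash, 1.0):.2f} PB usable")
print(f"Power for {USABLE_PB} PB: about {USABLE_PB * WATTS_PER_PB:.0f} W")
print(f"Cost ceiling: ${USABLE_PB * GB_PER_PB * PRICE_CEILING_PER_GB:,.0f}")
```

The middle lines show why the assumed ratio matters: the same flash yields very different usable capacity depending on how reducible the data turns out to be.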
FlashBlade is FAST. It delivers up to 15 GB/s of bandwidth per 4U chassis, which scales as you add blades and chassis. A two-chassis system delivers over 1M NFS operations, and we’ve focused on optimizing performance of both metadata and IO, across a wide range of IO sizes.
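As a rough illustration of what that bandwidth means in practice, the sketch below estimates how long a full sequential scan of a dataset would take. It assumes ideal linear scaling across chassis and the peak 15 GB/s figure, and it ignores client, network, and workload limits, so treat the output as a best case. The function names and the 100 TB dataset are hypothetical.

```python
PER_CHASSIS_GB_PER_S = 15.0  # "up to 15 GB/s" per 4U chassis, from the text


def aggregate_gb_per_s(chassis: int, per_chassis: float = PER_CHASSIS_GB_PER_S) -> float:
    """Best-case aggregate bandwidth, assuming perfectly linear scaling."""
    if chassis < 1:
        raise ValueError("need at least one chassis")
    return chassis * per_chassis


def scan_seconds(dataset_tb: float, chassis: int) -> float:
    """Best-case time to read the whole dataset once (decimal units)."""
    dataset_gb = dataset_tb * 1000
    return dataset_gb / aggregate_gb_per_s(chassis)


# Hypothetical 100 TB dataset on one, two, and four chassis.
for n in (1, 2, 4):
    minutes = scan_seconds(100, n) / 60
    print(f"{n} chassis: {aggregate_gb_per_s(n):.0f} GB/s, "
          f"100 TB scan in about {minutes:.0f} min")
```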
FlashBlade is SIMPLE. The integrated appliance-like design makes FlashBlade simple to deploy. The scale-out design makes it simple to expand: when you add a blade, new capacity and performance are both available instantly, with no performance-intensive rebalancing to wait for. Pure1 makes FlashBlade simple to manage and support, with a built-in GUI and a full REST API and CLI for automation.
Get Started with FlashBlade
The FlashBlade Beta is available today as part of our Early Access Program, and we’ll be shipping a Directed Availability release of FlashBlade in the second half of this year, fully supported for production workloads.
There’s a lot to say and learn about FlashBlade, but the best way to learn is in person – and we’d love to talk. If you’d like to learn more about FlashBlade, or be part of our Early Adopter Program – please sign up here.
FlashBlade is an all-flash data platform for everyone. At <$1/GB usable, and starting at deployments of <100TBs, we hope customers of all shapes and sizes unlock exciting new innovations, creations, and discoveries with FlashBlade. We can’t wait to see what you build with it.