In a previous blog post (An Inside Look At Big Data Predictive Analytics with Pure1 – Part 1), we provided an overview of the large data collection and predictive support infrastructure that Pure Storage has built up. Today, we’ll take a deeper look inside this vast collection of data to learn some interesting and useful things about storage.
A good place to start is to look at the IO input across the fleet of Pure FlashArrays deployed across thousands of customers.
Reads vs Writes:
Let’s start with the simplest question first. Are customers using the FlashArray more for reads or writes? Is it 50/50? There are some who fall into each of these categories. There are FlashArrays that are 100% reads and there are FlashArrays that are 100% writes though such extremes are very rare. The vast majority of FlashArrays run a mix which favors reads over writes.
The median distribution is ⅔ reads and ⅓ writes, with 78% of customer arrays running workloads that have more reads than writes. Figure 1 shows the read / write distribution across all customer Pure FlashArrays.
Figure 1. The distribution of reads / writes across all customer Pure FlashArrays. The y-axis shows the percentage of each. The x-axis represents all the customer arrays sorted from those that do the most reads to those that do the most writes.
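As a rough illustration of how a summary like this could be computed, here is a minimal sketch. It assumes each array reports total read and write IO counts; the function names and the sample fleet are hypothetical and are not taken from Pure1.

```python
from statistics import median

def read_fraction(read_ios, write_ios):
    """Fraction of an array's IOs that are reads."""
    total = read_ios + write_ios
    if total == 0:
        raise ValueError("array reported no IOs")
    return read_ios / total

def summarize_fleet(counters):
    """counters: list of (read_ios, write_ios) tuples, one per array."""
    # Sort from most read-heavy to most write-heavy, like the x-axis of Figure 1.
    fractions = sorted((read_fraction(r, w) for r, w in counters), reverse=True)
    read_heavy = sum(1 for f in fractions if f > 0.5)
    return {
        "median_read_fraction": median(fractions),
        "read_heavy_share": read_heavy / len(fractions),
        "sorted_fractions": fractions,
    }

# Hypothetical fleet of five arrays (made-up counters).
fleet = [(900, 100), (700, 300), (600, 400), (400, 600), (0, 50)]
summary = summarize_fleet(fleet)
print(summary["median_read_fraction"], summary["read_heavy_share"])
```

The median is taken over per-array read fractions, so a very busy array counts no more than a quiet one.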
We learned that Pure customers tend to do more reads than writes. What’s the most common size at which they do these reads and writes?
Let’s start with a very direct approach by looking at the distribution of IOs for every customer FlashArray based on the different size buckets. Then let’s average these distributions for all the customers. This way we learn what on-average are the most common IO sizes across all customers. This is shown in Figure 2.
Figure 2. The average of all customer arrays’ IO size distributions.
We see that the most popular size bucket for reads is 8KB to 16KB, followed by 4KB to 8KB. For writes the most popular size is 4KB to 8KB.
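The averaging step described above can be sketched as follows. This is a minimal illustration under our own assumptions: each array supplies raw IO counts per size bucket, and the three-bucket sample data is invented for brevity.

```python
def normalize(counts):
    """Turn raw per-bucket IO counts into fractions that sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("array reported no IOs")
    return [c / total for c in counts]

def average_distribution(per_array_counts):
    """Average the normalized distributions, one equal vote per array."""
    dists = [normalize(counts) for counts in per_array_counts]
    n = len(dists)
    # zip(*dists) groups the same bucket across all arrays.
    return [sum(bucket) / n for bucket in zip(*dists)]

# Two hypothetical arrays, three hypothetical size buckets.
per_array_counts = [
    [10, 30, 60],    # quiet array, mostly larger IOs
    [1000, 0, 0],    # busy array, only small IOs
]
average = average_distribution(per_array_counts)
print(average)
```

Normalizing before averaging is the design choice that matters here: without it, the busiest arrays would dominate the fleet-wide picture.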
By looking at the distribution of IO sizes on an individual array basis we see that the majority of arrays are dominated by IOs which are small in size:
| | Majority of IOs < 32KB | Majority of IOs < 16KB |
|---|---|---|
| Reads | 73% of arrays | 56% of arrays |
| Writes | 93% of arrays | 88% of arrays |
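A per-array tally of this kind could be sketched as below. The data layout is an assumption on our part (a dict mapping a representative IO size in KB to an IO count), and the sample arrays are made up.

```python
def majority_below(counts_by_size_kb, threshold_kb):
    """True when more than half of an array's IOs are smaller than threshold_kb."""
    total = sum(counts_by_size_kb.values())
    small = sum(c for size, c in counts_by_size_kb.items() if size < threshold_kb)
    return total > 0 and small * 2 > total

def share_of_arrays(arrays, threshold_kb):
    """Fraction of arrays whose IOs are mostly below the threshold."""
    hits = sum(1 for a in arrays if majority_below(a, threshold_kb))
    return hits / len(arrays)

# Hypothetical arrays: {representative IO size in KB: IO count}.
arrays = [
    {4: 700, 16: 200, 64: 100},    # mostly tiny IOs
    {8: 300, 16: 300, 128: 400},   # mostly under 32KB, but not under 16KB
    {4: 100, 64: 900},             # mostly large IOs
]
share_32 = share_of_arrays(arrays, 32)
share_16 = share_of_arrays(arrays, 16)
print(share_32, share_16)
```

Raising the threshold can only keep or increase the share, which is why the 32KB column is never smaller than the 16KB column.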
However, looking at IOPS (IOs per second) is only half the equation. Equally important is to look at throughput (bytes per second) – how data is actually delivered to the arrays in support of real world application performance.
We can look at IO Size as a weight attached to the IO. An IO of size 64KB will have a weight 8 times higher than an IO of size 8KB since it will move 8 times as many bytes. We can then take a weighted average of the IO size of each customer FlashArray. This is shown in Figure 3 below.
Figure 3. The weighted-average IO size of all customer arrays.
Looking at the weighted average tells a very different story than looking at raw IOPS. We can see that the most popular read size is between 32KB and 64KB and the most popular write size is between 16KB and 32KB. In fact the majority of our customers’ arrays have weighted IO sizes above 32KB rather than below 32KB.
| | Majority of IOs >= 32KB | Majority of IOs >= 16KB |
|---|---|---|
| Reads | 71% of arrays | 93% of arrays |
| Writes | 30% of arrays | 79% of arrays |
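To make the weighting concrete, here is a small worked sketch. It is one reading of "IO size as a weight": each IO counts in proportion to the bytes it moves. The function names and the sample array are hypothetical.

```python
def count_shares(counts_by_size_kb):
    """Share of IOs at each size (the IOPS view)."""
    total = sum(counts_by_size_kb.values())
    if total == 0:
        raise ValueError("array reported no IOs")
    return {size: c / total for size, c in counts_by_size_kb.items()}

def byte_weighted_shares(counts_by_size_kb):
    """Share of bytes moved at each size (the throughput view)."""
    # Weight each IO by its size: a 64KB IO counts 8 times as much as an 8KB IO.
    weighted = {size: size * c for size, c in counts_by_size_kb.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("array moved no bytes")
    return {size: w / total for size, w in weighted.items()}

# Hypothetical array: 800 IOs of 8KB and 200 IOs of 64KB.
array = {8: 800, 64: 200}
by_count = count_shares(array)
by_bytes = byte_weighted_shares(array)
print(by_count, by_bytes)
```

In this made-up case 80% of the IOs are 8KB, yet two thirds of the bytes travel in 64KB IOs, which is the same reversal the two tables show.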
So how do we reconcile these two different views of the world? IOPS tells us that most IOs are small. But weighted IOPS (throughput) disagrees.
To better understand what is really going on let’s plot the IOPS and the throughput on the same graph. Figures 4 and 5 show the distribution of IOPS by IO size and the distribution of throughput by IO size across all our customers’ FlashArrays globally. Figure 4 is writes and Figure 5 is reads.
Figure 4. Distribution of write IO sizes and write throughput sizes across all customer arrays.
Figure 5. Distribution of read IO sizes and read throughput sizes across all customer arrays.
What we observe is that for both reads and writes the IOPS are dominated by small IOs while the majority of the actual payload is read and written with large IOs.
On average 79% of all writes are less than 16KB in size but 74% of all data is written with writes that are greater than 64KB in size. This hopefully sheds some light on how IO size distributions are different when talking about IOPS vs throughput.
IO Size Modalities:
So far we have been looking at IO sizes averaged across all FlashArrays. Now let’s get away from this aggregation and see if we can spot any patterns in the kinds of workloads that customers are running.
It turns out there are four typical IO size distributions or profiles that are most common, as shown in Figure 6 below. There is the unimodal where one bucket dominates all others, there is bimodal with IOs falling into two buckets, there is trimodal with IOs falling into three buckets, and there is multimodal where IOs spread out fairly nicely across most buckets.
Figure 6. Examples of the four most common IO size distributions that we observe in customer FlashArrays.
Unimodal is the most prevalent distribution followed by multimodal then bimodal and trimodal. Keep in mind that even a unimodal distribution can represent more than one distinct application running on the array that happen to mostly use the same block size.
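One simple way to label a distribution with these four profiles is sketched below. The rule is purely our own illustrative assumption, not how the profiles above were derived: count how many of the largest buckets are needed to cover a chosen share of IOs. The 80% default for `coverage` is hypothetical.

```python
def classify_modality(shares, coverage=0.8):
    """Label a bucket distribution by how many top buckets reach `coverage`.

    shares: fractions of IOs per size bucket, summing to roughly 1.
    coverage: hypothetical cutoff; 0.8 is an assumption, not a measured value.
    """
    if not shares:
        raise ValueError("no buckets supplied")
    running = 0.0
    needed = len(shares)
    # Walk buckets from largest to smallest until enough IOs are covered.
    for k, share in enumerate(sorted(shares, reverse=True), start=1):
        running += share
        if running >= coverage:
            needed = k
            break
    labels = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return labels.get(needed, "multimodal")

print(classify_modality([0.85, 0.05, 0.05, 0.05]))
print(classify_modality([0.45, 0.40, 0.05, 0.05, 0.05]))
print(classify_modality([0.30, 0.30, 0.30, 0.05, 0.05]))
print(classify_modality([0.25, 0.20, 0.20, 0.20, 0.15]))
```

A different cutoff would shift arrays between categories, so treat the labels as a way of describing shapes rather than a precise taxonomy.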
Pulling all this together:
We presented a lot of different data above. We saw that reads typically dominate writes. We saw that typical IO sizes depend on how you look at them: from a strictly IOPS perspective smaller sizes dominate, but from a throughput perspective larger sizes do. We also saw that more than half of customer arrays are running a workload with more than one dominant IO size. So how do we pull all this together and make some practical sense of it?
First, we need to look at the Pure FlashArray in light of the fact that the world is moving towards consolidation. It is uncommon for a customer to purchase an array to run just a single application. This isn’t just conjecture. As part of our Pure1 analytics we apply a predictive algorithm to try to identify which workload is running on each volume of a Pure FlashArray. We have extensively validated the accuracy of this model by cross-checking the predictions with customers. We’ve found that 69% of all customer arrays run at least two distinct applications (think VDI and SQL, for example). Just over 25% of customer FlashArrays run at least three distinct workloads (think VDI + VSI + SQL, for example). The mixture of different applications on a single FlashArray, together with variability within the applications themselves, creates the complex IO size picture that we see above.
It’s nice to see this validated in practice, but this was the core assumption we made 7+ years ago when we started building FlashArray’s Purity Operating Environment software. Unlike most flash vendors, who focused on single-app acceleration, we set out from day one to build affordable flash for application virtualization and consolidation, and thus one of the main design considerations of Purity was a variable block size and metadata architecture. Unlike competitors who break every IO into fixed chunks for analysis (for example XtremIO at 8K), or who require their deduplication to be tuned to a fixed block size or turned off per volume to save performance (for example Nimble), Pure’s data services are designed to work seamlessly without tuning, for all block sizes, assuming that EVERY array is constantly doing mixed IO (and it turns out that even single-app arrays do tons of mixed-size IO too!). If your vendor is asking you to input, tune, or even think about your application’s IO size, take that as a warning: in the cloud model of IT you don’t get to control or even ask what the workloads on top of your storage service look like.
Okay, so Pure FlashArrays are deployed in consolidation environments leading to a complex IO size distribution, but what does that mean from a practical benchmarking perspective? The best way to simulate such complexity accurately is to try running the real workloads that you intend to run in production when evaluating a new storage array. Don’t pick one block size, nor two, nor three etc. – any of these narrow approaches miss the mark for real-world applications with multiple IO size modalities, especially in ever more consolidated environments. Instead, actually try out your real life workloads – this is the only way to truly capture what’s important in your environment. Testing with real world IO size mixes has been something that we have been passionate about for years. See prior posts on this topic at the end of the blog.
All that said, if that kind of testing is not possible or practical then try to understand the IO size distributions of your workload and test those IO size modalities. And if a single IO size is a requirement, say for quick rule-of-thumb comparison purposes, then we believe 32KB is a pretty reasonable number to use, as it is a logical convergence of the weighted IO size distribution of all customer arrays. As an added bonus it gets everyone out of the “4K benchmark” thinking that the storage industry has historically propagated for marketing purposes.
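If you do end up building a synthetic test from a measured IO size distribution, the sizes can be drawn from that mix rather than fixed at one value. The sketch below assumes you already know your workload's mix by IO count. The mix shown is made up, and wiring the sizes into an actual load generator is left out.

```python
import random

def sample_io_sizes(mix, n, seed=0):
    """Draw n IO sizes (in KB) according to a {size_kb: share_of_ios} mix."""
    rng = random.Random(seed)            # seeded so a test run is repeatable
    sizes = list(mix)
    weights = [mix[size] for size in sizes]
    return rng.choices(sizes, weights=weights, k=n)

# Hypothetical multimodal mix; replace with the shares measured on your workload.
mix = {4: 0.30, 8: 0.30, 32: 0.25, 128: 0.15}
io_sizes = sample_io_sizes(mix, 10_000, seed=42)

# Sanity check: how close is the sampled mix to the requested one?
observed = {size: io_sizes.count(size) / len(io_sizes) for size in mix}
print(observed)
```

This only approximates the size modalities. It says nothing about read/write ratio, locality, or data reducibility, which is part of why running the real workload remains the better test.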
Hopefully this blog sheds some light on how to think about performance in a data-reducing all-flash array, as well as how we are using big data analytics in Pure1 to not only understand customer environments, but also to better design our products. If you’d like to learn more about Pure1, please check out our products page.
Some examples of previous blogs on the topic of benchmarking: