Ever seen this storage message?
Damn, I hate that warning. I swear my phone reminds me every single day that it’s almost out of storage. And I have to uninstall applications and stuff before my phone lets me take a new photo.
Why do we keep running up against the phone storage limit? And why do I keep buying new phones to get out of this problem??
And it doesn’t help that, as phone cameras get fancier, you’re adding more storage requirements to your phone. That “shot on an iPhone” billboard? It totally plays the storage card: how big do you think a hi-res photo is? You’d better buy a larger phone so you can take more than 10 hi-res pics without your phone making you delete Candy Crush.
Welcome to the joys of Direct Attached Storage!
Yes, there are benefits to this integrated format – like the fact that you don’t have to be cabled up to a storage device while taking mountain-top selfies. And when you buy a new phone (which will require both storage & compute), you only need to buy a single device. Somebody created a one-stop package for you.
So, if you’re cool with the ratio of storage to compute, this model of storage-glued-to-compute is great for you!
And if you only use your iPad for placating your kid with TV on airplanes, you can choose to get a beefy iPad with a high storage ratio to fit thousands of Dora episodes.
And if you use a MacBook for running business apps at work, you can get one with a higher ratio of CPU cores to power that compute.
If your family’s anything like mine, y’all could basically be an ad for Apple® with all the products (read: storage+compute units) you’ve gotten.
Now imagine trying to put all your devices to work together in some kind of unified orchestration.
How even?!? Some kind of cabled-up network between them with some kind of load-balancing system? And how do you monitor the whole big picture as one unit? And how do you even save a backup of the big picture? (Remember that each of your devices backs up to iCloud individually.)
What would happen if you had to run your MacBook’s business apps on your iPhone? Can you imagine opening your 1-million-line Excel doc on your phone? Yikes.
That’s exactly what’s going on in a Hadoop cluster that’s been built up over time with mixed storage and compute ratios.
Today, businesses running Hadoop commonly have a collection of SW applications that each need their own storage-to-compute ratio (e.g. Spark, Hive, Elasticsearch), so infrastructure is built out as silos to support each app based on its needs – like separate Apple products.
Combining these application silos into a single analytics pipeline is the hard part because, even if you get it perfectly configured and tuned for your query today, you have massive pain associated with any changes, like:
- if one application needs more compute
- if you need to add a new app that just got invented
- if you have more people investigating the data == more queries running and taxing the pipeline
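To put rough numbers on that pain, here is a back-of-the-napkin sketch in Python. Everything in it is an assumption made up for illustration: the function names, the node size (16 cores and 20 TB per node), and the workload figures. It is not a sizing tool. It only shows how a fixed storage-to-compute ratio forces you to buy whole nodes for whichever resource runs out first, leaving the other one idle.

```python
import math


def coupled_nodes(cores_needed, tb_needed, cores_per_node, tb_per_node):
    """Nodes required when every node bundles a fixed amount of compute and storage."""
    by_compute = math.ceil(cores_needed / cores_per_node)
    by_storage = math.ceil(tb_needed / tb_per_node)
    # You have to satisfy the hungrier of the two requirements.
    return max(by_compute, by_storage)


def stranded(cores_needed, tb_needed, cores_per_node, tb_per_node):
    """Report how much purchased capacity sits idle in the coupled model."""
    nodes = coupled_nodes(cores_needed, tb_needed, cores_per_node, tb_per_node)
    return {
        "nodes": nodes,
        "idle_cores": nodes * cores_per_node - cores_needed,
        "idle_tb": nodes * tb_per_node - tb_needed,
    }


# Hypothetical storage-heavy app: 64 cores and 400 TB on 16-core / 20 TB nodes.
print(stranded(64, 400, 16, 20))
# {'nodes': 20, 'idle_cores': 256, 'idle_tb': 0}
```

In this made-up case the app only needs 4 nodes' worth of compute, but storage forces 20 nodes, so 256 cores sit idle. Flip the ratio for a compute-heavy app and you strand storage instead.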
So how do you build your way out of “Storage Almost Full”?
Question: What if there was a way to easily add more processing power to your iPhone without sacrificing any of the benefits of its simplicity and ease of use? If upper-stack SW applications just need to manage compute with access to shared storage, then why don’t we split compute from storage and have a single storage target across all the user applications?
Answer: Because there was NO storage platform both fast and simple enough to abstract away all the cross-silo management.
A single storage platform that can keep all pieces of the pipeline fed.
A single set of data to manage, backup, monitor, and replicate.
A way to take endless panoramic pics on your iPhone 6 – and let your MacBook edit them in real time.
Did you know that your analytics apps don’t just speak HDFS – but NFS & S3, too? When you point them at a FlashBlade array (NFS & S3), it just works.
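As a rough idea of what "pointing" an app at S3 storage can look like, here is a minimal Python sketch that builds Spark settings for Hadoop's S3A connector. The `fs.s3a.*` property names and the `spark.hadoop.` prefix come from Hadoop and Spark themselves. The helper names, the endpoint URL, and the credentials are hypothetical placeholders, and whether you need path-style access depends on your setup, so treat this as a starting point and not a vendor-recommended configuration.

```python
def s3a_spark_conf(endpoint, access_key, secret_key):
    """Build Spark settings that send s3a:// paths to a custom S3 endpoint."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Assumption: the endpoint is addressed path-style, not by bucket subdomain.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def as_submit_flags(conf):
    """Render the settings as spark-submit `--conf key=value` arguments."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


# Hypothetical endpoint and credentials, for illustration only.
conf = s3a_spark_conf("https://flashblade.example.com", "EXAMPLE_KEY", "EXAMPLE_SECRET")
print(" ".join(as_submit_flags(conf)))
```

With settings like these in place, a job would read `s3a://some-bucket/...` paths where it used to read `hdfs://...` paths, and the compute nodes no longer need to hold the data themselves.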
You will have more data, more users, more apps, more questions, all of which mean more pipeline adjustments. Let your next infrastructure overhaul be your last.
To get a technical overview of Direct Attached Storage in a modern analytics pipeline, read this blog post from one of our engineers.