It’s an interesting time for big data analytics. We’re approaching the end of the Hadoop hype cycle, facing a new hybrid and multi-cloud reality, and looking to move the promise of AI from the lab into production.
Our recent announcement of Vertica in Eon Mode for Pure Storage® is a great example of how we're responding to these shifts.
Now, let’s dig into each trend a little further and look at the characteristics a storage platform needs to successfully enable a modern analytics experience.
Three Trends Driving Big Data
Depending on their level of data hygiene, many organizations see little value in Hadoop beyond its role as a distributed file store. Combine that data with the massive volume of data in cloud object stores, and you end up with a lot of data spread across a lot of silos. What's missing is an efficient, effective way to unify all of it.
Data lakes have traditionally been built on direct-attached storage (DAS). That design reflects the constraints Hadoop faced when it entered the market, when networking and storage technologies such as 1Gb Ethernet and slow spinning disks set the limits.
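Today, those barriers no longer exist, and three key trends are shaping the new big data analytics market.
#1 Legacy data lakes force compute and storage to scale together
In a DAS-based architecture, every node pairs compute with storage, so adding capacity also means adding compute. Think about it: would you want to add an engine to the train every time you added another boxcar? Probably not.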
Pure Storage FlashBlade™ is uniquely architected to help you get better utilization out of both compute and storage. At the same time, it reduces the complexity created by the siloed nature of older big data solutions.
#2 Cloud-first strategies bring hidden costs and complexity
The public cloud came with a lot of promises. In reality, many organizations that shifted to a cloud-first strategy have found hidden costs and unplanned complexity that now hamper their ability to deliver the results their business needs.
Cost and complexity tend to be the primary challenges, but security also remains a barrier to adoption. On top of that, IT departments shoulder a significant burden in meeting mandatory high-availability and data-protection requirements. As organizations weigh these challenges, two factors stand out:
- The importance of simplicity and ease of use. If you need a team of storage experts to run a system, it's too complex.
- The significance of the consumption model. Organizations want to pay for what they need, when they need it, and to grow their environment over time without disruption. Many vendors try to solve this challenge with financing programs.
However, when you need to move to next-generation hardware, our competitors' financing programs don't address the pain of a forklift upgrade. To scale non-disruptively over long periods of time (five to ten years or more), you need to make a crucial architectural decision at the outset.
#3 Data-science and ML projects don’t get the support required to move from experiments to production.
Machine learning needs large volumes of data to be accurate, and there's simply too much data to copy into place for every training job. At the same time, predictive analytics without accuracy won't deliver the business advantage you're seeking.
You can picture data analytics, as it's traditionally deployed, as a continuum with data warehousing at one end and AI at the other. In most environments, however, that continuum is broken into a series of silos. Data is duplicated across a myriad of bespoke analytics and AI environments and the infrastructure beneath them, which makes the whole environment expensive and complex.
Historically, there was no other way. A baseline level of performance is always table stakes, and each element of the data pipeline has its own workload profile. Serving such a diverse set of applications from a single platform requires multidimensional performance, and a platform like that didn't exist even three years ago.
That's exactly what FlashBlade is built to handle: small files, large files, high throughput, and low latency, all at petabyte scale in a single namespace. Pure is building for the modern data experience your teams expect. At the end of the day, it's all about creating a valuable experience for your teams and your organization.
FlashBlade and Vertica Deliver
All-flash performance. A cloud economic model for data storage. AI-ready capabilities. Today, Pure Storage and Vertica are delivering these essential requirements for modern analytics to a wide range of organizations. For example:
- A SaaS analytics company uses Vertica on FlashBlade to authenticate the quality of digital media in real time
- A multinational car company uses Vertica on FlashBlade to make thousands of decisions per second for autonomous cars
- A healthcare organization uses Vertica on FlashBlade to enable providers to make real-time decisions that impact lives
When you evaluate platform options, keep in mind that a modern architecture must address the diverse performance requirements across the analytics continuum and let you bring the model to the data instead of creating separate silos.
Learn more about Pure Storage and Vertica