It’s an interesting time for big data analytics. After the end of the Hadoop hype cycle, we embraced a new (hybrid/multi) cloud reality, and began accelerating the promise of AI from the lab to production. Accomplishing this required a renewed focus on infrastructure—in particular, storage infrastructure—as a critical enabler for enterprise-scale modern analytics.
Vertica in Eon Mode for Pure Storage® is a great example of the characteristics a storage platform needs to successfully enable a modern analytics experience. Let’s dig a little deeper.
Three Trends Driving Big Data
The Hadoop hype cycle left a lot of data in Hadoop Distributed File System (HDFS) data lakes and reservoirs. Unfortunately, many organizations find themselves with lakes that have turned into swamps. Data lakes are traditionally built on direct-attached storage (DAS) infrastructure, because the approach Hadoop took when it entered the market was primarily bound by the limits of the networking and storage technologies of the time, like 1Gb Ethernet and slower spinning disk.
Depending on the level of data hygiene, many organizations are unable to see the value from Hadoop as more than a distributed file store. Combine that data with the massive volume of data in cloud-object stores, and you find yourself with a lot of data and a lot of silos. What’s missing is an efficient, effective way to unify all that data.
Today, these barriers don’t exist. There are three key drivers shaping the new big data analytics market.
#1 All-flash storage has transformed how organizations access, manage, and leverage data.
The performance increases afforded by all-flash storage have largely reduced the need to keep significant volumes of data on local storage. At the same time, you can achieve superior economies of scale by separating compute and storage, because the two don’t always scale in lock step.
Think about it: Would you want to add an engine to the train every time you added another box car? Probably not.
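To put rough numbers on the train analogy, here is a minimal Python sketch. The node sizes (32 cores and 50 TB per node) and the workload (64 cores, 1,000 TB) are hypothetical figures chosen for illustration, not Pure Storage or Vertica specifications, and the function names are made up for this example.

```python
import math


def coupled_nodes(cores_needed, tb_needed, cores_per_node, tb_per_node):
    """DAS-style sizing: every node bundles compute and storage,
    so whichever resource needs more nodes sets the node count."""
    by_compute = math.ceil(cores_needed / cores_per_node)
    by_storage = math.ceil(tb_needed / tb_per_node)
    return max(by_compute, by_storage)


def disaggregated_units(cores_needed, tb_needed, cores_per_node, tb_per_unit):
    """Disaggregated sizing: compute nodes and storage units are
    counted independently of each other."""
    compute = math.ceil(cores_needed / cores_per_node)
    storage = math.ceil(tb_needed / tb_per_unit)
    return compute, storage


def idle_cores(nodes, cores_per_node, cores_needed):
    """Cores that were purchased only to get the attached storage."""
    return nodes * cores_per_node - cores_needed


if __name__ == "__main__":
    # Hypothetical workload: storage-heavy, compute-light.
    cores, tb = 64, 1000
    nodes = coupled_nodes(cores, tb, cores_per_node=32, tb_per_node=50)
    print("coupled nodes:", nodes)
    print("idle cores:", idle_cores(nodes, 32, cores))
    print("disaggregated (compute, storage):",
          disaggregated_units(cores, tb, 32, 50))
```

With these assumed numbers, the coupled design needs 20 nodes and leaves 576 cores idle, while the disaggregated design needs 2 compute nodes and 20 storage units. The exact figures matter less than the shape of the result: when one resource grows faster than the other, coupled scaling pays for engines it does not use.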
Pure Storage FlashBlade™ is uniquely architected to allow you to achieve superior resource utilization for compute and storage. At the same time, it reduces the complexity created by the siloed nature of the older big data solutions.
#2 The new “cloud reality” has shifted expectations about cloud economics.
The concept of the public cloud came with a lot of promises. In reality, many organizations that shifted to a cloud-first strategy have found hidden costs and unplanned complexity which now hamper their ability to deliver the results that their business needs.
Cost is a primary challenge. However, security also remains a barrier to adoption. IT departments shoulder a significant burden meeting mandatory high-availability and data-protection requirements. Often, the solution to more robust security necessitates a significant re-architecting of the IT ecosystem.
In many cases, DBAs realize they’ve built applications to leverage object-storage platforms like S3. They require object protocols, but now they might also require ultra-fast performance.
The implications of this new reality are two-fold.
- The importance of simplicity and ease of use. If you need a team of storage experts to run the system, the system is too complex.
- The significance of the consumption model. Organizations demand the ability to pay for what they need when they need it, as well as the ability to seamlessly grow the environment over time in a non-disruptive manner. A lot of vendors try to solve this challenge with finance programs.
However, when you need to move to next-gen hardware, our competitors’ finance programs don’t address the pain of a forklift upgrade. To scale non-disruptively over long periods of time (at least five to ten years), you need to make a crucial architectural decision at the outset.
To bring the benefit of a Storage-as-a-Service (STaaS) model to modern analytics storage, we offer Evergreen//One™.
#3 Data-science and ML projects don’t get the support required to move from experiments to production.
Machine learning needs a ton of data for accuracy, and there is just too much data to retrieve for every training job. At the same time, predictive analytics without accuracy won’t deliver the business advantage that you’re seeking.
This trend highlights the need to bring machine learning functions and model training to the data, rather than moving samples or segments of data to separate platforms.
You can visualize data analytics as it is traditionally deployed on a continuum with data warehousing on one end and AI on the other end. But the way this manifests in most environments is in a series of silos. Data is duplicated across a myriad of bespoke analytics, AI environments, and infrastructure. This creates an expensive and complex environment.
Historically, there was no other way. Some level of performance is always table stakes, and each data pipeline element has a unique workload profile. A single platform that could deliver the multi-dimensional performance required by a diverse set of applications didn’t exist even three years ago.
That’s why application vendors pointed you toward bespoke DAS environments, for example. Today, we see a move toward disaggregation of compute and storage. That’s exactly what FlashBlade is built to handle: small files, large files, high throughput, and low latency while achieving petabyte scale in a single namespace. Pure is solving for the modern data experience your workloads demand. At the end of the day, it’s all about creating a valuable experience for your teams and your organization.
FlashBlade and Vertica Deliver
All-flash performance. A cloud economic model for data storage. AI-ready capabilities. Today, Pure Storage and Vertica are providing the essential requirements for modern analytics to a wide range of organizations. For example:
- A SaaS analytics company uses Vertica on FlashBlade to authenticate the quality of digital media in real-time
- A multinational car company uses Vertica on FlashBlade to make thousands of decisions per second for autonomous cars
- A healthcare organization uses Vertica on FlashBlade to enable providers to make real-time decisions that impact lives
When it comes to better platform options, a modern architecture must address the diverse performance requirements of the analytics continuum, from data warehousing to AI, and allow you to bring the model to the data instead of creating separate silos.
Learn more about Pure Storage and Vertica