AI may be more mainstream today than ever before, yet the ROI for many AI projects remains elusive. It’s a thread in a larger digital transformation conversation, where common roadblocks include complexity, talent gaps, and bottlenecks in IT.
Below, we’ll examine the demands of AI projects on IT in particular, and suggest four shifts IT can make to support successful AI projects.
1. Shift Mindsets from AI and DevOps to “MLOps”
It’s interesting how most AI initiatives end up being discussions about humans. On the IT level, it’s no different. AI projects are often a story of two teams, two types of workflows, and two ways of getting things done.
Model debt and friction between teams help explain why the IT teams typically in charge of building out infrastructure for AI tend to struggle. These aren’t your typical software projects: the data pipelines look very different from traditional software data pipelines, and the data can reside on-premises, in hybrid clouds, or at the edge, and move between them.
Trying to shoehorn new data initiatives into legacy workflows and infrastructure can lead to issues:
DevOps and AI/ML teams are wired differently. Data science teams rely on DevOps teams to “industrialize” their data pipelines, but DevOps can struggle to support them with legacy solutions. Data science teams need a degree of data mobility that can overwhelm DevOps teams.
Legacy IT infrastructures are too brittle for AI and ML at scale. Traditional IT infrastructures and data storage solutions aren’t well set up to handle AI and ML teams’ requirements. It’s software 1.0 machinery trying to power multicloud environments and software 2.0 initiatives.
Shadow projects lack access to the right resources. If an AI project sits outside of the data center (“shadow AI”) and outside of the greater IT org, the team can end up struggling with limited access to shared services and resources.
AI/ML data gets locked down or siloed. If data is siloed or can’t be moved quickly enough for experimentation and inferencing between on-prem and the cloud, AI projects can stall out.
To merge the two successfully, a new IT discipline has emerged: “MLOps,” or “AIOps,” a modern mindset that’s all about making the architectural choices AI teams need to thrive.
2. Level-up Compute Power for MLOps
MLOps teams work well when they have access to a modern mix of technologies that make AI’s demands on data feasible, including:
- Faster compute power
- Accelerated networking capabilities
Leveraging GPUs over CPUs can give AI projects the horsepower they need. Integrated hardware and software solutions like the AIRI//S (AI-ready infrastructure) address storage out of the box, without the need to rearchitect the data center. With Pure FlashBlade and NVIDIA DGX GPUs, models that took a week to train can now be trained in 58 minutes.
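To put that claim in perspective, a quick back-of-the-envelope calculation (using only the two training times cited above) shows the implied speedup:

```python
# Back-of-the-envelope speedup from the training times cited above:
# roughly one week on the legacy stack vs. 58 minutes on the
# accelerated GPU-plus-flash stack.
week_minutes = 7 * 24 * 60          # one week expressed in minutes
accelerated_minutes = 58            # accelerated training time, in minutes

speedup = week_minutes / accelerated_minutes
print(f"Speedup: ~{speedup:.0f}x")  # prints "Speedup: ~174x"
```

A roughly 174x reduction in training time is the difference between one experiment per week and many experiments per day, which is why compute and storage acceleration matter for iteration speed, not just raw cost.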
Take Meta’s AI Research SuperCluster (RSC), for example. It’s one of the world’s fastest AI supercomputers, designed to train next-gen AI models on petabytes of data. The research it does helps Meta learn how to deliver more content that people want to see, but the RSC needed storage that could handle petabytes of data while delivering the performance the supercomputer demands. With Pure Storage FlashArray and FlashBlade, Meta can support the supercomputer’s GPU and storage needs while minimizing operational costs.
3. Get the Performance of UFFO Storage
Building and managing data pipelines is typically the most costly, challenging aspect of a complete AI/ML solution. With the growth in the amount of unstructured data, difficulty managing complex storage can result in downtime—or languishing data that’s not being leveraged to its full potential.
As AI projects pick up speed, they’ll demand more data at faster rates, and legacy storage that can’t keep up will slow AI applications and insights. (This is one reason why AI developers should care about storage.) Legacy systems that manage compute and storage together will create more complexity as the performance gap between compute and storage continues to widen. Unified fast file and object (UFFO) storage can consolidate data types and boost the performance of data-intensive AI workloads. Pure Storage FlashBlade is ideal for AI and ML, with intelligent load balancing and the concurrency required by end-to-end AI workflows.
An example: Intelligence Processing Unit (IPU) inventor Graphcore helps innovators make breakthroughs in machine intelligence. Graphcore’s IPUs are disrupting AI, but the demands of AI compute were disruptive, too—requiring more bandwidth and throughput than legacy storage systems could provide. To avoid bottlenecks for AI and ML workloads, which could potentially limit processing performance, Graphcore incorporated Pure Storage in its end-to-end, AI-optimized technology stack. FlashBlade enables Graphcore to get the most out of its IPU processor, with high bandwidth, throughput, and low latency.
Learn more: MLOps 101: What is AI Infrastructure?
4. Leverage Hybrid and Multicloud Environments
Going multicloud for AI, ML, and deep learning initiatives can give teams agility and “a plethora of choices” to pick and choose cloud services without vendor lock-in.1 One use case: using on-prem infrastructure for compliance, speed, and cost savings, while leveraging AWS Outposts for the control plane so teams don’t have to manage every server, network, and application themselves.
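As a toy sketch of that kind of split (dataset names, tags, and the `placement` policy are all hypothetical, not any vendor’s API), a placement policy might route datasets by compliance requirements:

```python
# Hypothetical placement policy: regulated data stays on-prem for
# compliance and predictable cost; everything else can burst to the
# cloud for elastic training capacity.
def placement(dataset: dict) -> str:
    if dataset.get("regulated"):
        return "on-prem"
    return "cloud"

datasets = [
    {"name": "patient-records", "regulated": True},
    {"name": "clickstream-logs", "regulated": False},
]

for d in datasets:
    print(d["name"], "->", placement(d))
# prints:
# patient-records -> on-prem
# clickstream-logs -> cloud
```

A real policy would weigh cost, latency, and data gravity as well, but even this simple split shows why the storage layer has to make moving data between environments painless.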
Trifacta notes that “The benefits of the cloud are hard to overestimate in particular as it relates to the ability to quickly scale analytics and AI/ML initiatives.” Its reports reveal that 66% of respondents are running all or most of their AI/ML initiatives in the cloud. These workloads can also be highly mobile: data pipelines for AI workflows may move between cloud components that are colocated, local, or public, which makes an upgraded data storage solution critical.
Enable AI Success at Scale with Pure
As AI and ML projects proliferate and mature, we’re getting a clearer picture of what works and what doesn’t. These projects are challenging and complex, but intelligent tech holds immense potential for data and services across an organization—not just in the data science sandbox.
CIOs and CTOs have a huge opportunity here to ready their organizations with modern UFFO storage. It just might be key to solving these challenges. We know it can streamline multicloud environments and create a standardized backend that makes moving data around easier, but it’s the speed and agility that will really future-proof an AI project, no matter what it demands.