
Pure Storage FlashBlade Augments Its Fast Object Store with S3 over RDMA for AI/ML Workflows

To help meet the intensive demands of AI/ML workloads, FlashBlade Object Store will soon support S3 over RDMA. Learn more about this capability and its benefits.


Summary

FlashBlade Object Store will soon support S3 over RDMA, dramatically improving both performance and cost efficiency for the most demanding AI workloads.


Today’s AI models demand not only more data but also fundamentally different approaches to how that data is stored and accessed. The age of text-only data is over: audio and video are now mainstream data types. This rise of multimodal data is driving momentum behind object storage.

Object storage opens the door to both flexibility and scale. With a flat namespace and rich metadata capabilities, object storage allows organizations to store, manage, and analyze data at scale.

In AI/ML environments, storage performance (throughput and latency) is critical for an organization to stay not only competitive but ahead of the pack. Any sort of bottleneck can cost organizations millions in GPU compute time or delay critical model deployments. 

AI training workflows require extremely large and continuous data movement from storage systems to GPUs. When these expensive computational engines aren’t fed fast enough, they sit idle, turning million-dollar investments into costly underutilized assets while delaying AI innovation timelines. 

This bottleneck is where Remote Direct Memory Access (RDMA) technology becomes game-changing, as it creates direct pathways between storage and GPU memory. We’re excited to announce that Pure Storage® FlashBlade® Object Store will soon support S3 over RDMA specifically optimized for AI environments, dramatically improving both performance and cost efficiency for customers’ most demanding AI workloads.

How RDMA Helps

RDMA significantly improves data transfer efficiency in AI/ML environments.

RDMA provides the following benefits:

  • Improved throughput: Data is transferred via direct memory access, accelerating data movement over the network.
  • Reduced latency: RDMA transfers are handled directly at the network card, bypassing the kernel and network stack.
  • Optimized CPU utilization: Data transfer happens directly from the storage to GPU memory (called copy offload), bypassing the CPU bounce buffers. 
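To make the copy-offload benefit concrete, here is a minimal conceptual sketch (not Pure Storage code; all names are illustrative) contrasting a traditional kernel-mediated transfer, which stages data in a CPU bounce buffer, with an RDMA-style transfer that places data directly into pre-registered destination memory:

```python
def kernel_path_transfer(source: bytes, destination: bytearray) -> int:
    """Simulate the traditional path: data is staged in an
    intermediate (bounce) buffer before reaching the destination."""
    bounce_buffer = bytes(source)               # extra copy through the CPU
    destination[:len(bounce_buffer)] = bounce_buffer
    return 2                                    # total copies made


def rdma_path_transfer(source: bytes, destination: bytearray) -> int:
    """Simulate the RDMA path: the NIC writes the payload straight
    into registered destination memory, with no intermediate copy."""
    destination[:len(source)] = source          # single direct placement
    return 1


payload = b"object data"
buf = bytearray(len(payload))
print(kernel_path_transfer(payload, buf))  # 2 copies on the kernel path
print(rdma_path_transfer(payload, buf))    # 1 copy on the RDMA path
```

The saved copy is where the CPU-utilization and latency gains come from: the processor never touches the payload on the direct path.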

While fast object storage is ideal for data collection and processing stages and is used as a data lake in many AI environments, the same system can now be used for training and inference stages by enhancing performance using S3 over RDMA. 

That’s why we’re so excited about this solution. Preliminary results indicate FlashBlade Object Store with S3 over RDMA can provide throughput of about 250 GB/s for a five-chassis system. Note: Pure Storage FlashBlade can go up to 10 chassis in a single namespace.
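A quick back-of-envelope calculation (illustrative arithmetic only, not a benchmark) shows what sustained throughput at that level means for feeding GPUs:

```python
def seconds_to_read(dataset_gb: float, throughput_gb_s: float) -> float:
    """Time to stream a dataset of dataset_gb gigabytes at a
    sustained throughput of throughput_gb_s gigabytes per second."""
    return dataset_gb / throughput_gb_s


# A 100 TB (100,000 GB) training dataset at the quoted ~250 GB/s:
print(seconds_to_read(100_000, 250))  # 400.0 seconds, under 7 minutes
```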

This approach also simplifies the architecture by eliminating the need for a hot-cache file system and letting GPUs access the data lake directly.

S3 over RDMA

How It Works

Pure Storage has developed a purpose-built client that runs on the system housing the GPUs. This client does two things: first, it generates the descriptors needed for communication and facilitates the RDMA transfer of the data payload; second, it communicates with the HTTP service on FlashBlade to send the descriptors and S3 metadata.

We’ll use two operational workflows, S3 GET and S3 PUT, to explain how S3 over RDMA works.

An S3 GET operation over RDMA retrieves data from the Object Store while bypassing the standard HTTP data path. The workflow proceeds through the following steps:

  1. Memory allocation: At application startup, the Pure Storage client allocates memory and passes the RDMA memory descriptor to the RDMA client. 
  2. Request processing: The client passes the RDMA descriptor along with the S3 GET request.
  3. Server detection: The S3 GET request lands at the HTTP server. It detects the RDMA descriptor, performs the metadata operation, and forwards the data part to the RDMA server. 
  4. Data transfer: The FlashBlade RDMA server reads the data from the Object Store and writes the data to the memory descriptor allocated in step 1 using the RDMA write request.
  5. Response: FlashBlade responds to the S3 GET request via the HTTP server and network fabric manager to the client.  
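The GET steps above can be modeled with a minimal, self-contained Python sketch (illustrative only; this is not the actual Pure Storage client API, and all class and key names are hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class RdmaDescriptor:
    """Identifies a pre-registered client memory region (step 1)."""
    buffer: bytearray


@dataclass
class FlashBladeSim:
    """Toy server: the HTTP front end handles metadata while the
    RDMA server writes the payload into the client's buffer."""
    objects: dict = field(default_factory=dict)

    def get(self, key: str, desc: RdmaDescriptor) -> dict:
        data = self.objects[key]                     # step 3: metadata op
        desc.buffer[:len(data)] = data               # step 4: RDMA write
        return {"status": 200, "length": len(data)}  # step 5: HTTP response


# Steps 1-2: the client allocates memory and sends the descriptor
# along with the GET request.
server = FlashBladeSim(objects={"train/shard-0001": b"tensor bytes"})
desc = RdmaDescriptor(buffer=bytearray(64))
resp = server.get("train/shard-0001", desc)
print(resp["status"], bytes(desc.buffer[:resp["length"]]))
```

The key point the sketch captures is that the payload lands directly in the client's pre-allocated buffer; only the small metadata response travels over HTTP.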

An S3 PUT operation over RDMA writes data to the Object Store. This streamlined transfer enhances system throughput and proceeds through the following steps:

  1. Memory allocation: At application startup, the Pure Storage client allocates memory and passes the RDMA memory descriptor to the RDMA client. 
  2. Buffering data: The client writes data into the allocated buffer and passes the RDMA descriptor along with the S3 PUT request.
  3. Server detection: The S3 PUT request is received at the HTTP server. From there, it detects the RDMA descriptor, performs the metadata operation, and forwards the data details to the RDMA server. 
  4. Data transfer: The FlashBlade RDMA server gets the data from the client using an RDMA read request and writes it to the Object Store. 
  5. Response: The FlashBlade system responds to the S3 PUT request via the HTTP server and network fabric manager to the client.  
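A matching self-contained sketch of the PUT steps (again illustrative only, with hypothetical names; not the actual Pure Storage client API):

```python
from dataclasses import dataclass, field


@dataclass
class RdmaDescriptor:
    """Pre-registered client memory holding the payload (steps 1-2)."""
    buffer: bytearray


@dataclass
class FlashBladeSim:
    objects: dict = field(default_factory=dict)

    def put(self, key: str, desc: RdmaDescriptor, length: int) -> dict:
        # Step 3: the HTTP server sees the descriptor and performs
        # the metadata operation.
        payload = bytes(desc.buffer[:length])  # step 4: RDMA read from client
        self.objects[key] = payload            # ...then persist to the store
        return {"status": 200}                 # step 5: HTTP response


# Steps 1-2: the client allocates a buffer, writes the data into it,
# and sends the descriptor along with the PUT request.
server = FlashBladeSim()
desc = RdmaDescriptor(buffer=bytearray(b"checkpoint bytes"))
resp = server.put("ckpt/epoch-7", desc, length=len(desc.buffer))
print(resp["status"], server.objects["ckpt/epoch-7"])
```

Note the symmetry with the GET path: the server pulls the payload from client memory with an RDMA read, rather than pushing it with an RDMA write.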

In this manner, S3 over RDMA bypasses the standard HTTP stack for data transfer for both reads and writes, increasing overall throughput of the system and making data available to GPUs faster.

Conclusion

FlashBlade is trusted by more than 100 customers for their AI workloads. It has been validated through proven AI certifications such as NVIDIA DGX SuperPOD and NVIDIA DGX BasePOD, holds high-performance storage certification for NVIDIA Cloud Partners, and anchors turnkey solutions like GenAI Pods and FlashStack® for AI.

Combine the aforementioned benefits of S3 over RDMA with the pervasive applicability, agility, simplicity, and exabyte scalability of FlashBlade Object Store, and you get an all-purpose AI storage platform for managing and analyzing large data sets, as well as for training and inference.

S3 over RDMA support for FlashBlade is expected to be available later this year. To learn more about this capability, please contact your Pure Storage representative.

