SAP Data Hub has already gone through several iterations in its short lifespan and has expanded beyond just a set of ETL and EIM tools. It is now “a data orchestration and management solution running on Kubernetes, leveraging open source and embedding machine learning capabilities.” And when it comes to the best infrastructure to run SAP Data Hub on, Pure Storage FlashBlade® has some clear advantages.


SAP Data Hub helps enterprises become more intelligent in many different scenarios. In every single scenario, FlashBlade empowers SAP Data Hub to accomplish it better.

Let’s examine four scenarios and see how FlashBlade is fundamentally built to support them:

1. Image identification

The first step in building any image identification system is to train the model on large datasets. This involves image extraction and processing, and eventually connecting the resulting information to the SAP ERP system. SAP Data Hub connects and monitors these components. It needs the proper hardware to do this quickly, because the small-file performance of the storage tier (for images, text, and audio) becomes critical. If the storage tier does not handle small files well, SAP Data Hub will require more data refinement, extra steps will be needed to process the data, and performance will suffer.

Retail is a great use case for this. Returns can be easily automated as customers take images of their items and get them automatically exchanged. The ability to randomly read small files (50KB) at 10GB/s from a single FlashBlade chassis (50GB/s with 75 blades) means no extra effort is required to aggregate individual data points to make larger, storage-friendly files.
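To put those throughput figures in terms of individual files, here is a back-of-the-envelope sketch. It only divides the numbers quoted above; decimal units (1 KB = 1,000 bytes, 1 GB = 1,000,000,000 bytes) are an assumption, and the function name is purely illustrative.

```python
# Convert the throughput figures quoted in the text into small-file read rates.
# Assumption: decimal units (1 KB = 1,000 bytes; 1 GB = 1,000,000,000 bytes).

KB = 1_000
GB = 1_000_000_000


def files_per_second(throughput_bytes_per_s: int, file_size_bytes: int) -> int:
    """Whole files of the given size that can be read per second at this throughput."""
    return throughput_bytes_per_s // file_size_bytes


FILE_SIZE = 50 * KB  # the 50KB small-file size cited in the text

single_chassis = files_per_second(10 * GB, FILE_SIZE)       # 10GB/s figure
seventy_five_blades = files_per_second(50 * GB, FILE_SIZE)  # 50GB/s figure

print(f"Single chassis: {single_chassis:,} files/s")
print(f"75 blades:      {seventy_five_blades:,} files/s")
```

Under these assumptions, the quoted rates correspond to roughly 200,000 and 1,000,000 50KB images read per second, which is why batching small files into larger ones becomes unnecessary.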

2. Data Lakes

We have all heard how data lakes can quickly become data swamps if they are not built and managed properly. One of SAP Data Hub’s core missions is to tackle this problem. It also happens to be something our FlashBlade team is obsessed with. Many data centers have data lakes and data warehouses for analytics, running on dedicated silos of storage. Typical data lake and data warehouse infrastructure brims with complexity. Each application or use case has a separate warehouse, each copying data back and forth from the data lake. For large enterprises with complex environments this becomes a nightmare to use, and for IT it’s a nightmare to manage.

FlashBlade is the industry’s first data platform engineered for a wide range of workloads, from instant restore to AI to software development and more. It is built not just for unstructured data, but for any type of unstructured data. By definition, unstructured data is unpredictable: it can take any form, size, or shape, and it can be accessed in any pattern. FlashBlade can accelerate any data, small or large, random or sequential. Having all these different types of data live and be processed on the same infrastructure that SAP Data Hub runs on makes data pipelines much easier to build and faster to run.

An easy use case here is the financial industry. Using SAP Data Hub on FlashBlade, financial institutions can build fraud detection systems on top of modern data lakes of social media streams stored on FlashBlade. SAP Data Hub can then tap into these data streams and integrate them with transactional data in S/4HANA. This, of course, relies on SAP Vora, which is now part of SAP Data Hub.

3. Predictive capabilities

We believe predictive capabilities can improve business processes so much that we built our own customer service and support around them. You can read about it here. SAP Data Hub on FlashBlade can automate business processes by continuously monitoring and gathering sensor data from IoT devices, and eventually flagging any unusual data and integrating it back into the ERP system. An interesting use case here is the automotive industry. As automobile manufacturers’ facilities become automated, robots can continuously report back on their results, so humans can find opportunities to shave seconds off their processes and improve the quality of their products.


The last use case I’d like to discuss might be the most boring, yet it could be the most common out there. As customers utilize FlashBlade for different use cases, it ultimately becomes an “all sorts of data” warehouse. SAP Data Hub takes advantage of this data by building workflows and orchestrating what type of data should go to what system on FlashBlade. In the airline business, RFID tags are tracked and their data gathered at incredible speed and volume, but unfortunately many of these tagged items continue to get lost. SAP Data Hub can find the records for these tagged items stored on a FlashBlade and automatically trigger emails to the customers found in the ERP system.

I hope these use cases paint a picture of how the best of hardware and software can come together to offer your enterprise a multi-purpose solution, knocking out several strategic goals of your intelligent enterprise in one punch.