X-rays, MRIs, CTs, ultrasounds, microscopy, and digital pathology: The world of medical imaging has many modalities routinely in use. Diagnostics and clinical research rely on high-resolution imaging, which means data volumes can easily reach the multi-petabyte level.
Organizations usually store this data on picture archiving and communication systems (PACS) or vendor-neutral archives (VNAs). Unfortunately, these servers don’t have advanced imaging analytics capabilities. Doctors and researchers have to study images manually to make diagnostic decisions. Once teams complete the immediate analysis and diagnoses, images are archived for many years, sometimes even decades. The archives are slow to access. Data is locked away and can’t be reaccessed for advanced image processing or population-based machine learning. These deep archives, called “dark data,” often become a lost opportunity.
As healthcare and life sciences teams look to become agile with their data, the ability to automate medical imaging analytics and deploy machine learning algorithms on imaging data is critical. As a result, IT leaders at these organizations are reconsidering their data infrastructure—both storage and compute—as they rethink how to make the most of their imaging data at hand.
Many of these organizations are turning to XNAT, an open-source imaging-informatics platform that helps import, archive, process, and securely distribute imaging data. Unlike PACS, XNAT has increased support for machine learning and annotation workflows. That means researchers and physicians can extend their capabilities when diagnosing diseases based on radiology.
At Pure, we’ve proven our value in the PACS community, and we’re now further boosting medical imaging research using XNAT. I sat down with our in-house storage experts: Ravi Poddar and Brian Cook. Their backgrounds in high-performance computing come together to tackle a most critical problem: Speeding the end-user experience as imaging file size and analytics requirements become more complex.
Devika Garg: How would you define XNAT?
Ravi Poddar: I would call XNAT a research PACS system. So what it’s used for is basically ingesting data from a clinical PACS, anonymizing that data, and then having the data available for downstream research processes.
Devika: XNAT is deployed in a lot of academic research and healthcare organizations. What are some of the XNAT workflow bottlenecks resulting from how they’re currently deployed?
Ravi: I think one of the big bottlenecks we see in the field is that ingestion from the clinical PACS into XNAT can be very slow. One of the main reasons is that imaging studies vary widely in the makeup of their data sets, and traditional HPC storage systems are not optimized for that. For example, from an imaging standpoint, many data sets consist of lots and lots of small files. If you look at MRI or CT, a single study is made up of thousands of files that are each less than a megabyte in size. And just getting that data over from a traditional PACS and into XNAT has turned out to be quite a problem.
Brian Cook: Another potential bottleneck is the learning curve for the tuning needed to bring in different file types from other imaging modalities, because research environments typically need to be tuned for specific data sets. For example, GPFS, a prevalent file system in HPC, must be tuned for specific file types and is not geared for random workloads. What XNAT brings to the table is the ability to bring in varied data sets, but they still need tuning, and the people running the system don’t necessarily have the time or the understanding to tune it properly. So there’s a lack of simplicity, and that creates a bottleneck from a deployment perspective.
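To make the small-file problem Ravi describes concrete, here is a minimal Python sketch that profiles a study directory: how many files it holds, how many bytes in total, and what share of the files fall under a megabyte. It is not part of XNAT or any Pure tooling. The function names, the `.dcm` demo files, and the choice of 10**6 bytes as the "megabyte" threshold are all assumptions made for illustration.

```python
import os
import tempfile

# Assumption: "less than a megabyte" is taken as fewer than 10**6 bytes.
SMALL_FILE_LIMIT = 1_000_000


def profile_study(root):
    """Walk a directory tree and summarize its file-size profile."""
    count = total = small = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            count += 1
            total += size
            if size < SMALL_FILE_LIMIT:
                small += 1
    return {
        "files": count,
        "bytes": total,
        "small_files": small,
        "small_fraction": small / count if count else 0.0,
    }


def make_demo_study(root, n_small=200, small_size=4096):
    """Create a hypothetical study: many small files plus one large file."""
    for i in range(n_small):
        with open(os.path.join(root, f"slice_{i:04d}.dcm"), "wb") as f:
            f.write(b"\0" * small_size)
    with open(os.path.join(root, "volume.bin"), "wb") as f:
        f.write(b"\0" * (2 * SMALL_FILE_LIMIT))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_study(tmp)
        print(profile_study(tmp))
```

A profile dominated by small files suggests that ingest speed will depend more on per-file and metadata overhead than on raw bandwidth, which is the pattern described above for MRI and CT studies.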
Devika: How is Pure addressing these issues around slow data ingestion into XNAT?
Ravi: What we’ve seen in healthcare and life sciences, and actually well beyond that industry, is that Pure FlashBlade® tends to be self-tuning for different types of IO. So, whether there are lots of small files, metadata, or large files, it performs really well across basically the entire spectrum. And so when FlashBlade has been installed at customers who run XNAT, it’s been able to perform really well and automatically adjust to that data quickly, no matter what it consists of. That’s just inherent to the design of FlashBlade, which makes it a pretty unique product in this particular area.
Devika: I believe we’ve done some testing to support that, right? Could you talk about that?
Ravi: Absolutely. We’ve been partnering with Radiologics, which provides the entire pipeline from XNAT to downstream AI applications and model deployment. Radiologics produced two data sets for us to run tests: One is what they call the empty DICOM data set. This particular data set uses just DICOM headers, which is basically just metadata about patients, and no actual image data, because they believe that the biggest bottleneck is just ingesting large numbers of small files.
So we compared our results on FlashBlade versus what they have in their cloud setup, which includes EBS volumes in AWS backed by SSDs. Compared to that, running FlashBlade in a non-optimized configuration turned out to be 13x faster than AWS backed by flash. In fact, at one of our customers, a big cancer research center, we were told that they were running on GPFS with real data. This customer was only able to achieve on the order of 10 MB/s throughput, but with FlashBlade, we’ve already achieved over 1 GB/s in our labs. So that’s well over 10x faster and actually closer to 100x performance.
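The "closer to 100x" figure follows directly from the two throughput numbers quoted above. The short Python sketch below works through the arithmetic and shows what the gap would mean for ingest time. The 1 TB data set size is a hypothetical value chosen only for illustration, and decimal units (1 GB = 1,000 MB) are assumed. Note also that the two throughput figures come from different environments (a customer's GPFS system with real data versus lab tests), so this is an illustration of scale, not a controlled benchmark.

```python
def speedup(baseline_mb_s, new_mb_s):
    """Return how many times faster the new throughput is than the baseline."""
    return new_mb_s / baseline_mb_s


def ingest_seconds(dataset_mb, throughput_mb_s):
    """Return the time in seconds to ingest a data set at a steady throughput."""
    return dataset_mb / throughput_mb_s


GPFS_MB_S = 10.0           # "on the order of 10 MB/s" from the interview
FLASHBLADE_MB_S = 1000.0   # "over 1 GB/s", taken as 1,000 MB/s (decimal units)
DATASET_MB = 1_000_000.0   # hypothetical 1 TB data set, not from the interview

if __name__ == "__main__":
    print(f"speedup: {speedup(GPFS_MB_S, FLASHBLADE_MB_S):.0f}x")
    for label, rate in (("GPFS", GPFS_MB_S), ("FlashBlade", FLASHBLADE_MB_S)):
        hours = ingest_seconds(DATASET_MB, rate) / 3600
        print(f"{label}: {hours:.2f} hours to ingest 1 TB")
```

Under these assumptions, the hypothetical 1 TB data set would take 100,000 seconds (about 27.8 hours) at 10 MB/s and 1,000 seconds (about 16.7 minutes) at 1 GB/s.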
Devika: Brian, you were closely involved in that implementation. Do you want to talk a little about the customer story?
Brian: For the customer Ravi mentioned, the team managing the clinical PACS environment was assigned to make the XNAT system work and they were interfacing with their research HPC team. They found that the complexities of HPC and GPFS were slowing down the ingest time significantly. Because of the complexities, the team wasn’t able to figure out why it was so slow. They had been working on it for months, and the researchers were falling behind on their timelines.
But by looking at this from a holistic perspective and seeing how FlashBlade could help, they were able to move away from GPFS. Their workflow became much simpler to manage, and as Ravi mentioned, FlashBlade is very fast and open to many different file types. Just by making those changes, they were able to significantly improve the ingest performance of the system and not have to re-engineer the entire solution with GPFS every time they changed file types. So now they’re able to bring in different modalities without having to manage that complexity.
Devika: What would you say to current and potential users of XNAT?
Brian: The thing about XNAT is that it’s deployed in a more modern, cloud-native application format: It uses containers and has its own Linux-based environment, which is easy to manage and has more open source components. So when organizations are looking at XNAT and they’re coming from traditional high-performance computing environments, that thinking doesn’t necessarily apply. XNAT is a more modern architecture. What Pure does is enable modern cloud-native architectures to perform really, really well, and XNAT on Pure sees the same benefits. So when organizations are looking at XNAT, it would be useful not to look at it as an extension of their HPC environment. This next-generation type of software design just works better on the kind of modern infrastructure that Pure provides.
Devika: A lot of current and potential XNAT users already use PACS for their imaging data. What would you say to them?
Ravi: We would actually recommend they put both PACS and XNAT on the same system because I think one of the benefits of FlashBlade is workload consolidation. And, since we support a myriad of protocols, we can recommend putting both on the same system so you can have fast performance on PACS as well as XNAT at the same time. So maybe you’re not using XNAT right now, you’re not doing much clinical research, but you might in the future, and by having FlashBlade back your imaging infrastructure in general, you’re going to be covered.
Brian: The interesting part is that XNAT has changed the perspective of PACS from the IT side. PACS has historically been an archiving solution where you store images and basically forget about them for 20-some years until you are allowed to delete them. But what XNAT is showing is that there is value in reusing that archived medical imaging data, both for doing research and for doing better analysis with patients. High-performance PACS access for researchers has not typically been a requirement, but XNAT is showing that it will be in the future. So moving in that direction early on helps people solve those problems that are just now starting to creep up.