Genomic sequencing is revolutionizing scientific research. Having data architecture that enables high performance computing to drive fast genomic workflows is critical.

Genomic sequencing is revolutionizing scientific research across biotechnology, agriculture, and especially medicine, where genomic analyses at scale can drive personalized medicine and diagnostics. 

The first human genome sequencing cost about $300 million. Today, technological advances have brought that number down to about $600. As a result, the volume of genomic data that must be stored and made ready for analysis is exploding at a growing number of healthcare and life sciences organizations. Estimates say that genomics could generate as much as 40 exabytes of data per year by 2025, a number that is likely to grow since the use of AI in genomics is just getting started.

Genomics teams need data infrastructure that is tuned for high volumes and near real-time speeds. They also need data architecture that enables efficient access to the data by CPUs for a wide range of applications and use cases—from biological research to pharmaceutical development to analysis for predictive diagnosis by doctors in the field.

In Support of the World’s Most Important Research

The Australian Genome Research Facility (AGRF) is at the center of this revolution, providing critical genomics data services to global researchers and clinicians in the biomedical, clinical, agricultural, and environmental fields. Because AGRF’s genomic analyses support time-sensitive research projects, fast and accurate data is paramount to their success, especially for research teams fighting diseases like COVID-19 or childhood cancer.

However, as the facility’s data needs grew, AGRF found its legacy disk-based storage systems unable to meet client needs for real-time genomics data. In addition, high storage latency and low bandwidth limited its ability to take on new projects.

Faster Analyses Fuel Faster Breakthroughs

Recognizing the need for speed, AGRF replaced its legacy disk storage with Pure Storage® FlashBlade®. In doing so, AGRF cut its pre-analysis times from 18 hours to just 3 hours. Checksum processes, which ensure the quality of data during genomics analytics, now take 23 minutes instead of 10 hours.
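For readers unfamiliar with checksum processes, the sketch below shows one common way to verify a file's integrity: stream it in chunks, compute a digest, and compare that digest with a previously recorded value. It is only an illustration. The article does not say which checksum algorithm or tooling AGRF uses, so the choice of SHA-256 and the function names `sha256_of_file` and `verify_file` are assumptions made for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks.

    Chunked reads keep memory use flat even for very large sequencing files.
    SHA-256 is an assumption here; the article does not name an algorithm.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Return True if the file's digest matches the recorded digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Because every byte of every file has to be read to compute a digest, a step like this tends to be limited by storage read throughput, which is consistent with the checksum time savings reported above.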

Taken together, these faster analysis and processing times helped AGRF to cut clinical genome sequencing times from 28 days to just 10 days (about 64%), and to as little as four days in urgent cases (as much as 86%).
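The figures quoted in this section can be checked with simple arithmetic. The short sketch below uses only the numbers stated above; the helper names `percent_reduction` and `speedup` are introduced here for illustration. It shows that the 86% figure corresponds to the four-day urgent case, while the move from 28 to 10 days is a reduction of roughly 64%.

```python
def percent_reduction(before, after):
    """Percentage by which a duration shrinks going from `before` to `after`."""
    return (before - after) / before * 100


def speedup(before, after):
    """How many times faster the new duration is than the old one."""
    return before / after


# Clinical genome sequencing turnaround, in days.
print(round(percent_reduction(28, 10), 1))  # 64.3
print(round(percent_reduction(28, 4), 1))   # 85.7

# Pre-analysis time, in hours, and checksum time, in minutes.
print(round(speedup(18, 3), 1))             # 6.0
print(round(speedup(10 * 60, 23), 1))       # 26.1
```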

With improved bandwidth come improved economies of scale. AGRF can now take on more groundbreaking genomics projects concurrently. Its services can also now be part of personalized medical services and life-saving healthcare testing. Most importantly, AGRF has data infrastructure that’s ready to support its future and the data growth sure to come with it.

Run Your Entire Genomics Pipeline on a Single Platform

AGRF accomplished its research goals with FlashBlade. Now, other genomics projects can tap into the power of the expanded FlashBlade family, including FlashBlade//S™, which is purpose-built for high-performance computing. In particular, FlashBlade//S can deliver consistently fast and streamlined performance at every stage in a genomic workflow. 

With the power of FlashBlade, life sciences organizations gain the advantage of:

  • A single storage platform that handles both small and large files
  • Petabytes of throughput and fast access to hundreds of millions of files
  • Up to 24x faster secondary analyses compared to traditional disk-based environments, with a proven ability to assemble genomes within hours on a single dynamic platform built for parallel processing
  • Data integrity and time savings from eliminating copy operations across storage environments, which removes the chance of copy errors
  • Freedom from downtime
  • Peace of mind with immutable snapshots of data backups

How Pure Accelerates Genomics Research Breakthroughs

Pure Storage is a trusted provider of data storage for scientific research organizations worldwide, including universities, scientific institutes, contract research organizations, and biotechnology companies. Designed for the world’s most challenging computing needs, Pure systems simplify operations by allowing an entire genomics pipeline to run on a single platform. Pure delivers the performance needed to compress sequencing turnaround, power new AI applications, and bring genomics closer to everyday medicine.