Case Studies

A U.S. Research University Engages Advanced HPC to Store the World’s Largest Cancer Genome Database

November 28, 2017

The trailblazing cancer research team benefits from a scalable, high-performance storage solution for life sciences data.

Research-intensive institutions often create several terabytes of data each month. To support research at a high scholarly level, that data must be well organized, accessible and secure. Best-in-breed research programs are often awarded government grants to assist with such needs.

Universities require budget-friendly high-performance computing solutions to process and manage data efficiently. They are likely to invest in network storage, compute and services across the board. More importantly, they want a partner that understands their research grant and budget cycle and can deliver the best FLOPS-to-dollar ROI, backed by reliable service and support they can depend on.

A renowned public research university in California, recognized for assembling the world’s first working draft of the human genome sequence, has been funded by major grants to further its life sciences work. In 2015 and 2016 alone, the university was awarded nearly $40 million in grants by the National Institutes of Health to support its research projects.

The Challenge

Each genome file, the DNA record from a tumor or normal tissue, is 300 billion bytes (300 GB). For every case there are two of these files, the cancer genome and the normal genome, so a single case occupies roughly 600 GB. The university’s department specializing in cancer genome research needed a scalable, high-performance storage solution to handle its ever-growing data.

With the prospect of deeper sequencing in the future, the team was forced to plan for up to a terabyte for each case. However, it would not be cost-effective to overshoot its storage needs by purchasing, for example, 20 petabytes at once. Adding storage on an “as needed” basis was key to the school’s purchasing decision.
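The sizing arithmetic behind this decision can be sketched in a few lines of Python. This is a rough illustration only: it uses the figures quoted above with decimal units, and it assumes raw capacity with no allowance for file-system overhead, redundancy or other data. The function and constant names are hypothetical and do not come from the university or Advanced HPC.

```python
# Figures quoted in the case study, in decimal units.
BYTES_PER_GENOME_FILE = 300 * 10**9   # "300 billion bytes" per genome file
FILES_PER_CASE = 2                    # cancer genome + normal genome
PLANNED_BYTES_PER_CASE = 10**12       # "up to a terabyte for each case"
PETABYTE = 10**15


def cases_per_petabyte(bytes_per_case: int) -> int:
    """Whole cases that fit in one petabyte of raw capacity (assumes no overhead)."""
    return PETABYTE // bytes_per_case


# Current footprint of one case: two 300 GB files.
current_bytes_per_case = BYTES_PER_GENOME_FILE * FILES_PER_CASE

print(current_bytes_per_case // 10**9)             # GB per case today
print(cases_per_petabyte(current_bytes_per_case))  # cases per PB today
print(cases_per_petabyte(PLANNED_BYTES_PER_CASE))  # cases per PB with deeper sequencing
```

Under these assumptions, a petabyte holds about 1,666 cases at today's file sizes and 1,000 cases at the planned one terabyte per case. That makes it possible to buy capacity in step with the number of cases actually sequenced, which is the reasoning behind adding storage as needed.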

The Solution

Following a review of multiple vendors, the university selected Advanced HPC to store its cancer genome research data. The Advanced HPC team deployed a high-speed, scalable, parallel storage solution based on IBM’s General Parallel File System (GPFS).

Using a building-block approach, the school started with exactly what the research team needed at the time. Now, whenever the team requires more performance, more capacity or both, it calls on Advanced HPC to add the appropriate amount of storage. The solution scales with no disruption to service. Additionally, the university receives enterprise-class protection and efficiency with full data lifecycle management from Advanced HPC.

“The university found value in the fact that Advanced HPC can seamlessly deliver more performance or capacity—or both,” said Joe Lipman, Senior Sales Engineer, Advanced HPC.

Today, the university maintains its renowned Genome Browser, a web-based tool that is used extensively in biomedical research and serves as the platform for several large-scale genomics projects. It has rapidly grown to be the largest database of cancer genomes in the world, storing more than 2.5 petabytes of data and serving downloads of nearly 3 petabytes per month. As the central repository for the foundational genome files, the solution has streamlined research efforts: obtaining the data is now as easy as reading it from a local hard drive.

“We can scale to infinity with the university as it grows, thanks to the cost-saving, building-block approach they’ve taken,” said Lipman. “We’re honored to support the team in its global fight against cancer.”
