Oregon State University’s Center for Genome Research and Biocomputing (CGRB) is a much-valued customer of both Advanced HPC and one of our top partners, AMD, a globally renowned high-performance computing, graphics and visualization technologies company. According to the CGRB website, the “CGRB Biocomputing and Bioinformatics facility provides wide-ranging resources, expertise and support for computational needs of the molecular biosciences community at Oregon State University.” The CGRB features a robust, expansive infrastructure consisting of a distributed service architecture, a computer cluster of more than 5,000 processors and a secure private 1G/10G/40G network. While each machine has its own internal disk space, the CGRB also provides access to over 3.5PB of NFS shared disk space.
Our most recent case study details CGRB’s advancements in leading-edge genome sequencing and genomic sequence alignment, endeavors that are essential for comparing and analyzing genomes by arranging DNA, RNA, or protein sequences to find similar regions.
The CGRB serves 26 departments at Oregon State and generates 4TB to 8TB of data each day from some 20,000 projects, most of which are related to genomic alignment and many of which run concurrently. To give a sense of scope, 100 job files, each containing 50 million sequences that must be aligned to a genome and each using a different algorithm, may run at the same time. The task of shepherding all these projects and ensuring that they run as cost-effectively as possible rests on the shoulders of Chris Sullivan, Assistant Director of Biocomputing at the CGRB. The especially high processor core and thread counts that efforts like DNA analysis, genotyping, genomic testing and DNA sequencing consistently demand have, until now, generally necessitated extremely expensive servers.
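To make that scope concrete, here is a minimal back-of-the-envelope sketch in Python using the figures quoted above. The variable names are illustrative only, and the calculation assumes that every job file holds exactly 50 million sequences, that the center runs 365 days a year, and that 1PB equals 1,000TB; none of these assumptions comes from the CGRB.

```python
# Back-of-the-envelope workload estimate using the figures quoted in the article.
# Assumption: all 100 job files hold exactly 50 million sequences each.
job_files = 100
sequences_per_file = 50_000_000

total_sequences = job_files * sequences_per_file
print(f"Sequences queued for alignment at once: {total_sequences:,}")

# Daily data volume is quoted as a range of 4TB to 8TB.
daily_tb_low, daily_tb_high = 4, 8

# Assumptions: a 365-day year with no downtime, and 1PB = 1,000TB.
yearly_pb_low = daily_tb_low * 365 / 1000
yearly_pb_high = daily_tb_high * 365 / 1000
print(f"Approximate yearly data volume: {yearly_pb_low:.2f}PB to {yearly_pb_high:.2f}PB")
```

Under these assumptions, a single burst of 100 concurrent job files amounts to 5 billion sequences awaiting alignment.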
As Sullivan attests, the AMD EPYC™ (EPYC) processor is changing all that.
“The return for us in research is really in thread counts,” Sullivan said. “We were looking for a really high thread count that got us into the over-100-thread range because we have applications that go into that space, while also maintaining affordability. Until the AMD EPYC processor, I didn’t have a piece of equipment that could actually come close to my most expensive equipment in terms of threading and the number of jobs we’re talking about.”
Beyond its notable per-CPU specifications (32 cores and 64 threads, 8 memory channels per socket, up to 2TB of DDR4 RAM, and 128 PCIe 3.0 lanes), EPYC has demonstrated high performance and substantial value, posting very competitive integer and floating-point benchmark results. According to a June 2018 article in Forbes, these benchmarks suggest that “EPYC delivers strong value supporting a variety of workloads that run the modern enterprise.”
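As a rough illustration of how those per-CPU figures relate to the “over-100-thread range” Sullivan describes, the sketch below multiplies threads per processor by socket count. The function name and the one- and two-socket configurations are assumptions chosen for illustration; the article does not state how the CGRB servers are configured.

```python
# Hardware threads per server, derived from the per-CPU figures quoted above.
CORES_PER_CPU = 32
THREADS_PER_CORE = 2  # 64 threads / 32 cores


def server_threads(sockets: int) -> int:
    """Total hardware threads for a server with the given number of CPUs."""
    return sockets * CORES_PER_CPU * THREADS_PER_CORE


# Assumption: one- and two-socket servers, shown only for comparison.
for sockets in (1, 2):
    print(f"{sockets} socket(s): {server_threads(sockets)} threads")
```

In this sketch, a single processor provides 64 threads, and a hypothetical two-socket server would provide 128, which is above the 100-thread mark.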
This balance of performance and cost-efficiency is exceedingly important to groups like the CGRB, where, as Sullivan put it, “The science is trying to reach past the equipment.”
One of the reasons that Advanced HPC has forged a lasting partnership with OSU and the CGRB is Advanced HPC’s capacity to provide Sullivan with boundary-pushing tools.
“Advanced HPC takes me into a space where they know the people who are manufacturing the motherboard. They know the people who will put on the bleeding edge technology,” Sullivan said. “That’s important to how we change science.”
The “science” versus “equipment” dynamic that Sullivan referenced is a telling commentary on the ever-accelerating pace of exploration in genomics, in pursuits ranging from gene expression to whole-genome sequencing. Driven researchers are not willing to sit idle when they perceive a gap between what the technology can deliver and what the science needs.
Consider, for example, the rapid gains made in Next Generation Sequencing (NGS). With its massively parallel sequencing and ultra-high-throughput capacity, NGS is not only dramatically changing the use cases for research but also transforming healthcare itself by bridging personalized medicine with genomic profiling. The work of companies on the vanguard of sequencing technology, such as Illumina, Thermo Fisher Scientific and BGI Genomics, is validated by the fact that the U.S. Food and Drug Administration (FDA) approved 16 new personalized medicine therapies in 2017 alone, a record number of personalized medicine approvals in a single year.
The CGRB offers high-throughput sequencing on the Illumina HiSeq 3000 and Illumina MiSeq instruments.
The year 2017 also saw the approval of three gene therapies; the first authorization to market health-related genetic tests directly to consumers (23andMe); and the first approval of a personalized medicine biosimilar (i.e., a biologic that is “similar” to another biologic drug already approved by the FDA). Of note, Illumina is so committed to furthering the pace of genome research that it founded the Illumina Accelerator in 2014 to provide “genomics-driven startups with capital and access to capital, sequencing and genomics expertise, coaching, and lab and office space in the San Francisco Bay Area.”
This extraordinary momentum should not be impeded by technology that cannot keep pace. That’s why EPYC has become a staple for Sullivan and the CGRB team. EPYC provides higher thread counts at lower total operating costs, making it the type of efficient workhorse that industrious research groups covet.
“I put the technology in front of the researchers, and they beat the stuffing out of it,” said Sullivan. “They’re the ones that make the decisions on what they buy. And AMD EPYC processors are what they are buying because of the price and return on the amount of threads.”
To learn even more about EPYC, be sure to check out this video.