OSU’s Center for Genome Research and Biocomputing can do more science with EPYC™ processor core and thread counts
What do snow leopards, eucalyptus, Phytophthora fungi, corn, and rice all have in common? They, as well as Oregon State University’s mascot, the North American beaver, have had their genomes sequenced at the university’s Center for Genome Research and Biocomputing (CGRB). According to Chris Sullivan, Assistant Director of Biocomputing at the CGRB, sequencing is just the start of conducting science based on genomics.
The next critical step is genomic sequence alignment, the prerequisite for comparing and analyzing genomes: DNA, RNA, or protein sequences are arranged to find similar regions. “We take these small strings of data and we align them to these massive genomes,” said Sullivan.
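To make the idea of aligning “small strings” to a genome concrete, here is a minimal, hypothetical sketch in Python; it is not the CGRB’s software. It slides a short read along a reference string and reports every position where the read matches with at most a given number of mismatched bases. The function names `hamming` and `align_read` and the toy genome are illustrative assumptions, and real aligners index the genome first and also handle insertions and deletions, which this brute-force scan does not.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def align_read(read: str, genome: str, max_mismatches: int = 1) -> list[tuple[int, int]]:
    """Return (position, mismatches) for every placement of `read` on `genome`
    with at most `max_mismatches` substitutions (no insertions or deletions)."""
    hits = []
    for pos in range(len(genome) - len(read) + 1):
        mismatches = hamming(read, genome[pos:pos + len(read)])
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


if __name__ == "__main__":
    # Toy reference and reads; real genomes span millions to billions of bases.
    genome = "ACGTACGTTAGCCGATACGT"
    for read in ["TAGCC", "ACGTA", "GATTC"]:
        print(read, align_read(read, genome))
```

Because each read is aligned independently of every other read, this kind of work splits naturally across many threads, which is consistent with the emphasis on thread counts throughout this article.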
The CGRB serves 26 departments at Oregon State, whose researchers can access the 4,000 to 5,000 programs the center has compiled. These run on a distributed service architecture supporting 5,000-plus processors, 5 PB of usable storage, and a secure private 1G/10G/40G network. The CGRB generates 4 TB to 8 TB of data every day and has thousands of jobs running at any given moment. That job count is the boundary Sullivan keeps pushing so that Oregon State can run all of its scientific procedures as cost-effectively as possible.
Most jobs relate to genomic alignment. “We are just pounded by these,” said Sullivan. “We process about 20,000 jobs a day.” One hundred job files, each containing 50 million sequences that must be aligned to a genome and each using a different algorithm, may run at one time. This work requires very high processor core and thread counts, which Sullivan could only find on “breathtakingly expensive” servers. Now he has an alternative.
Science Driven By Threads
“The return for us in research is really in thread counts,” Sullivan said. “Until the AMD EPYC™ processor, I didn’t have a piece of equipment that could actually come close to my IBM Power8 and Power9 equipment in terms of threading and the number of jobs we’re talking about. We were looking for a really high thread count that got us into the over-100-thread range because we have applications that go into that space, while also maintaining affordability.”
“We’re answering scientific questions. Processor frequency doesn’t change the scientific answer, so why would I spend more money on it? But the more threads I have, the more jobs I’m getting out there. A bigger scope lessens the bias, which means we can actually get much closer to the answers,” said Sullivan.
Sullivan’s strategy is to add multiple powerful AMD EPYC-based machines to get higher thread counts at lower total operating costs. “We felt like we were getting a really good return for the dollars spent,” he said. “The number of jobs we can do is where we win.”
“There are so many different ways we are leveraging EPYC™ processors’ high thread count,” observed Sullivan. “I have groups that are collapsing off of machines that were 48-thread, 24-core hyperthreaded boxes, and they are moving three of them off into one AMD EPYC™ 7601 Processor.”
Power To Do More Science While Cutting Costs
The core density and thread count of EPYC™ processors are also important in other ways. “I can’t reinvent my server room,” said Sullivan; doing so would cost “millions and millions” of dollars. Instead, EPYC™ processors give him new options. “The power return on the two-socket we have with the EPYC™ processors is phenomenal,” he said. “We plug those in, get the thread counts the way we want them, get the speed we need to get through the jobs quickly enough, and we’re not changing our server room in any way.”
Sullivan explained that university groups also reduce their management fees with the CGRB while doubling the number of jobs they can perform. “We are no longer running InfiniBand because of AMD EPYC. They can run locally, faster than they would ever do with InfiniBand.” As a bonus, moving off bigger machines to more cost-effective equipment without compromising performance looks impressive on grant applications.
Sullivan looks for new technology that can deliver faster results for his researchers. “I put the technology in front of the researchers, and they beat the stuffing out of it. They’re the ones that make the decisions on what they buy. And AMD EPYC™ processors are what they are buying because of the price and return on the amount of threads.”
Pushing Boundaries of Science With Great Partners
Sullivan, who writes algorithms for various projects at OSU, says there can never be enough computing power. “Advanced HPC takes me into a space where they know the people who are manufacturing the motherboard. They know the people who will put on the bleeding edge technology,” Sullivan said. “That’s important to how we change science.”
“The science is trying to reach past the equipment,” he said. That’s why Sullivan turned to Advanced HPC and AMD to help implement the next generation of computing at Oregon State’s CGRB.
Sullivan said AMD will be a key part of the center’s continuing scientific endeavors. “AMD has always been here for us, and we are looking forward to deploying more AMD EPYC™ processors because of their cost effectiveness in delivering those increased thread counts.”