Fellow on COVID-19 Bell Prize-Winning Team
A team including a Department of Energy Computational Science Graduate Fellowship (DOE CSGF) recipient has won a top high-performance computing (HPC) prize for a model that uses genome data to categorize COVID-19 variants and other pandemic-causing viruses.
Danilo Perez Jr., a third-year fellow studying neural science at New York University, shared the Association for Computing Machinery Gordon Bell Special Prize for HPC-Based COVID-19 Research. The award was announced November 17 at the SC22 supercomputing conference in Dallas.
The international 34-member team’s project, GenSLMs: Genome-Scale Language Models Reveal SARS-CoV-2 Evolutionary Dynamics, adapted large language models (LLMs) for use on genetic data. LLMs are artificial intelligence algorithms that are typically used for text generation, customer service chatbots, translation and other purposes. The programs are trained on millions of words and text snippets, enabling them to predict the next words in a partial sentence or paragraph or even to compose text in a specific style.
An ACM release says the researchers adapted LLMs and trained them on more than 110 million gene sequences. They tuned a specific model of SARS-CoV-2, the virus that causes COVID-19, on 1.5 million genomes and demonstrated that GenSLMs can accurately and rapidly identify potentially dangerous viral variants.
Perez joined the team while on his 2022 DOE CSGF practicum at Argonne National Laboratory, where he worked with Arvind Ramanathan. The team used Argonne’s Polaris supercomputer, plus NVIDIA Corporation’s Selene and the Perlmutter system at DOE’s National Energy Research Scientific Computing Center. It also ran on the CS-2 system from Cerebras, an artificial intelligence company.
Besides Argonne, authors are affiliated with the California Institute of Technology, Harvard University, Northern Illinois University, the Technical University of Munich, the University of Chicago, the University of Illinois Chicago, Cerebras and NVIDIA.