Presentation

Protein Generation via Genome-Scale Language Models with Bio-Physical Scoring
Description
Large language models (LLMs) trained on vast biological datasets can learn biological motifs and correlations across the evolutionary landscape of natural proteins. LLMs can then be used for de novo design of proteins with specific structures, functions, and physicochemical properties. We employ a pre-trained genome-scale language model that uses codons as tokens and integrate it into a workflow for targeted generation of sequences. Our framework suggests new gene sequences, which are ranked for downstream evaluation by metrics that collectively capture a broad range of sequence-specific, biophysical, and biochemical properties. We demonstrate the integrated workflow by designing novel variants of the enzyme malate dehydrogenase (MDH) that exhibit more favorable activation energies than their natural counterparts (a reduction of 4.01 kJ/mol), with sustained sequence generation rates of 10^4/hr and simulation rates of 10^2/hr on 64 nodes of Polaris at about 99.7% system utilization during the run.
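As a rough illustration of the two ideas in the description (codons as tokens, and ranking generated candidates by a score), here is a minimal Python sketch. It is not the presenters' implementation: the function names `codon_tokenize` and `rank_candidates` are hypothetical, and the single per-candidate score is an assumed stand-in for the sequence-specific, biophysical, and biochemical metrics mentioned above.

```python
from __future__ import annotations


def codon_tokenize(dna: str) -> list[str]:
    """Split a coding DNA sequence into non-overlapping 3-nucleotide tokens.

    Hypothetical helper: it only illustrates what "codons as tokens" means
    and is not the tokenizer used by the model in the talk.
    """
    seq = dna.strip().upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains non-ACGT characters")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def rank_candidates(scores: dict[str, float]) -> list[str]:
    """Order candidate names from highest to lowest score.

    Assumption: each candidate has already been reduced to one number,
    with higher meaning more promising for downstream evaluation.
    """
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(codon_tokenize("atggcctaa"))
    print(rank_candidates({"variant_a": 0.2, "variant_b": 0.9, "variant_c": 0.5}))
```

In a real workflow the score would presumably combine several metrics, and the weighting between them is a design choice the description does not specify.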
Event Type
Workshop
Time
Monday, 13 November 2023
12:10pm - 12:30pm MST
Location
501-502
Tags
Artificial Intelligence/Machine Learning
Registration Categories
W