Development of Genomic Language Models to Predict Optimal Genomes for Commercial Protein Production

Members: Triplebar, University of California – Berkeley

Project dates: 2026 – present

This project will create a tool that will fundamentally accelerate strain optimization for the production of resilient, cost-effective proteins for wound healing, advanced nutrition, and chemical defense, as well as other defense-relevant compounds. Triplebar and UC Berkeley will collaborate to develop first-of-its-kind predictive AI models for protein production.

The project will unite the ultra-throughput data generation capabilities of Triplebar’s platform with UC Berkeley’s AI for biology expertise to create predictive models for strain engineering. Together, researchers will create genome-to-phenotype models to predict optimal genomes for commercial protein production. The tool will predict protein expression and secretion levels from genomic sequences and recommend genomic edits to achieve the desired level of protein production.

Despite considerable advances in the ability to read and write DNA, cost-effectively engineering organisms to enable low-cost biomanufacturing remains a significant challenge. The duration and cost of these development projects are also often hard to predict, which adds risk to the biomanufacturing product development process.

This project will improve the industry’s ability to engineer organisms to make needed commercial and defense products through bioindustrial manufacturing. 

Funding source: U.S. Department of Defense