Axial Podcast

Computational Protein Design with Costas Maranas

March 22, 2025

49 min episode · 2 min read

Costas Maranas

Topics

Productivity, Relationships, Design & UX

AI-Generated Summary

Published Mar 14, 2026

Key Takeaways

✓ Inverse Protein Design: The core unsolved challenge in computational protein engineering is the inverse folding problem: given a desired protein structure and function, determine which amino acid sequence produces it. Current biophysical force fields such as AMBER and CHARMM carry significant uncertainty, so even powerful search algorithms yield working designs only a small fraction of the time, far below the 20–40% hit rate that would make wet lab screening practical.
✓ Negative Data Generation: Training machine learning models on protein function requires balanced datasets of both working and non-working variants, yet most published datasets contain only positive results. Maranas advocates a moonshot-style initiative: systematically engineer hundreds of enzymes spanning diverse EC classifications across prokaryotic, eukaryotic, and archaeal organisms, generating unbiased positive and negative variant data with which to train protein language models properly.
✓ Mathematical Reframing Over Raw Compute: When computational limits block a biological design problem, reframing it in an established mathematical structure, such as the mixed-integer linear programming used in airline scheduling and warehouse logistics, can unlock a solution. Maranas used this approach to design microbial strains requiring up to 10 simultaneous gene knockouts, designs once considered impossible to build that CRISPR can now execute in an afternoon.
✓ Computational-Experimental Collaboration Protocol: Productive wet lab partnerships require computational researchers to become genuine domain experts in the experimental partner's organism and methods, not vice versa. Maranas estimates it takes multiple back-and-forth cycles, in which computational suggestions are rejected, models are updated, and new suggestions are made, before a collaboration becomes reliably productive. Selecting collaborators for personal compatibility, not just scientific overlap, is equally critical.
✓ Top-Down Genome Streamlining: Rather than building minimal cells from scratch, a pragmatic near-term strategy is to strip 10–20% of dispensable DNA from proven production strains such as E. coli or yeast. Removing non-functional genomic segments reduces replication burden and eliminates metabolic pathways that could accidentally activate and divert carbon flux away from the target product, improving both predictability and yield in bioreactor deployments.
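
To make the 20–40% hit-rate threshold in the first takeaway concrete, here is a minimal sketch of the screening arithmetic. It assumes each designed variant succeeds independently with the same probability, which is a simplification and not something stated in the episode. The function name, the 95% confidence target, and the 1% comparison value are all hypothetical choices made for illustration.

```python
import math


def variants_needed(hit_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of variants to screen so that the chance of seeing
    at least one working design reaches `confidence`.

    Assumes independent trials with identical success probability
    (an illustrative simplification, not a claim from the episode).
    """
    if not 0.0 < hit_rate < 1.0:
        raise ValueError("hit_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # P(no hit in n trials) = (1 - hit_rate) ** n; solve for n so that
    # this probability drops to (1 - confidence) or below.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hit_rate))


if __name__ == "__main__":
    # 0.01 is a hypothetical low hit rate used only for comparison.
    for rate in (0.01, 0.20, 0.40):
        print(f"hit rate {rate:.0%}: screen {variants_needed(rate)} variants")
```

Under these assumptions, a 20% hit rate calls for about 14 variants and a 40% hit rate for about 6, whereas a hypothetical 1% hit rate calls for roughly 299, which illustrates why the hit rate determines whether a wet lab screen is practical.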

What It Covers

Penn State chemical engineering professor Costas Maranas discusses how computational methods, specifically optimization algorithms, biophysical force fields, and emerging transformer models, can be used to engineer proteins, enzymes, and microbial strains to perform functions they never evolved to perform, and why data quality remains the central bottleneck.
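
One rough way to see why optimization formulations matter for strain design is to count how many knockout combinations a brute-force search would have to consider. The sketch below only does that counting; it is not a mixed-integer linear program and is not Maranas's method. The pool of 1,000 candidate genes and the function name are hypothetical, chosen purely for illustration.

```python
from math import comb


def knockout_design_count(n_candidates: int, max_knockouts: int) -> int:
    """Number of distinct knockout sets containing at most `max_knockouts`
    deletions drawn from `n_candidates` candidate genes."""
    if n_candidates < 0 or max_knockouts < 0:
        raise ValueError("arguments must be non-negative")
    # Sum the sets of each size from 0 (no deletions) up to max_knockouts.
    return sum(comb(n_candidates, k) for k in range(max_knockouts + 1))


if __name__ == "__main__":
    n = 1000  # hypothetical number of candidate genes, not from the episode
    for k in (1, 2, 5, 10):
        count = knockout_design_count(n, k)
        print(f"up to {k:2d} knockouts: {count:.3e} candidate designs")
```

With these illustrative numbers the count for up to 10 knockouts exceeds 10^23, far too many to simulate one by one, which is the kind of combinatorial wall that a structured formulation with binary decision variables is meant to get around.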

Notable Moment

Maranas describes attending an operations research conference in the late 1990s where genome assembly researchers presented to a room of entirely disengaged mathematicians. He recognized that his cross-disciplinary background positioned him to bridge that gap, and that was the moment he committed to redirecting his entire lab toward computational biology.

Books, tools, and gear mentioned in this episode

Tools

AMBER
“Current biophysical force fields like AMBER and CHARMM carry significant uncertainty, meaning even powerful search algorithms succeed only a fraction of the time.”
CHARMM
“Current biophysical force fields like AMBER and CHARMM carry significant uncertainty, meaning even powerful search algorithms succeed only a fraction of the time.”
