🔬Beyond AlphaFold: How Boltz is Open-Sourcing the Future of Drug Discovery
Episode: 81 min
Read time: 2 min
Topics: Career Growth, Remote Work, Design & UX
AI-Generated Summary
Key Takeaways
- ✓ Training Under Constraints: Boltz-1 was trained only once due to compute limitations, which meant debugging live during the run. The team stopped training mid-run to fix bugs, then resumed without restarting from scratch. They used Department of Energy cluster resources with two-day training windows followed by week-long queue waits, and eventually completed training with help from Genesis compute resources.
- ✓ Coevolution as Structural Hints: AlphaFold models decode evolutionary patterns in which amino acid positions that mutate together across species indicate spatial proximity in the 3D structure. This coevolutionary data acts like a database lookup, providing strong priors that guide models toward the approximate solution space before physics-based refinement finds low-energy states. Without this evolutionary signal, models struggle on novel proteins.
- ✓ Generative Modeling Over Regression: AlphaFold3 shifted from predicting a single structure to sampling from a posterior distribution over possible conformations. This generative approach handles uncertainty better than regression, which averages conflicting predictions into an incorrect structure. The architecture combines a diffusion model with pairwise operations whose cost is cubic in sequence length, so it needs fewer parameters but more compute than language models.
- ✓ Validation Through Distributed Testing: Boltz coordinated 25 academic and industry labs to test designs across diverse applications, reporting results from 8 to 10 labs in their paper. For nanobodies targeting 14 novel proteins with no known interactions in the training data, they achieved nanomolar binders on two-thirds of targets using just 15 designs per target, demonstrating generalization beyond the training distribution.
- ✓ Atomic-Level Sequence Prediction: BoltzGen predicts protein structure and sequence simultaneously by encoding amino acids through their atomic composition. The model receives blank tokens for designed proteins and predicts atomic positions, which implicitly determine amino acid identity because different residues have unique atomic arrangements. This unified supervision signal scales better than separate discrete and continuous objectives.
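To make the coevolution takeaway concrete, here is a minimal sketch that scores pairs of alignment columns by mutual information, a classic covariation statistic. It is an illustration only: the episode summary does not say how AlphaFold or Boltz process alignments, and the toy alignment, the function names, and the choice of mutual information are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def rank_coevolving_pairs(msa):
    """Return ((i, j), score) tuples sorted from most to least covarying."""
    length = len(msa[0])
    scores = {
        (i, j): column_mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment: columns 0 and 1 always change together
# (A with K, G with R), column 2 is conserved, column 3 varies on its own.
TOY_MSA = [
    "AKLE",
    "AKLD",
    "GRLE",
    "GRLD",
    "AKLE",
    "GRLD",
]

RANKED = rank_coevolving_pairs(TOY_MSA)
TOP_PAIR, TOP_SCORE = RANKED[0]
```

In this toy case the top-ranked pair is columns 0 and 1, the two positions that always mutate together, which is the kind of signal the takeaway describes as a hint of spatial proximity. A conserved column such as column 2 scores zero against everything, since a position that never changes carries no covariation signal.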
What It Covers
Gabriele Corso and Jeremy Wohlwend from Boltz explain how they open-sourced protein structure prediction after AlphaFold3 remained proprietary. They trained their model once with limited compute, fixing bugs mid-training, and built BoltzLab to democratize drug discovery through accessible AI tools for designing proteins and small molecules that bind therapeutic targets.
Key Questions Answered
- Infrastructure Cost Advantage: According to the team, running Boltz models on their platform costs significantly less than self-hosting the open-source versions. Their optimized small molecule screening pipeline runs 10x faster than open-source implementations. Platform users can parallelize 100,000 candidate designs across GPU fleets, completing in minutes what would take weeks serially, with compute costs amortized across customers.
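The "minutes versus weeks" claim is easy to sanity-check with back-of-the-envelope arithmetic. In the sketch below only the 100,000-design figure comes from the summary; the 10 seconds per design and the 2,000-GPU fleet are hypothetical numbers chosen for illustration, not figures reported by Boltz.

```python
import math


def wall_clock_seconds(num_designs, seconds_per_design, num_gpus):
    """Wall-clock time if each GPU scores one design at a time."""
    waves = math.ceil(num_designs / num_gpus)  # rounds of work per GPU
    return waves * seconds_per_design


NUM_DESIGNS = 100_000        # from the episode summary
SECONDS_PER_DESIGN = 10.0    # assumption, not a reported figure
FLEET_SIZE = 2_000           # assumption, not a reported figure

SERIAL_SECONDS = wall_clock_seconds(NUM_DESIGNS, SECONDS_PER_DESIGN, 1)
PARALLEL_SECONDS = wall_clock_seconds(NUM_DESIGNS, SECONDS_PER_DESIGN, FLEET_SIZE)

SERIAL_DAYS = SERIAL_SECONDS / 86_400      # roughly 11.6 days
PARALLEL_MINUTES = PARALLEL_SECONDS / 60   # roughly 8.3 minutes
```

Under these assumed numbers a single GPU needs about a week and a half, while the fleet finishes in under ten minutes. The ratio is what matters: because each design can be scored independently, wall-clock time shrinks roughly in proportion to fleet size.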
Notable Moment
The team revealed their flagship model went through an unrepeatable training curriculum because they could only afford one training run. While the model trained, they discovered and fixed bugs on the fly, stopping and restarting training multiple times without returning to the beginning. This improvised approach somehow produced a working model that matched AlphaFold3 performance despite the chaotic development process.
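Stopping a run, patching the code, and continuing from where it left off relies on checkpointing. The following is a minimal sketch of that pattern on a toy one-parameter problem. The file layout, function names, and the "bug fix" (a changed learning rate) are all hypothetical, and nothing here reflects Boltz's actual training code.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash never leaves a partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"step": 0, "weight": 0.0}
    with open(path) as handle:
        return json.load(handle)


def train(path, total_steps, learning_rate, stop_at=None):
    """Gradient descent on (weight - 3)^2, resuming from the checkpoint at `path`."""
    state = load_checkpoint(path)
    target = 3.0
    while state["step"] < total_steps:
        if stop_at is not None and state["step"] >= stop_at:
            break  # simulate halting the job mid-run
        gradient = 2.0 * (state["weight"] - target)
        state["weight"] -= learning_rate * gradient
        state["step"] += 1
        save_checkpoint(path, state)
    return state


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "checkpoint.json")
        # First session: the job is halted at step 40.
        interrupted = train(path, total_steps=100, learning_rate=0.1, stop_at=40)
        # Second session: a hypothetical fix changes the learning rate,
        # and training continues from step 40 rather than from step 0.
        resumed = train(path, total_steps=100, learning_rate=0.05)
        return interrupted, resumed
```

Calling `demo()` returns the state at the interruption (step 40) and the final state (step 100). The second call to `train` never revisits the first 40 steps, which is also why such a run is hard to reproduce: the final model depends on exactly when each change was applied.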