🔬 Beyond AlphaFold: How Boltz is Open-Sourcing the Future of Drug Discovery
Episode: 81 min · Read time: 2 min · Topics: Science & Discovery
AI-Generated Summary
Key Takeaways
- ✓Training Under Constraints: Boltz-1 was trained only once due to compute limitations, requiring live debugging during training runs. The team stopped training mid-run to fix bugs, then resumed without restarting from scratch. They used Department of Energy cluster resources with two-day training windows followed by week-long queue waits, eventually completing training with help from Genesis compute resources.
- ✓Coevolution as Structural Hints: AlphaFold models decode evolutionary patterns where amino acid positions that mutate together across species indicate spatial proximity in 3D structure. This coevolutionary data acts like a database lookup, providing strong priors that guide models to an approximate solution space before physics-based refinement finds low-energy states. Without this evolutionary signal, models struggle on novel proteins. (A minimal mutual-information sketch of this signal follows this list.)
- ✓Generative Modeling Over Regression: AlphaFold3 shifted from predicting a single structure to sampling a posterior distribution of possible conformations. This generative approach handles uncertainty better than regression, which averages conflicting predictions into incorrect structures. The architecture uses diffusion models with cubic computational complexity from pairwise operations, requiring fewer parameters but more compute than language models. (A toy sampling-vs-averaging sketch follows this list.)
- ✓Validation Through Distributed Testing: Boltz coordinated 25 academic and industry labs to test designs across diverse applications, reporting results from 8-10 labs in their paper. For nanobodies targeting 14 novel proteins with no known interactions in training data, they achieved nanomolar binders on two-thirds of targets using just 15 designs per target, demonstrating true generalization beyond training distribution.
- ✓Atomic-Level Sequence Prediction: BoltzGen predicts both protein structure and sequence simultaneously by encoding amino acids through their atomic composition. The model receives blank tokens for designed proteins and predicts atomic positions, which implicitly determine amino acid identity since different residues have unique atomic arrangements. This unified supervision signal scales better than separate discrete and continuous objectives. (A composition-decoding sketch follows this list.)
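To ground the coevolution takeaway, here is a minimal sketch (ours, not Boltz or AlphaFold code) that scores pairs of MSA columns by mutual information, the simplest estimator of the coevolutionary signal. Production pipelines use stronger methods such as direct coupling analysis, but the underlying hint is the same: columns that covary tend to be close in the folded structure.

```python
import numpy as np
from collections import Counter

def column_entropy(col):
    """Shannon entropy of one alignment column."""
    counts = np.array(list(Counter(col).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def pair_entropy(col_i, col_j):
    """Joint entropy of two alignment columns."""
    counts = np.array(list(Counter(zip(col_i, col_j)).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def contact_scores(msa):
    """msa: list of equal-length sequences. Returns an L x L matrix of
    mutual-information scores; high MI hints that two positions coevolve
    and are therefore likely close in 3D."""
    L = len(msa[0])
    cols = [[seq[k] for seq in msa] for k in range(L)]
    mi = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            mi[i, j] = mi[j, i] = (column_entropy(cols[i])
                                   + column_entropy(cols[j])
                                   - pair_entropy(cols[i], cols[j]))
    return mi

# Toy alignment: positions 1 and 2 covary (A pairs with L, V pairs with I),
# so their MI stands out against the mostly conserved background.
msa = ["MALKV", "MVIKV", "MALKV", "MVIKV", "MALRV"]
print(contact_scores(msa).round(2))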
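The sampling-versus-averaging point is easy to see in one dimension. Below is a toy illustration, with `denoiser` as a hypothetical stand-in for a learned network: when the truth is one of two conformations, MSE regression predicts their mean, a structure that exists in neither mode, while a diffusion-style sampler lands in one real mode or the other.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoiser(x, t):
    """Hypothetical stand-in for a learned denoiser: pulls noisy points
    toward the nearer of two 'conformations' at -1 and +1."""
    target = np.where(x < 0, -1.0, 1.0)
    return x + (target - x) * (1 - t)  # pull strengthens as t goes to 0

# Regression baseline: with two equally likely conformations, minimizing
# MSE yields their mean, a structure that matches neither.
print("regression prediction:", 0.5 * (-1.0) + 0.5 * (+1.0))

# Diffusion-style sampling: start from noise, iteratively denoise.
samples = rng.normal(size=1000)
for t in np.linspace(1.0, 0.0, 50):
    samples = denoiser(samples, t) + rng.normal(scale=0.05 * t, size=1000)

# Samples concentrate near the two real conformations, not their average.
print("fraction near -1 or +1:", np.mean(np.abs(np.abs(samples) - 1) < 0.2))
```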
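The atomic encoding can be illustrated with ordinary side-chain chemistry. The heavy-atom compositions below are standard; the decode-by-composition helper is our illustrative assumption, not BoltzGen's actual decoding step, which reads identity off predicted atom positions.

```python
from collections import Counter

# Heavy-atom side-chain compositions for a few residues. Note that some
# isomers (e.g. leucine and isoleucine, both four carbons) share a
# composition, which is why spatial arrangement, not composition alone,
# ultimately pins down identity.
SIDE_CHAINS = {
    "GLY": Counter(),                # no side chain
    "ALA": Counter({"C": 1}),
    "SER": Counter({"C": 1, "O": 1}),
    "CYS": Counter({"C": 1, "S": 1}),
    "ASP": Counter({"C": 2, "O": 2}),
    "LYS": Counter({"C": 4, "N": 1}),
}

def decode_residue(predicted_atoms):
    """Map a predicted set of side-chain heavy atoms back to the residue
    whose composition matches: sequence falls out of atom prediction."""
    composition = Counter(predicted_atoms)
    for name, formula in SIDE_CHAINS.items():
        if formula == composition:
            return name
    return None  # matches no residue in this toy table

print(decode_residue(["C", "S"]))  # CYS
print(decode_residue(["C", "O"]))  # SER
print(decode_residue([]))          # GLY
```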
What It Covers
Gabriele Corso and Jeremy Wohlwend from Boltz explain how they open-sourced protein structure prediction after AlphaFold3 remained proprietary. They trained their model once on limited compute, fixing bugs mid-training, and built BoltzLab to democratize drug discovery through accessible AI tools for designing proteins and small molecules that bind therapeutic targets.
Key Questions Answered
- •Infrastructure Cost Advantage: Running Boltz models on their platform costs significantly less than self-hosting the open-source versions. Their small molecule screening pipeline runs 10x faster than open-source implementations through optimization. Platform users can parallelize 100,000 candidate designs across GPU fleets, completing in minutes what would take weeks serially and amortizing compute costs across customers. (A parallel-screening sketch follows below.)
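As a rough sketch of that fan-out, the pattern below parallelizes a 100,000-candidate screen across workers. `score_design` is a hypothetical stand-in for a single Boltz inference call, and on a real fleet each worker would own a GPU rather than a CPU process.

```python
from concurrent.futures import ProcessPoolExecutor

def score_design(seq: str) -> float:
    """Hypothetical placeholder for one structure-prediction/scoring call."""
    return sum(ord(c) for c in seq) % 97 / 97.0  # dummy deterministic score

def screen(candidates, workers=8):
    """Fan candidates out across workers; wall-clock time shrinks roughly
    in proportion to worker count, which is how weeks of serial scoring
    become minutes on a large fleet."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score_design, candidates, chunksize=256))
    ranked = sorted(zip(scores, candidates), reverse=True)
    return ranked[:15]  # e.g. keep a 15-design shortlist per target

if __name__ == "__main__":
    candidates = [f"MKT{i:06d}" for i in range(100_000)]  # stand-in sequences
    print(screen(candidates)[:3])
```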
Notable Moment
The team revealed that their flagship model went through an unrepeatable training curriculum because they could only afford one training run. While the model trained, they discovered and fixed bugs on the fly, stopping and resuming training multiple times without returning to the beginning. This improvised approach still produced a working model that matched AlphaFold3 performance despite the chaotic development process. (A generic checkpoint-resume sketch follows.)
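For readers unfamiliar with the pattern, stop-fix-resume is ordinary checkpointing; the PyTorch sketch below is generic, not Boltz's training code.

```python
import os
import torch

def save_checkpoint(path, model, optimizer, step):
    torch.save({"model": model.state_dict(),
                "optim": optimizer.state_dict(),
                "step": step}, path)

def load_checkpoint(path, model, optimizer):
    """Resume from the last saved step if a checkpoint exists, instead of
    restarting the run from scratch."""
    if not os.path.exists(path):
        return 0
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optim"])
    return ckpt["step"]

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters())
start = load_checkpoint("run.ckpt", model, optimizer)  # 0 on a fresh start

for step in range(start, start + 100):
    loss = model(torch.randn(4, 8)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 50 == 0:  # checkpoint often when queue slots are short
        save_checkpoint("run.ckpt", model, optimizer, step + 1)
```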