Latent Space

🔬Beyond AlphaFold: How Boltz is Open-Sourcing the Future of Drug Discovery

81 min episode · 2 min read

Topics

Science & Discovery

AI-Generated Summary

Key Takeaways

  • Training Under Constraints: Boltz-1 was trained only once due to compute limitations, requiring live debugging during training runs. The team stopped training mid-run to fix bugs, then resumed without restarting from scratch. They used Department of Energy cluster resources with two-day training windows followed by week-long queue waits, eventually completing training with help from Genesis compute resources.
  • Coevolution as Structural Hints: AlphaFold-family models decode evolutionary patterns: amino acid positions that mutate together across species tend to sit close together in the 3D structure. This coevolutionary data acts like a database lookup, providing strong priors that guide models toward the approximate solution space before physics-based refinement finds low-energy states. Without this evolutionary signal, models struggle on novel proteins.
  • Generative Modeling Over Regression: AlphaFold3 shifted from predicting single structures to sampling posterior distributions of possible conformations. This generative approach handles uncertainty better than regression, which averages conflicting predictions into incorrect structures. The architecture uses diffusion models with cubic computational complexity from pairwise operations, requiring fewer parameters but more compute than language models.
  • Validation Through Distributed Testing: Boltz coordinated 25 academic and industry labs to test designs across diverse applications, reporting results from 8-10 labs in their paper. For nanobodies targeting 14 novel proteins with no known interactions in training data, they achieved nanomolar binders on two-thirds of targets using just 15 designs per target, demonstrating true generalization beyond training distribution.
  • Atomic-Level Sequence Prediction: BoltzGen predicts both protein structure and sequence simultaneously by encoding amino acids through atomic composition. The model receives blank tokens for designed proteins and predicts atomic positions, which implicitly determine amino acid identity since different residues have unique atomic arrangements. This unified supervision signal scales better than separate discrete and continuous objectives.
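The coevolution signal described above can be made concrete with a toy calculation. The sketch below (an illustration, not Boltz's actual pipeline) computes mutual information between columns of a tiny made-up multiple sequence alignment: columns that covary strongly are the ones structure predictors treat as likely 3D contacts.

```python
import math
from collections import Counter

def column_mi(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    High MI means residues at the two positions tend to mutate together
    across sequences -- the coevolution signal that structure predictors
    exploit as a proxy for spatial contact.
    """
    n = len(msa)
    pi = Counter(seq[i] for seq in msa)          # marginal of column i
    pj = Counter(seq[j] for seq in msa)          # marginal of column j
    pij = Counter((seq[i], seq[j]) for seq in msa)  # joint distribution
    mi = 0.0
    for (a, b), count in pij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

# Toy alignment: positions 0 and 2 covary perfectly (A<->V, S<->T),
# while position 1 is fully conserved and carries no pairwise signal.
msa = ["AGV", "AGV", "SGT", "SGT", "AGV", "SGT"]
print(column_mi(msa, 0, 2))  # 1.0 bit: perfectly coupled columns
print(column_mi(msa, 0, 1))  # 0.0: a conserved column gives no signal
```

Real pipelines use corrected statistics (e.g. average-product correction) and far deeper alignments, but the intuition is the same: coupled columns hint at contacts.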
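The regression-averaging failure that motivated AlphaFold3's generative turn can also be shown with a one-dimensional toy (not Boltz's model, just an illustration). A squared-error regressor trained on a bimodal target converges to the mean of the modes, a "structure" that never actually occurs, while sampling preserves the valid conformations.

```python
import random

# Toy 1-D "conformation": a flexible region sits at either x = -1.0 or
# x = +1.0 with equal probability -- two distinct valid structures.
random.seed(0)
samples = [random.choice([-1.0, 1.0]) for _ in range(10_000)]

# A squared-error regressor converges to the conditional mean, which
# here is ~0.0 -- an averaged structure that never occurs in the data.
mse_prediction = sum(samples) / len(samples)

# A generative model instead learns to *sample* the posterior; drawing
# from the empirical distribution recovers the two real conformations.
draws = [random.choice(samples) for _ in range(5)]

print(round(mse_prediction, 2))           # near 0.0: invalid average
print(all(d in (-1.0, 1.0) for d in draws))  # True: draws are real modes
```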

What It Covers

Gabriele Corso and Jeremy Wohlwend from Boltz explain how they open-sourced protein structure prediction after AlphaFold3 remained proprietary. They trained their model once with limited compute, fixing bugs mid-training, and built BoltzLab to democratize drug discovery through accessible AI tools for designing proteins and small molecules that bind therapeutic targets.

Key Questions Answered

  • Infrastructure Cost Advantage: Running Boltz models on their platform costs significantly less than self-hosting open-source versions. Their small molecule screening pipeline runs 10x faster than open-source implementations through optimization. Platform users can parallelize 100,000 candidate designs across GPU fleets, completing in minutes what would take weeks serially, amortizing compute costs across customers.
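The screening workload described above is embarrassingly parallel: each candidate design is scored independently, so throughput scales with the number of workers. The sketch below uses Python's standard library to fan a batch of candidates across a worker pool; `score_design` is a hypothetical stand-in for a real structure or affinity prediction job, not a Boltz API.

```python
from concurrent.futures import ThreadPoolExecutor

def score_design(design_id: int) -> tuple[int, float]:
    """Hypothetical stand-in for one scoring job; on a real platform
    this would be a structure/affinity prediction running on a GPU."""
    score = (design_id * 2654435761 % 1000) / 1000.0  # deterministic placeholder
    return design_id, score

def screen(n_designs: int, top_k: int = 5) -> list[tuple[int, float]]:
    """Fan n independent candidate evaluations out across workers and
    keep the best-scoring designs -- the same embarrassingly parallel
    shape as spreading 100,000 designs over a GPU fleet."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(score_design, range(n_designs)))
    return sorted(results, key=lambda r: r[1], reverse=True)[:top_k]

if __name__ == "__main__":
    print(screen(1_000))  # top 5 candidates by placeholder score
```

With independent jobs like this, wall-clock time divides by the worker count (minus scheduling overhead), which is why a fleet can finish in minutes what a single machine would grind through for weeks.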

Notable Moment

The team revealed their flagship model went through an unrepeatable training curriculum because they could only afford one training run. While the model trained, they discovered and fixed bugs on the fly, stopping and restarting training multiple times without returning to the beginning. This improvised approach somehow produced a working model that matched AlphaFold3 performance despite the chaotic development process.
