Latent Space

🔬 The Coolest Diffusion Research Isn't in LLMs — Evan Feinberg & Sergey Edunov, Genesis Molecular AI

July 1, 2026

108 min episode · 3 min read

Evan Feinberg, Sergey Edunov

Topics

Productivity, Relationships, Startups

AI-Generated Summary

Key Takeaways

✓ Sub-angstrom accuracy threshold: Protein-ligand pose prediction must reach below one angstrom RMSD to be useful for downstream drug discovery tasks. The standard two-angstrom benchmark inherited from pre-AI docking studies allows entire aromatic rings to flip undetected, producing structurally plausible but chemically wrong outputs. Hydrogen bonds require donor-to-acceptor distances of 2.7–3.3 angstroms, a 0.6-angstrom window, meaning two-angstrom model errors routinely destroy the physical validity of predicted interactions.
✓ Diffusion as the correct primitive for molecular structure: GANs, the dominant generative framework circa 2017–2018, failed for protein-ligand systems due to mode collapse and training instability. Diffusion models solved these problems and now represent the frontier of generative AI research in 3D structural biology. Counterintuitively, the most novel diffusion research is no longer in image or video generation but in co-folding tasks, where iterative denoising maps naturally onto conformational refinement of molecular complexes.
✓ Inference-time scaling via physics-guided diffusion: Pearl applies inference-time compute scaling analogous to chain-of-thought reasoning in LLMs, but instead of language tokens, the model iterates over intermediate crystal structure representations. Physics-based force fields steer the diffusion trajectory during each refinement step, functioning like a verifier in a loop. This approach substantially improves pose accuracy without retraining and mirrors the three-stage LLM scaling roadmap: pretraining, post-training, then inference-time scaling.
✓ Synthetic data from molecular physics simulations: The RCSB Protein Data Bank contains roughly 200,000 crystal structures, an insufficient and slowly expanding dataset for training generalizable models. Small molecules, unlike large proteins, are computationally tractable for physics-based simulation, enabling Genesis to generate large volumes of synthetic conformational data for pretraining. This physics-derived synthetic data pipeline is the primary mechanism for scaling beyond public experimental data, directly paralleling how LLM pretraining uses synthetic text corpora.
✓ Reinforcement learning with wet lab feedback loops: Genesis runs design-make-test-analyze cycles with Insilico Medicine as a lab partner, synthesizing model-proposed compounds, measuring binding and ADMET properties experimentally, and feeding results back into model fine-tuning. This closes the RL loop at the wet lab level rather than relying solely on GPU-based self-play. The bottleneck is synthesis complexity: novel chemistries required for Pareto-optimal multi-parameter molecules cannot yet be reliably automated, limiting cycle speed for frontier compounds.
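
To make the first takeaway concrete, here is a toy calculation (not taken from the episode) of how a flipped ring can pass a two-angstrom RMSD cutoff. Everything in it is an illustrative assumption: a hypothetical 15-atom ligand, an idealized hexagonal ring with a 1.39-angstrom radius, one substituent placed 2.75 angstroms from the ring centre, and a plain atom-by-atom RMSD with no symmetry correction.

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))

R_RING = 1.39  # assumed ring radius (regular hexagon), in angstroms
R_SUB = 2.75   # assumed distance of the substituent from the ring centre

# Six ring atoms in the xy-plane; atoms 0 and 3 lie on the x-axis (the flip axis).
ring = [(R_RING * math.cos(math.radians(60 * k)),
         R_RING * math.sin(math.radians(60 * k)), 0.0) for k in range(6)]
# One substituent attached to ring atom 1 (same angle, larger radius).
substituent = [(R_SUB * math.cos(math.radians(60)),
                R_SUB * math.sin(math.radians(60)), 0.0)]
# Eight further heavy atoms that the hypothetical prediction places perfectly.
rest = [(-R_RING - 1.5 * k, 0.0, 0.0) for k in range(1, 9)]

reference = ring + substituent + rest

def flip(atom):
    """Rotate an atom 180 degrees about the x-axis."""
    x, y, z = atom
    return (x, -y, -z)

# The predicted pose flips the ring and its substituent, and nothing else.
predicted = [flip(a) for a in ring + substituent] + rest

pose_rmsd = rmsd(reference, predicted)
substituent_shift = math.dist(substituent[0], flip(substituent[0]))

print(f"pose RMSD: {pose_rmsd:.2f} angstrom")
print(f"substituent displacement: {substituent_shift:.2f} angstrom")
print("passes 2-angstrom cutoff:", pose_rmsd < 2.0)
print("passes 1-angstrom cutoff:", pose_rmsd < 1.0)
```

Under these assumptions the pose RMSD comes out at about 1.75 angstroms, so the flipped pose counts as a success at the two-angstrom cutoff and as a failure at one angstrom. The substituent itself moves by almost 4.8 angstroms, far outside the 0.6-angstrom hydrogen-bond window quoted above.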

What It Covers

Evan Feinberg (CEO) and Sergey Edunov (CTO, former Llama 2/3 pretraining lead) explain how Genesis Molecular AI built Pearl, a diffusion-based protein-ligand co-folding model achieving sub-one-angstrom pose prediction accuracy. They cover synthetic data generation, inference-time scaling, reinforcement learning loops with wet lab partners, and why the most innovative diffusion research now happens in 3D structural biology rather than image or language domains.
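
The idea of a force field steering each refinement step can be illustrated with a deliberately tiny one-dimensional toy. This is not Pearl's algorithm and not a real diffusion sampler; it only sketches the general pattern of mixing a model update with an energy-gradient correction at every step. The function `refine`, the biased `model_target`, the harmonic penalty, and all rate constants are hypothetical choices made for illustration.

```python
H_BOND_MIN, H_BOND_MAX = 2.7, 3.3  # donor-acceptor window quoted in the summary

def refine(start, model_target, steps, denoise_rate, guide_rate, well_centre=3.0):
    """Iteratively refine a single donor-acceptor distance (angstroms).

    Each step moves toward a hypothetical, slightly biased model estimate and,
    when guide_rate > 0, also descends the gradient of a harmonic penalty
    E(d) = 0.5 * (d - well_centre)**2 that stands in for a force field.
    """
    d = start
    for _ in range(steps):
        model_step = denoise_rate * (model_target - d)
        energy_grad = d - well_centre  # dE/dd for the harmonic penalty
        d = d + model_step - guide_rate * energy_grad
    return d

def in_window(d):
    """True if the distance is compatible with a hydrogen bond."""
    return H_BOND_MIN <= d <= H_BOND_MAX

# Same starting point and same biased model, with and without guidance.
unguided = refine(6.0, model_target=3.6, steps=25, denoise_rate=0.3, guide_rate=0.0)
guided = refine(6.0, model_target=3.6, steps=25, denoise_rate=0.3, guide_rate=0.6)

print(f"unguided: {unguided:.2f} angstrom, hydrogen bond plausible: {in_window(unguided)}")
print(f"guided:   {guided:.2f} angstrom, hydrogen bond plausible: {in_window(guided)}")
```

With these made-up numbers the unguided loop settles near the model's biased estimate of 3.6 angstroms, outside the window, while the guided loop settles near 3.2 angstroms, inside it. No retraining is involved; the only change is extra computation per refinement step, which is the sense in which the summary describes this as inference-time scaling.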

Key Questions Answered

• Multi-property optimization as the real drug discovery bottleneck: Binding affinity, solubility, oral bioavailability, CYP3A4 inhibition, hERG channel activity, and roughly 30 additional ADMET endpoints must all fall within acceptable ranges simultaneously. These properties frequently anti-correlate: hydrophobicity improves binding but reduces solubility; adding polarity to fix solubility impairs membrane permeability. Pearl's 3D structural outputs feed directly into ADMET prediction models, enabling multi-parameter optimization from a shared structural representation rather than treating each property as an independent flat-feature regression.
• Agentic drug discovery requires model quality thresholds first: Genesis is building an agentic platform codenamed Sapphire that orchestrates Pearl and ADMET models for continuous 24/7 drug design. The prerequisite was achieving pose prediction quality where model outputs would not be rejected by medicinal chemists. Below the one-angstrom threshold, agents amplify errors rather than productivity, analogous to mid-2024 coding agents that compounded subtle LLM bugs. The intended human role shifts to strategic direction-setting while agents handle tool orchestration, parameter selection, and iterative hypothesis generation.
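
The anti-correlation between properties is why the summary speaks of Pareto-optimal molecules rather than a single best score. A minimal sketch of selecting a Pareto front over two properties follows. The candidate names and scores are invented for illustration, both axes are assumed to be "higher is better", and a real campaign would involve many more endpoints than the two shown here.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    binding: float     # hypothetical binding score, higher is better
    solubility: float  # hypothetical solubility score, higher is better

def dominates(a, b):
    """True if a is at least as good as b on both axes and strictly better on one."""
    return (a.binding >= b.binding and a.solubility >= b.solubility
            and (a.binding > b.binding or a.solubility > b.solubility))

def pareto_front(cands):
    """Return the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

# Made-up compounds illustrating the binding/solubility trade-off.
candidates = [
    Candidate("A", 9.0, 1.0),
    Candidate("B", 8.0, 3.0),
    Candidate("C", 7.5, 2.5),  # dominated by B
    Candidate("D", 6.0, 5.0),
    Candidate("E", 5.5, 4.0),  # dominated by D
]

front = pareto_front(candidates)
print([c.name for c in front])
```

Here candidates A, B, and D survive: each gives up some binding for solubility or the reverse, and none is beaten on both axes at once. Choosing among them is the kind of strategic judgment the summary leaves to the human.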

Notable Moment

On the OpenBind benchmark featuring an EAAT2 protease target with a flexible loop that shifts upon ligand binding, Pearl correctly predicted the loop movement in every pose tested. Other published open-source co-folding models failed to capture this conformational change. The result is notable because the target was absent from Pearl's training data, making it a genuine out-of-distribution generalization test.
