What's Missing Between LLMs and AGI - Vishal Misra & Martin Casado
Episode: 47 min
Read time: 2 min
Topics: Startups, Fundraising & VC, Artificial Intelligence
AI-Generated Summary
Key Takeaways
- ✓ Bayesian Wind Tunnel methodology: To prove LLMs perform true Bayesian inference rather than superficial pattern matching, Misra's team ran controlled experiments in which blank architectures were trained on tasks that are mathematically impossible to memorize. Transformers matched the analytically calculated Bayesian posterior to within 10⁻³ bits. Mamba performed nearly as well, LSTMs matched it only partially, and MLPs failed entirely. Architecture, not training data, determines this capability.
- ✓ The Frozen Weights Problem: LLMs perform Bayesian updating within a conversation but reset completely when a new session begins, because their weights are frozen after training. Human brains maintain synaptic plasticity throughout life, continuously updating from experience. Before plasticity becomes viable in LLMs, continual learning research must solve catastrophic forgetting, that is, find a way to update weights on new information without erasing previously learned knowledge.
- ✓ Shannon Entropy vs. Kolmogorov Complexity: LLMs operate in the Shannon entropy domain: they learn correlations across all available data. Human reasoning operates closer to Kolmogorov complexity: it looks for the shortest causal program that explains the observations. Einstein's field equation (Gμν = 8πTμν, in units where G = c = 1) is a minimal representation that simultaneously explains Mercury's orbit, gravitational lensing, and the relativistic clock corrections GPS relies on. LLMs cannot generate equivalent new representations.
- ✓ The Einstein AGI Test: A concrete benchmark for AGI is to train an LLM exclusively on pre-1911 physics data and see whether it independently derives the theory of relativity. Current models would fail because they are bound to existing data manifolds and cannot construct new causal representations that reconcile anomalous observations, such as the Michelson-Morley results, with Newtonian mechanics.
- ✓ Causation vs. Correlation as the Core Gap: Deep learning performs association, the first tier of Judea Pearl's causal hierarchy. It does not perform intervention or counterfactual reasoning, both of which require internal simulation models. When a person dodges a thrown object, the brain runs a causal simulation, not a probability calculation. The necessary research direction is to build architectures capable of causal modeling, not to scale existing ones.
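The gap between association and intervention in the last takeaway can be made concrete with a toy example. The sketch below is not from the episode: it assumes a hypothetical three-variable causal model in which a hidden factor Z drives both X and Y, and X has no causal effect on Y at all. Every probability in it is an illustrative assumption. Conditioning on X (association) still shifts the prediction for Y, while forcing X (intervention, Pearl's do-operator) does not.

```python
from itertools import product

# Hypothetical causal model (all numbers are illustrative assumptions):
#   Z -> X and Z -> Y, with NO arrow from X to Y.
P_Z1 = 0.5  # P(Z = 1)


def p_x1_given_z(z):
    """P(X = 1 | Z = z)."""
    return 0.9 if z == 1 else 0.1


def p_y1_given_xz(x, z):
    """P(Y = 1 | X = x, Z = z). Deliberately ignores x: X does not cause Y."""
    return 0.8 if z == 1 else 0.2


def bern(p1, value):
    """Probability that a Bernoulli(p1) variable takes `value`."""
    return p1 if value == 1 else 1.0 - p1


def joint(z, x, y):
    """Observational joint probability P(Z = z, X = x, Y = y)."""
    return bern(P_Z1, z) * bern(p_x1_given_z(z), x) * bern(p_y1_given_xz(x, z), y)


def observe(x):
    """Association: P(Y = 1 | X = x), computed by ordinary conditioning."""
    numerator = sum(joint(z, x, 1) for z in (0, 1))
    denominator = sum(joint(z, x, y) for z, y in product((0, 1), repeat=2))
    return numerator / denominator


def intervene(x):
    """Intervention: P(Y = 1 | do(X = x)).

    Forcing X cuts the Z -> X arrow, so Z keeps its prior distribution.
    """
    return sum(bern(P_Z1, z) * p_y1_given_xz(x, z) for z in (0, 1))


if __name__ == "__main__":
    for x in (0, 1):
        print(f"x={x}  P(Y=1|X=x)={observe(x):.2f}  P(Y=1|do(X=x))={intervene(x):.2f}")
```

In this toy model the observational probability of Y moves with X (0.26 versus 0.74), yet the interventional probability stays at 0.50 for both settings. A system that only learns the correlation has no way to tell these two quantities apart without a model of which arrows exist.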
What It Covers
Columbia University professor Vishal Misra presents mathematical proof that transformers perform precise Bayesian inference, matching the theoretically correct posteriors to within 10⁻³ bits. He argues that two unsolved problems, continual learning (plasticity) and the move from correlation to causation, separate current LLMs from genuine artificial general intelligence.
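To illustrate what "matching the posterior to within some number of bits" could mean, here is a minimal sketch. The summary does not describe the actual tasks or metric, so this makes several assumptions: the task is a coin with an unknown bias and a uniform Beta(1, 1) prior, the exact answer is the closed-form posterior predictive, and the gap is measured as a KL divergence in bits. The function names and the stand-in `frequency_predictor` are hypothetical; a trained model's next-token probability would take its place.

```python
import math


def bayes_predictive(heads, n, a=1.0, b=1.0):
    """Exact posterior predictive P(next flip = 1) under a Beta(a, b) prior,
    after seeing `heads` ones in `n` flips."""
    return (heads + a) / (n + a + b)


def kl_bits(p, q):
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)), measured in bits."""
    total = 0.0
    for p_i, q_i in ((p, q), (1.0 - p, 1.0 - q)):
        if p_i > 0.0:
            total += p_i * math.log2(p_i / q_i)
    return total


def gap_bits(sequence, predictor):
    """Mean per-step divergence between the exact Bayesian predictive and
    `predictor(heads, n)` along one observed 0/1 sequence."""
    heads = 0
    gaps = []
    for n, bit in enumerate(sequence):
        exact = bayes_predictive(heads, n)
        gaps.append(kl_bits(exact, predictor(heads, n)))
        heads += bit
    return sum(gaps) / len(gaps)


def frequency_predictor(heads, n):
    """Hypothetical stand-in for a model: the raw empirical frequency,
    clipped away from 0 and 1 so the divergence stays finite."""
    if n == 0:
        return 0.5
    return min(max(heads / n, 0.01), 0.99)


if __name__ == "__main__":
    flips = [1, 0, 1, 1, 0, 1, 1, 1]
    print("exact Bayes vs itself:", gap_bits(flips, bayes_predictive))
    print("frequency heuristic  :", gap_bits(flips, frequency_predictor))
```

A predictor that reproduces the Bayesian update exactly scores a gap of zero, while a plausible-looking heuristic such as raw frequency counting does not. Under these assumptions, a claim like "10⁻³ bits" would say the learned predictor's gap is nearly indistinguishable from the exact one.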
Notable Moment
Misra describes Donald Knuth's viral Hamiltonian cycle result as validation of LLM limits rather than evidence of emerging generality: the models exhausted their search space and stalled, while Knuth himself constructed the novel mathematical proof. For Misra, this shows that humans still supply the causal reasoning layer.
Books, tools, and gear mentioned in this episode
Books

by Judea Pearl
“Deep learning performs association — the first tier of Judea Pearl's causal hierarchy. It does not perform intervention or counterfactual reasoning, which require internal simulation models.”