What's Missing Between LLMs and AGI - Vishal Misra & Martin Casado
Episode length: 47 min · Read time: 2 min
AI-Generated Summary
Key Takeaways
- ✓Bayesian Wind Tunnel methodology: To prove LLMs perform true Bayesian inference rather than superficial pattern matching, Misra's team ran controlled experiments on models trained from scratch on synthetic tasks that are mathematically impossible to memorize. Transformers matched the analytically calculated Bayesian posterior to 10⁻³ bit accuracy; Mamba performed nearly as well, LSTMs only partially, and MLPs failed entirely. Architecture, not training data, determines this capability.
- ✓The Frozen Weights Problem: LLMs perform Bayesian updating within a conversation but reset completely when a new session begins, because weights are frozen after training. Human brains maintain synaptic plasticity throughout life, continuously updating from experience. Before comparable plasticity becomes viable in models, continual-learning research must solve catastrophic forgetting: updating weights on new information without erasing previously learned knowledge.
- ✓Shannon Entropy vs. Kolmogorov Complexity: LLMs operate in the Shannon entropy domain — learning correlations across all available data. Human reasoning operates closer to Kolmogorov complexity — finding the shortest causal program that explains observations. Einstein's field equation (Gμν = 8πTμν) is a minimal representation explaining Mercury's orbit, gravitational lensing, and GPS simultaneously. LLMs cannot generate equivalent new representations.
- ✓The Einstein AGI Test: A concrete benchmark for AGI: train an LLM exclusively on pre-1911 physics data and determine whether it independently derives the theory of relativity. Current models would fail because they are bound to existing data manifolds and cannot construct new causal representations that reconcile anomalous observations, such as the Michelson–Morley experiment's results, with Newtonian mechanics.
- ✓Causation vs. Correlation as the Core Gap: Deep learning performs association — the first tier of Judea Pearl's causal hierarchy. It does not perform intervention or counterfactual reasoning, which require internal simulation models. When a person dodges a thrown object, the brain runs a causal simulation, not a probability calculation. Building architectures capable of causal modeling, not scaling existing ones, is the necessary research direction.
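The wind-tunnel comparison in the first takeaway can be sketched in miniature. The code below is a hypothetical illustration, not the paper's actual tasks or architectures: it computes the exact Bayesian posterior predictive for a coin-flip task analytically, then measures how far a model's emitted probability deviates from it, in bits, via KL divergence. The `model_p` value is a made-up stand-in for a trained model's output.

```python
import math

def beta_bernoulli_predictive(heads, tails, alpha=1.0, beta=1.0):
    """Exact Bayesian posterior predictive P(next=1 | data) for a coin
    under a Beta(alpha, beta) prior."""
    return (heads + alpha) / (heads + tails + alpha + beta)

def kl_bits(p, q):
    """KL divergence in bits between two Bernoulli distributions."""
    eps = 1e-12
    return sum(a * math.log2((a + eps) / (b + eps))
               for a, b in ((p, q), (1 - p, 1 - q)))

# Suppose a trained model emits probability model_p after seeing
# 7 heads and 3 tails (hypothetical value, for illustration only).
exact_p = beta_bernoulli_predictive(7, 3)   # = 8/12, the analytic answer
model_p = 0.667                             # hypothetical model output
gap = kl_bits(exact_p, model_p)             # deviation from Bayes, in bits
```

A model that truly performs Bayesian inference would keep this gap near zero across many synthetic tasks; that is what the 10⁻³ bit figure quantifies.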
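The Shannon-versus-Kolmogorov distinction above can be made concrete. This sketch (my illustration, not from the episode) computes the order-0 Shannon entropy of a string and uses compressed length as a crude, computable upper bound on Kolmogorov complexity, since the true K(s) is uncomputable:

```python
import math
import zlib
from collections import Counter

def shannon_entropy_bits(s):
    """Empirical order-0 Shannon entropy (bits per symbol) of a string."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def kolmogorov_proxy(s):
    """Compressed size in bytes: a crude computable upper bound on K(s)."""
    return len(zlib.compress(s.encode()))

regular = "ab" * 500   # generated by a tiny program, so low K
# Symbol statistics give 1 bit/symbol either way, but the compressor
# finds the short program behind the regular string.
h = shannon_entropy_bits(regular)   # = 1.0 bit/symbol
k = kolmogorov_proxy(regular)       # far below the 1000 raw bytes
```

The point of the takeaway is that LLM training objectives live on the `h` side (statistics of the data), while Einstein-style theorizing lives on the `k` side (the shortest causal program).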
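Pearl's association/intervention distinction can also be simulated directly. In this toy model (my illustration, not from the episode), a confounder Z drives both X and Y, so the observed correlation P(Y=1 | X=1) is 1 while the interventional effect P(Y=1 | do(X=1)) is about 0.5, because forcing X severs the Z→X edge:

```python
import random

random.seed(0)

def sample(do_x=None):
    """One draw from a confounded system: Z causes both X and Y."""
    z = random.random() < 0.5
    x = do_x if do_x is not None else z   # observationally, X just copies Z
    y = z                                 # Y depends only on Z
    return x, y

N = 100_000

# Association: P(Y=1 | X=1) under passive observation.
obs = [sample() for _ in range(N)]
p_y_given_x1 = sum(y for x, y in obs if x) / max(1, sum(x for x, _ in obs))

# Intervention: P(Y=1 | do(X=1)) — forcing X breaks the Z→X edge.
intv = [sample(do_x=True) for _ in range(N)]
p_y_do_x1 = sum(y for _, y in intv) / N

# p_y_given_x1 == 1.0 here, while p_y_do_x1 ≈ 0.5:
# the correlation vanishes once X is set by intervention.
```

A purely associational learner sees only the first quantity; answering the second requires a model of the mechanism, which is the gap the takeaway describes.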
What It Covers
Columbia University professor Vishal Misra presents mathematical proof that transformers perform precise Bayesian inference, matching theoretically correct posteriors to 10⁻³ bit accuracy. He argues two unsolved problems — continual learning plasticity and moving from correlation to causation — separate current LLMs from genuine artificial general intelligence.
Notable Moment
Misra describes Donald Knuth's viral Hamiltonian cycle result as validation of LLM limits rather than evidence of emerging generality — the models exhausted their search space and stalled, while Knuth himself constructed the novel mathematical proof, demonstrating that humans still supply the causal reasoning layer.