He Co-Invented the Transformer. Now: Continuous Thought Machines - Llion Jones and Luke Darlow [Sakana AI]
Episode: 72 min
Read time: 3 min
Topics: Investing, Leadership, Design & UX
AI-Generated Summary
Key Takeaways
- ✓ Architecture Lock-in: Transformers dominate not because alternatives are worse, but because switching costs are prohibitive. A competing architecture must be "crushingly better", not marginally better, to displace an established system with mature tooling, fine-tuning pipelines, and inference infrastructure. The same pattern played out when transformers displaced RNNs: the accuracy jump was so large that researchers had no choice but to migrate.
- ✓ Continuous Thought Machine Design: The CTM introduces three architectural novelties: (1) an internal sequential "thought" dimension along which compute is applied in discrete steps; (2) neuron-level models (NLMs), which give each neuron its own small MLP that processes a history of activations instead of applying a single ReLU; and (3) synchronization representations, which measure dot-product correlations between neuron activation time series to encode a richer, temporally aware state.
- ✓ Native Adaptive Computation: Training the CTM on ImageNet with a dual loss, which minimizes cross-entropy at both the lowest-loss step and the highest-certainty step, causes easy examples to resolve in one or two steps while hard examples use the full 50-step budget. This adaptive behavior emerges without explicit computation-penalty terms, unlike Alex Graves' Adaptive Computation Time paper, which required carefully tuned auxiliary losses.
- ✓ Calibration as Architecture Signal: After standard training, the CTM produced near-perfect probability calibration on classification tasks: a prediction made with 90% confidence was correct roughly 90% of the time. Most neural networks trained to convergence become poorly calibrated and require post-hoc correction. The CTM's emergent calibration suggests that the synchronization-based representation aligns model uncertainty with actual error rates more naturally.
- ✓ SudokuBench Reasoning Gap: Sakana AI released SudokuBench, a dataset of handcrafted variant Sudoku puzzles, each with a unique natural-language rule set. The puzzles are sourced from thousands of hours of Cracking the Cryptic YouTube videos, which provide detailed human reasoning traces. Current top models solve only the simplest puzzles, at around 15% accuracy. GPT-4 shows improvement but cannot find the "break-in" insight each puzzle requires, exposing a fundamental gap in sequential deductive reasoning.
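The dual loss described in the Native Adaptive Computation takeaway can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. Two choices here are assumptions that the summary does not state: "certainty" is taken to be one minus the normalized entropy of the prediction, and the two selected cross-entropy terms are combined by averaging. The function name `dual_step_loss` is hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dual_step_loss(logits, target):
    """logits: (T, C) array with one row of class logits per internal step.
    target: integer index of the correct class.
    Returns (loss, lowest-loss step, highest-certainty step)."""
    eps = 1e-12
    probs = softmax(logits)
    # Cross-entropy of the correct class at every internal step.
    ce = -np.log(probs[:, target] + eps)
    # Assumption: certainty = 1 - normalized entropy, so 1.0 is fully certain.
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    certainty = 1.0 - entropy / np.log(logits.shape[1])
    t_min_loss = int(np.argmin(ce))
    t_max_cert = int(np.argmax(certainty))
    # Assumption: the two selected terms are averaged.
    loss = 0.5 * (ce[t_min_loss] + ce[t_max_cert])
    return loss, t_min_loss, t_max_cert

# Three internal steps, three classes; the prediction sharpens toward class 0.
logits = np.array([[0.0, 0.0, 0.0],
                   [2.0, 0.0, 0.0],
                   [5.0, 0.0, 0.0]])
print(dual_step_loss(logits, target=0))
print(dual_step_loss(logits, target=1))
```

Because no term in this loss penalizes the number of steps used, any tendency to settle early on easy inputs has to come from training itself, which is the contrast with Adaptive Computation Time that the takeaway draws.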
What It Covers
Llion Jones, co-inventor of the transformer, and Sakana AI researcher Luke Darlow discuss the Continuous Thought Machine (CTM), a spotlight paper at NeurIPS 2025. They examine why AI research is trapped in a transformer-centric local minimum, how biological neuron synchronization inspired a new recurrent architecture, and why research freedom produces better science than commercial pressure.
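As a rough illustration of the synchronization idea, the sketch below treats each neuron's activations over the internal steps as a time series and takes the dot product of every pair of series. It assumes the pairwise values are simply flattened into one vector. The function name `synchronization` and the array layout are hypothetical, not taken from the paper or the episode.

```python
import numpy as np

def synchronization(history):
    """history: (D, T) array holding D neurons' activations over T internal steps.
    Returns the dot product of every pair of neuron time series, keeping one
    value per unordered pair (the upper triangle, diagonal included)."""
    sync = history @ history.T            # (D, D): entry [i, j] = series_i . series_j
    i, j = np.triu_indices(history.shape[0])
    return sync[i, j]                     # length D * (D + 1) / 2

# Three neurons observed over three internal steps.
history = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0],
                    [2.0, 0.0, 2.0]])
print(synchronization(history))
```

Neurons 0 and 2 fire on the same steps and so get a large pairwise value, while neuron 1 fires out of step with both and gets zero. The representation therefore depends on how activity unfolds over time, not only on the activations at the latest step.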
Key Questions Answered
- • Research Freedom as Competitive Strategy: Jones argues that protecting researcher autonomy is a primary leadership responsibility at Sakana AI. Commercial pressure (investor return expectations, product deadlines, publication quotas) systematically narrows the solution space researchers explore. The CTM itself emerged from eight months of unconstrained exploration with no predetermined goal, and that work produced emergent behaviors such as backtracking maze navigation and leapfrog path-solving under constrained compute budgets.
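The calibration claim summarized above (a 90% confidence prediction being correct about 90% of the time) is commonly checked with expected calibration error (ECE). The sketch below is a generic ECE computation over equal-width confidence bins. The episode summary does not say which metric the authors used, so this is offered only as an illustration, and the function name is hypothetical.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: predicted probability of the chosen class, one per example.
    correct: 1 if that prediction was right, else 0.
    Returns the bin-weighted mean gap between accuracy and confidence."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each example to an equal-width bin; bins are right-inclusive.
    bins = np.clip(np.digitize(confidences, edges[1:-1], right=True), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap      # weight by the fraction of examples in the bin
    return ece

# Ten predictions at 90% confidence: nine correct versus five correct.
conf = [0.9] * 10
print(expected_calibration_error(conf, [1] * 9 + [0]))
print(expected_calibration_error(conf, [1] * 5 + [0] * 5))
```

A well-calibrated model yields a value near zero. The second call shows an overconfident model, where accuracy falls well short of stated confidence.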
Notable Moment
During training, the CTM spontaneously developed two distinct maze-solving strategies depending on the number of compute steps available. With sufficient steps, it traced paths sequentially. When steps were constrained, it instead leapfrogged ahead, traced segments backward, then jumped forward again. The researchers never designed or anticipated this algorithm; it emerged purely from architectural constraints.