
Llion Jones


AI Summary

→ WHAT IT COVERS

Llion Jones, co-inventor of the transformer, and Sakana AI researcher Luke Darlow discuss the Continuous Thought Machine (CTM), a spotlight paper at NeurIPS 2025. They examine why AI research is trapped in a transformer-centric local minimum, how biological neuron synchronization inspired a new recurrent architecture, and why research freedom produces better science than commercial pressure.

→ KEY INSIGHTS

- **Architecture Lock-in:** Transformers dominate not because alternatives are worse, but because switching costs are prohibitive. Competing architectures must be "crushingly better," not marginally better, to displace an established system with mature tooling, fine-tuning pipelines, and inference infrastructure. The same pattern occurred when transformers displaced RNNs: the accuracy jump was so large that researchers had no choice but to migrate.
- **Continuous Thought Machine Design:** The CTM introduces three architectural novelties: an internal sequential "thought" dimension that applies compute across discrete steps; neuron-level models (NLMs) that treat each neuron as a small MLP processing a history of activations rather than a single ReLU; and synchronization representations that measure dot-product correlations between neuron activation time series to encode richer, temporally aware state.
- **Native Adaptive Computation:** Training the CTM on ImageNet with a dual loss (minimizing cross-entropy at both the lowest-loss step and the highest-certainty step) causes easy examples to resolve in one or two steps while hard examples use the full 50-step budget. This adaptive behavior emerges without explicit computation-penalty terms, unlike Alex Graves' Adaptive Computation Time paper, which required carefully tuned auxiliary losses.
- **Calibration as Architecture Signal:** After standard training, the CTM produced near-perfect probability calibration on classification tasks, meaning a 90% confidence prediction was correct roughly 90% of the time. Most neural networks trained to convergence become poorly calibrated and require post-hoc correction. The CTM's emergent calibration suggests the synchronization-based representation aligns model uncertainty with actual error rates more naturally.
- **SudokuBench Reasoning Gap:** Sakana AI released SudokuBench, a dataset of handcrafted variant Sudoku puzzles with unique natural-language rule sets, sourced from thousands of hours of Cracking the Cryptic YouTube videos providing detailed human reasoning traces. Current top models solve only the simplest puzzles at around 15% accuracy. GPT-4 shows improvement but cannot find the "break-in" insight each puzzle requires, exposing a fundamental gap in sequential deductive reasoning.
- **Research Freedom as Competitive Strategy:** Jones argues that protecting researcher autonomy is a primary leadership responsibility at Sakana AI. Commercial pressure (investor return expectations, product deadlines, publication quotas) systematically narrows the solution space researchers explore. The CTM itself emerged from eight months of unconstrained exploration with no predetermined goal, producing emergent behaviors like backtracking maze navigation and leapfrog path-solving under constrained compute budgets.

→ NOTABLE MOMENT

During training, the CTM spontaneously developed two distinct maze-solving strategies depending on available compute steps. With sufficient time, it traced paths sequentially. When steps were constrained, it instead leapfrogged ahead, traced segments backward, then jumped forward again, an algorithm the researchers never designed or anticipated, emerging purely from architectural constraints.
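To make the synchronization idea from the Continuous Thought Machine Design insight concrete, here is a minimal sketch in plain Python. It computes pairwise dot products between neuron activation time series, which is how the summary describes the representation. The function name `synchronization_matrix` and the toy data are hypothetical, and the actual CTM may subsample neuron pairs or weight time steps in ways this sketch does not attempt to reproduce.

```python
def synchronization_matrix(histories):
    """Pairwise dot products between neuron activation time series.

    histories[i] holds the activations of neuron i across internal steps.
    Hypothetical helper for illustration, not the CTM reference code.
    """
    n = len(histories)
    sync = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dot = sum(a * b for a, b in zip(histories[i], histories[j]))
            sync[i][j] = dot
            sync[j][i] = dot  # the matrix is symmetric
    return sync


# Toy data: neurons 0 and 1 fire in lockstep; neuron 2 follows another rhythm.
histories = [
    [1.0, -1.0, 1.0, -1.0],
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
]
sync = synchronization_matrix(histories)
```

In this toy case the entry for neurons 0 and 1 is large because their time series move together, while the entry for neurons 0 and 2 is zero because their activity is uncorrelated over the window.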
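The dual loss mentioned under Native Adaptive Computation can be sketched in the same spirit. The code below picks the internal step with the lowest cross-entropy and the step with the highest certainty, then combines their losses. Two details are assumptions not stated in the summary: certainty is taken as one minus normalized entropy, and the two losses are averaged. All function names are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


def certainty(probs):
    """Assumed definition: 1 minus entropy normalized by its maximum."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def dual_step_loss(step_probs, label):
    """Combine the loss at the lowest-loss step and the most-certain step."""
    losses = [cross_entropy(p, label) for p in step_probs]
    certainties = [certainty(p) for p in step_probs]
    t_loss = min(range(len(losses)), key=losses.__getitem__)
    t_cert = max(range(len(certainties)), key=certainties.__getitem__)
    # Averaging the two terms is an assumption made for this sketch.
    return 0.5 * (losses[t_loss] + losses[t_cert]), t_loss, t_cert


# Class probabilities predicted at three successive internal steps.
step_probs = [
    [0.4, 0.3, 0.3],
    [0.7, 0.2, 0.1],
    [0.9, 0.05, 0.05],
]
loss_easy, t_loss_easy, t_cert_easy = dual_step_loss(step_probs, label=0)
loss_hard, t_loss_hard, t_cert_hard = dual_step_loss(step_probs, label=1)
```

When the true class is 0, both selections land on the final step. When the true class is 1, the lowest-loss step is the first one while the most certain step is still the last, so the loss penalizes a confident wrong answer.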
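The calibration claim (a 90% confidence prediction being right about 90% of the time) can be checked with expected calibration error, a standard metric. The summary does not say which metric the authors used, so treating ECE as the measure is an assumption, and the function below is an illustrative sketch with made-up inputs.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


# Ten predictions at 90% confidence: nine correct versus six correct.
calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
overconfident = expected_calibration_error([0.9] * 10, [True] * 6 + [False] * 4)
```

A well-calibrated model yields a value near zero, while the overconfident case here yields a gap of about 0.3.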
💼 SPONSORS Cyber Fund (https://cyber.fund), Two For AI Labs (https://2for.ai)

🏷️ Neural Architecture Design, Adaptive Computation, Biological Inspiration, Reasoning Benchmarks, AI Research Culture, Transformer Alternatives
