When AI Discovers The Next Transformer - Robert Lange (Sakana)
Episode · 78 min · 3 min read
Topics: Artificial Intelligence
AI-Generated Summary
Key Takeaways
- ✓Sample Efficiency via Model Ensembling: Shinka Evolve reduces LLM query costs by running multiple frontier models (GPT, Gemini, Grok) simultaneously and using an Upper Confidence Bound (UCB) bandit algorithm to adaptively route each program mutation to the best-performing model. This approach achieves competitive circle packing results in fewer than 200 evaluations, compared to the thousands typically required by similar systems like AlphaEvolve.
- ✓The "Problem-Problem" Bottleneck: Current evolutionary LLM systems treat the problem as fixed, but breakthroughs often require first inventing a surrogate or reformulated problem. Shinka Evolve demonstrated this when using a slightly relaxed circle-overlap constraint as a proxy problem accelerated convergence. Building systems that automatically generate and evolve problem formulations alongside solutions represents the next critical frontier for AI-driven discovery.
- ✓Stepping Stones Over Direct Optimization: Drawing from Kenneth Stanley's open-endedness research, Lange argues that starting from an impoverished or minimal initial solution generates more diversity and ultimately better results than starting from a highly optimized one. Systems that accumulate diverse intermediate solutions — even seemingly unproductive ones — build the combinatorial foundation needed for genuine breakthroughs, mirroring how biological evolution produces complexity through non-directed exploration.
- ✓Crossover and Full-Rewrite Mutations Add Diversity: Beyond diff-based patches used in AlphaEvolve, Shinka Evolve introduces two additional mutation operators: complete program rewrites and crossover between two parent programs. Sampling two parent programs and prompting the LLM to produce a complementary improvement proved especially useful on structured problems. A global "meta scratch pad" summarizes discoveries across the program tree and injects shared insights into subsequent system prompts.
- ✓AI Scientist v2 Implements Falsificationist Loop: Unlike v1's linear template-based execution, AI Scientist v2 runs a parallelizable agentic tree search where the LLM drafts its own experimental setup, executes code, receives numerical feedback, and iteratively refines hypotheses — mirroring Karl Popper's falsificationism. A workshop-level paper produced by the system passed the acceptance threshold at an ICLR workshop, marking the first fully autonomous compute-to-scientific-output pipeline.
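The three mutation operators described above (diff-style patches, full rewrites, and two-parent crossover) can be sketched as a sampling step over an archive of candidate programs. This is an illustrative sketch only: the operator weights, prompt wording, and function names are assumptions, not Shinka Evolve's actual implementation.

```python
import random

def diff_prompt(parent):
    # Ask for a small, localized patch (AlphaEvolve-style mutation).
    return f"Propose a small diff-style patch improving this program:\n{parent}"

def rewrite_prompt(parent):
    # Ask for a complete rewrite that keeps what already works.
    return f"Rewrite this program from scratch, keeping what works:\n{parent}"

def crossover_prompt(parent_a, parent_b):
    # Ask the LLM to combine complementary strengths of two parents.
    return ("Combine the complementary strengths of these two programs "
            f"into one improved program:\n# Parent A\n{parent_a}\n# Parent B\n{parent_b}")

def sample_mutation(archive, weights=(0.6, 0.2, 0.2)):
    """Pick a mutation operator and build the LLM prompt for one step."""
    op = random.choices(["diff", "rewrite", "crossover"], weights=weights)[0]
    if op == "crossover":
        parent_a, parent_b = random.sample(archive, 2)
        return op, crossover_prompt(parent_a, parent_b)
    parent = random.choice(archive)
    prompt = diff_prompt(parent) if op == "diff" else rewrite_prompt(parent)
    return op, prompt

archive = ["def pack(): ...", "def pack_v2(): ...", "def pack_v3(): ..."]
op, prompt = sample_mutation(archive)
```

In a full system, the returned prompt would also carry shared insights from the meta scratch pad before being sent to the selected model.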
What It Covers
Robert Lange of Sakana AI discusses Shinka Evolve, an open-source evolutionary framework that uses multiple LLMs in parallel to discover novel algorithms and scientific solutions. The system improves on AlphaEvolve's approach through model ensembling, UCB-based adaptive model selection, and crossover mutations, achieving state-of-the-art circle packing results in under 200 LLM evaluations.
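The UCB-based model routing mentioned above can be sketched as a classic UCB1 bandit where each "arm" is a frontier model and the reward for a pull is whether that model's mutation improved program fitness. This is a minimal sketch under those assumptions, not Sakana's actual router; the model names and simulated success rates are placeholders.

```python
import math
import random

class UCBModelRouter:
    """UCB1 sketch: adaptively route mutation requests to LLMs."""

    def __init__(self, models, c=1.4):
        self.models = list(models)
        self.c = c  # exploration weight
        self.pulls = {m: 0 for m in self.models}
        self.total_reward = {m: 0.0 for m in self.models}

    def select(self):
        # Try every model once before applying the UCB formula.
        for m in self.models:
            if self.pulls[m] == 0:
                return m
        total = sum(self.pulls.values())

        def ucb(m):
            mean = self.total_reward[m] / self.pulls[m]
            bonus = self.c * math.sqrt(math.log(total) / self.pulls[m])
            return mean + bonus

        return max(self.models, key=ucb)

    def update(self, model, reward):
        self.pulls[model] += 1
        self.total_reward[model] += reward

router = UCBModelRouter(["gpt", "gemini", "grok"])
for _ in range(200):
    model = router.select()
    # Reward would come from evaluating the mutated program;
    # simulated here with made-up, model-dependent success rates.
    reward = 1.0 if random.random() < {"gpt": 0.3, "gemini": 0.5, "grok": 0.2}[model] else 0.0
    router.update(model, reward)
```

Over a budget of a few hundred evaluations, the exploration bonus shrinks for frequently pulled arms, so queries concentrate on whichever model has actually been producing fitness improvements.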
Key Questions Answered
- •Verification Remains the Hard Constraint: Generating candidate solutions is computationally easier than rigorously verifying them. LLMs can perform soft verification by latently tracing code execution, but this remains inexact and susceptible to reward hacking. Lange identifies automatic verifier design — systems that both formulate problems and construct their own correctness checkers — as the most critical unsolved challenge before AI-driven science can operate reliably without human oversight.
Notable Moment
Lange describes running Shinka Evolve with a slightly relaxed circle-overlap constraint as a proxy problem, which accelerated convergence. When the system was rerun with exact constraints, it took noticeably longer to reach the same solution quality — demonstrating that surrogate problem design, typically a human insight, could itself become an automated discovery target.