Machine Learning Street Talk

When AI Discovers The Next Transformer - Robert Lange (Sakana)

78 min episode · 3 min read

Topics

Artificial Intelligence

AI-Generated Summary

Key Takeaways

  • Sample Efficiency via Model Ensembling: Shinka Evolve reduces LLM query costs by running multiple frontier models (GPT, Gemini, Grok) simultaneously and using an Upper Confidence Bound bandit algorithm to adaptively route each program mutation to the best-performing model. This approach achieves competitive circle packing results in fewer than 200 evaluations, compared to the thousands typically required by comparable systems such as AlphaEvolve.
  • The "Problem-Problem" Bottleneck: Current evolutionary LLM systems treat the problem as fixed, but breakthroughs often require first inventing a surrogate or reformulated problem. Shinka Evolve demonstrated this when using a slightly relaxed circle-overlap constraint as a proxy problem accelerated convergence. Building systems that automatically generate and evolve problem formulations alongside solutions represents the next critical frontier for AI-driven discovery.
  • Stepping Stones Over Direct Optimization: Drawing from Kenneth Stanley's open-endedness research, Lange argues that starting from an impoverished or minimal initial solution generates more diversity and ultimately better results than starting from a highly optimized one. Systems that accumulate diverse intermediate solutions — even seemingly unproductive ones — build the combinatorial foundation needed for genuine breakthroughs, mirroring how biological evolution produces complexity through non-directed exploration.
  • Crossover and Full-Rewrite Mutations Add Diversity: Beyond diff-based patches used in AlphaEvolve, Shinka Evolve introduces two additional mutation operators: complete program rewrites and crossover between two parent programs. Sampling two parent programs and prompting the LLM to produce a complementary improvement proved especially useful on structured problems. A global "meta scratch pad" summarizes discoveries across the program tree and injects shared insights into subsequent system prompts.
  • AI Scientist v2 Implements Falsificationist Loop: Unlike v1's linear template-based execution, AI Scientist v2 runs a parallelizable agentic tree search where the LLM drafts its own experimental setup, executes code, receives numerical feedback, and iteratively refines hypotheses — mirroring Karl Popper's falsificationism. A workshop-level paper produced by the system passed the acceptance threshold at an ICLR workshop, marking the first fully autonomous compute-to-scientific-output pipeline.
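The UCB-based model routing described in the first takeaway can be sketched as a classic UCB1 bandit, where each frontier model is an "arm" and the reward is how much a mutation improved program fitness. This is an illustrative sketch, not Shinka Evolve's actual implementation; the model names, reward signal, and `UCBRouter` class are assumptions for the example.

```python
import math
import random

class UCBRouter:
    """Hypothetical UCB1 bandit routing program mutations to LLM 'arms'."""

    def __init__(self, models):
        self.models = models
        self.counts = {m: 0 for m in models}    # times each model was chosen
        self.totals = {m: 0.0 for m in models}  # cumulative reward per model

    def select(self):
        t = sum(self.counts.values()) + 1
        def ucb(m):
            if self.counts[m] == 0:
                return float("inf")  # try every model at least once
            mean = self.totals[m] / self.counts[m]
            # exploitation (mean) + exploration bonus that shrinks with use
            return mean + math.sqrt(2 * math.log(t) / self.counts[m])
        return max(self.models, key=ucb)

    def update(self, model, reward):
        self.counts[model] += 1
        self.totals[model] += reward

router = UCBRouter(["gpt", "gemini", "grok"])
for _ in range(200):                 # one routing decision per mutation
    model = router.select()
    reward = random.random()         # stand-in for the mutation's fitness gain
    router.update(model, reward)
```

Over the run, the router concentrates queries on whichever model has yielded the best mutations so far while still occasionally probing the others, which is the mechanism behind the sample-efficiency claim.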

What It Covers

Robert Lange from Sakana AI discusses Shinka Evolve, an open-source evolutionary framework that uses multiple LLMs in parallel to discover novel algorithms and scientific solutions. The system improves on AlphaEvolve's approach through model ensembling, UCB-based adaptive model selection, and crossover mutations, achieving state-of-the-art circle packing results in under 200 LLM evaluations.
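The crossover mutation mentioned above — sampling two parent programs and asking an LLM for a complementary improvement — can be sketched as follows. The archive structure, fitness weighting, and prompt wording are all hypothetical, chosen only to make the idea concrete.

```python
import random

def crossover_prompt(archive):
    """Sample two parent programs (weighted by fitness) and build a
    crossover prompt asking the LLM for a complementary merge.
    The archive format and prompt text are illustrative, not Shinka Evolve's."""
    a, b = random.choices(
        archive, weights=[p["fitness"] for p in archive], k=2
    )
    return (
        "Combine the strengths of the two programs below into a single "
        "improved program.\n\n"
        f"# Parent A (fitness {a['fitness']:.3f})\n{a['code']}\n\n"
        f"# Parent B (fitness {b['fitness']:.3f})\n{b['code']}\n"
    )

# Toy archive of two candidate circle-packing programs
archive = [
    {"code": "def pack(): ...  # greedy placement", "fitness": 0.81},
    {"code": "def pack(): ...  # gradient refinement", "fitness": 0.93},
]
prompt = crossover_prompt(archive)
```

Fitness-weighted parent sampling biases crossover toward strong programs while still letting weaker ones contribute, which matches the episode's point about preserving diversity in the population.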

Key Questions Answered

  • Verification Remains the Hard Constraint: Generating candidate solutions is computationally easier than rigorously verifying them. LLMs can perform soft verification by latently tracing code execution, but this remains inexact and susceptible to reward hacking. Lange identifies automatic verifier design — systems that both formulate problems and construct their own correctness checkers — as the most critical unsolved challenge before AI-driven science can operate reliably without human oversight.

Notable Moment

Lange describes running Shinka Evolve with a slightly relaxed circle-overlap constraint as a proxy problem, which accelerated convergence. When the system was rerun with exact constraints, it took noticeably longer to reach the same solution quality — demonstrating that surrogate problem design, typically a human insight, could itself become an automated discovery target.
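The relaxed-constraint surrogate described here can be sketched as a scoring function with a tolerance parameter: setting it above zero tolerates slight circle overlaps (the proxy problem), and setting it to zero recovers the exact constraint. The function below is purely illustrative and not drawn from Shinka Evolve's code.

```python
import math

def packing_score(circles, radius=1.0, eps=0.0):
    """Score a set of unit circles by penalizing pairwise overlap.
    eps > 0 relaxes the constraint: overlaps up to eps are tolerated.
    Returns 0.0 for a feasible packing, negative otherwise. Illustrative only."""
    penalty = 0.0
    for i in range(len(circles)):
        for j in range(i + 1, len(circles)):
            (x1, y1), (x2, y2) = circles[i], circles[j]
            dist = math.hypot(x1 - x2, y1 - y2)
            overlap = 2 * radius - dist
            if overlap > eps:            # only violations beyond the tolerance count
                penalty += overlap - eps
    return -penalty

# Two circles overlapping by 0.2: infeasible exactly, feasible under eps = 0.25
circles = [(0.0, 0.0), (1.8, 0.0)]
exact = packing_score(circles, eps=0.0)      # penalized
relaxed = packing_score(circles, eps=0.25)   # tolerated
```

Optimizing against the relaxed score gives the search a smoother landscape to traverse; solutions can then be repaired or re-scored under the exact constraint, matching the convergence speed-up Lange describes.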
