When AI Discovers The Next Transformer - Robert Lange (Sakana)
Machine Learning Street Talk - AI Summary
→ WHAT IT COVERS
Robert Lange from Sakana AI discusses Shinka Evolve, an open-source evolutionary framework that uses multiple LLMs in parallel to discover novel algorithms and scientific solutions. The system improves on AlphaEvolve's approach through model ensembling, UCB-based adaptive model selection, and crossover mutations, achieving state-of-the-art circle packing results in under 200 LLM evaluations.

→ KEY INSIGHTS
- **Sample Efficiency via Model Ensembling:** Shinka Evolve reduces LLM query costs by running multiple frontier models (GPT, Gemini, Grok) simultaneously and using an Upper Confidence Bound (UCB) bandit algorithm to adaptively route each program mutation to the best-performing model. This approach achieves competitive circle packing results in fewer than 200 evaluations, compared to the thousands typically required by similar systems like AlphaEvolve.
- **The "Problem-Problem" Bottleneck:** Current evolutionary LLM systems treat the problem as fixed, but breakthroughs often require first inventing a surrogate or reformulated problem. Shinka Evolve demonstrated this: using a slightly relaxed circle-overlap constraint as a proxy problem accelerated convergence. Building systems that automatically generate and evolve problem formulations alongside solutions represents the next critical frontier for AI-driven discovery.
- **Stepping Stones Over Direct Optimization:** Drawing from Kenneth Stanley's open-endedness research, Lange argues that starting from an impoverished or minimal initial solution generates more diversity and ultimately better results than starting from a highly optimized one. Systems that accumulate diverse intermediate solutions — even seemingly unproductive ones — build the combinatorial foundation needed for genuine breakthroughs, mirroring how biological evolution produces complexity through non-directed exploration.
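The UCB-based routing described above can be sketched roughly as follows. This is a minimal UCB1 illustration, not Shinka Evolve's actual implementation: the model names, the reward signal (here, fitness improvement per mutation), and the exploration constant are all assumptions for the sake of the example.

```python
import math

class UCBModelRouter:
    """UCB1 bandit: route each mutation request to the model whose
    upper confidence bound on past reward is highest."""

    def __init__(self, models, c=1.4):
        self.models = list(models)
        self.c = c  # exploration weight (assumed value)
        self.counts = {m: 0 for m in self.models}
        self.mean_reward = {m: 0.0 for m in self.models}

    def select(self):
        # Try every model once before exploiting.
        for m in self.models:
            if self.counts[m] == 0:
                return m
        total = sum(self.counts.values())
        # UCB1 score: empirical mean + exploration bonus.
        return max(
            self.models,
            key=lambda m: self.mean_reward[m]
            + self.c * math.sqrt(math.log(total) / self.counts[m]),
        )

    def update(self, model, reward):
        # Incremental mean update after observing the mutation's
        # fitness improvement on the target problem.
        self.counts[model] += 1
        n = self.counts[model]
        self.mean_reward[model] += (reward - self.mean_reward[model]) / n
```

In an evolutionary loop of this shape, each step would call `select()` to pick a model, send the mutation prompt to it, score the resulting program, and feed the fitness gain back through `update()`, so queries gradually concentrate on whichever model has been producing the best mutations.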
- **Crossover and Full-Rewrite Mutations Add Diversity:** Beyond diff-based patches used in AlphaEvolve, Shinka Evolve introduces two additional mutation operators: complete program rewrites and crossover between two parent programs. Sampling two parent programs and prompting the LLM to produce a complementary improvement proved especially useful on structured problems. A global "meta scratch pad" summarizes discoveries across the program tree and injects shared insights into subsequent system prompts.
- **AI Scientist v2 Implements Falsificationist Loop:** Unlike v1's linear template-based execution, AI Scientist v2 runs a parallelizable agentic tree search where the LLM drafts its own experimental setup, executes code, receives numerical feedback, and iteratively refines hypotheses — mirroring Karl Popper's falsificationism. A workshop-level paper produced by the system passed the acceptance threshold at an ICLR workshop, marking the first fully autonomous compute-to-scientific-output pipeline.
- **Verification Remains the Hard Constraint:** Generating candidate solutions is computationally easier than rigorously verifying them. LLMs can perform soft verification by latently tracing code execution, but this remains inexact and susceptible to reward hacking. Lange identifies automatic verifier design — systems that both formulate problems and construct their own correctness checkers — as the most critical unsolved challenge before AI-driven science can operate reliably without human oversight.

→ NOTABLE MOMENT
Lange describes running Shinka Evolve with a slightly relaxed circle-overlap constraint as a proxy problem, which accelerated convergence. When the system was rerun with exact constraints, it took noticeably longer to reach the same solution quality — demonstrating that surrogate problem design, typically a human insight, could itself become an automated discovery target.
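One way to picture the relaxed-constraint proxy from the notable moment is a feasibility check with a tolerance knob: a positive tolerance gives the relaxed surrogate problem, and tolerance zero recovers the exact task. The function names, the circle representation, and the tolerance mechanism are illustrative assumptions; the episode does not specify Shinka Evolve's exact formulation.

```python
import math

def overlap_violation(circles):
    """Worst constraint violation for circles given as (x, y, r)
    tuples inside the unit square: how far any circle pokes out
    of the square, or how deeply any pair overlaps."""
    worst = 0.0
    for i, (x1, y1, r1) in enumerate(circles):
        # Wall violations: circle extending past any side of the square.
        worst = max(worst, r1 - x1, r1 - y1, x1 + r1 - 1.0, y1 + r1 - 1.0)
        for x2, y2, r2 in circles[i + 1:]:
            dist = math.hypot(x1 - x2, y1 - y2)
            # Positive when the two circles overlap.
            worst = max(worst, (r1 + r2) - dist)
    return worst

def score(circles, tolerance=0.0):
    """Sum of radii if the packing is feasible up to `tolerance`.
    tolerance > 0 is the relaxed proxy; tolerance = 0 is exact."""
    if overlap_violation(circles) > tolerance:
        return float("-inf")  # infeasible under this tolerance
    return sum(r for _, _, r in circles)
```

Under this framing, a candidate that slightly violates the exact constraint still earns a score during search under the relaxed proxy, which is one plausible mechanism for the faster convergence Lange reports before re-checking solutions against the exact constraint.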
💼 SPONSORS
None detected

🏷️ Evolutionary Algorithms, AI Scientific Discovery, Large Language Models, Open-Endedness, Algorithm Search, AI Automation