The Model Eats the Scaffolding: DeepMind's Logan Kilpatrick & Tulsee Doshi on 3.5 Flash, Omni & More
Cognitive Revolution | AI Summary
→ WHAT IT COVERS
Google DeepMind's Logan Kilpatrick and Tulsee Doshi join host Neil Savage at Google HQ ahead of Google I/O to discuss Gemini 3.5 Flash, the Omni video generation model, the agent harness infrastructure, recursive self-improvement, context window limits, and Google's overall AI product strategy across its billions-of-users product surface.

→ KEY INSIGHTS
- **Model-Harness Co-training:** Google now trains Gemini models in direct partnership with its Antigravity agent harness, so the model is optimized for tool-calling loops, orchestration, and agentic workflows from the start. Teams building on top inherit this infrastructure without rebuilding it, which cuts iteration cycles from months to days and speeds up product launches across all Google surfaces.
- **Flash-First Strategy:** Google deliberately prioritizes cost-adjusted performance over maximizing raw capability. Gemini 3.5 Flash benchmarks at roughly 280 tokens per second on Artificial Analysis, runs three times faster than comparable large models, and costs significantly less. For consumer-scale products like Search and the Gemini app, latency improvements outperform quality gains in live experiments, because users are unwilling to wait.
- **"Model Eats the Scaffolding" Cycle:** Every 12–18 months, the scaffolding that developers build around AI models gets absorbed into the model itself. Organizations should therefore avoid having every product team rebuild agentic infrastructure from scratch. Standardizing on a shared harness layer reduces redundant engineering, accelerates deployment, and surfaces model failure modes faster through centralized feedback loops.
- **Context Window Economics:** Context windows have plateaued near one million tokens because serving costs become prohibitive at scale: a single one-million-token request can cost several dollars. Google's direction is shifting toward smart context compaction, meaning selectively retrieving relevant information rather than expanding raw window size. This keeps latency and cost manageable while effectively giving models access to much larger pools of information.
- **Recursive Self-Improvement (Practical, Not Singular):** Google uses Gemini internally to improve Gemini, including running ablations, submitting code changes, and generating research reports autonomously. Humans remain in the driver's seat on large pre-training runs, however, because a misdirected run wastes a large amount of compute. The framing is deep human-AI collaboration rather than autonomous AI-led model development, with humans focused on strategic interpretation of results.

→ NOTABLE MOMENT
A Google safety and alignment researcher ran a full suite of model ablations from her phone while sitting in a hot tub and received a complete report within an hour. The anecdote illustrates how AI-assisted research workflows have already compressed tasks that previously required days of engineering time into a single session.

💼 SPONSORS
- Brave Search API: https://brave.com
- Sequence: https://sequencehq.com
- Roboflow: https://roboflow.com
- Anthropic / Claude: https://claude.ai/tcr

🏷️ Gemini Models, Agentic AI Infrastructure, AI Product Strategy, Recursive Self-Improvement, Context Window Scaling