Cognitive Revolution

The Model Eats the Scaffolding: DeepMind's Logan Kilpatrick & Tulsee Doshi on 3.5 Flash, Omni & More

May 20, 2026

59 min episode · 2 min read

Logan Kilpatrick, Tulsee Doshi

Topics

Productivity, Relationships, Fundraising & VC

AI-Generated Summary

Key Takeaways

✓Model-Harness Co-training: Google now trains Gemini models together with its Antigravity agent harness, so the model is optimized for tool-calling loops, orchestration, and agentic workflows from the start. Teams building on top inherit this infrastructure instead of rebuilding it, which cuts iteration cycles from months to days and speeds product launches across Google's surfaces.
✓Flash-First Strategy: Google deliberately prioritizes cost-adjusted performance over maximizing raw capability. Gemini 3.5 Flash benchmarks at roughly 280 tokens per second on Artificial Analysis, runs three times faster than comparable large models, and costs significantly less. For consumer-scale products like Search and the Gemini app, live experiments show latency improvements beating quality gains, because users are unwilling to wait.
✓"Model Eats the Scaffolding" Cycle: Every 12–18 months, the scaffolding developers build around AI models gets absorbed into the model itself, so custom infrastructure quickly goes stale. Organizations should therefore avoid having every product team independently rebuild agentic infrastructure from scratch. Standardizing on a shared harness layer reduces redundant engineering, accelerates deployment, and surfaces model failure modes faster through centralized feedback loops.
✓Context Window Economics: Context windows have plateaued near one million tokens because serving costs become prohibitive at scale: a single one-million-token request can cost several dollars. Google is instead moving toward smart context compaction, which selectively retrieves relevant information rather than expanding the raw window. This keeps latency and cost manageable while effectively giving models access to much larger pools of information.
✓Recursive Self-Improvement (Practical, Not Singular): Google uses Gemini internally to improve Gemini, including autonomously running ablations, submitting code changes, and generating research reports. Humans, however, remain in the driver's seat on large pre-training runs, because a misdirected run wastes a great deal of compute. The framing is deep human-AI collaboration rather than autonomous AI-led model development, with humans focused on strategically interpreting the results.

What It Covers

Google DeepMind's Logan Kilpatrick and Tulsee Doshi join host Neil Savage at Google HQ ahead of Google I/O to discuss Gemini 3.5 Flash, the Omni video generation model, the agent harness infrastructure, recursive self-improvement, context window limits, and Google's overall AI product strategy across products used by billions of people.

Notable Moment

A Google safety and alignment researcher ran a full suite of model ablations from her phone while sitting in a hot tub, receiving a complete report within an hour. The anecdote illustrates how AI-assisted research workflows have already compressed tasks that previously required days of engineering time into single sessions.

Books, tools, and gear mentioned in this episode

Tools

Brave Search API, by Brave (sponsor)
Claude, by Anthropic (sponsor)
Gemini 3.5 Flash, by Google DeepMind (discussed in the episode)
Sequence, by Sequence (sponsor)
Omni, by Google DeepMind (discussed in the episode)
RoboFlow, by RoboFlow (sponsor)
