The Inventors of Deep Research
Episode: 61 min · Read time: 2 min
Topics: Science & Discovery
AI-Generated Summary
Key Takeaways
- ✓ Editable Research Plans: Deep Research generates an upfront research plan showing exactly how it will break down the query before starting. Users can edit this plan conversationally or via a button, though most hit start immediately. Even when users don't engage with it, this transparency mechanism helps them understand the approach and guards against spending five minutes on a misaligned research direction.
- ✓ Custom Post-Training Required: The team built a specialized fine-tuned version of Gemini 1.5 Pro for deep research rather than using the base model. This post-training teaches iterative planning across domains without overfitting to any one vertical. The challenge is adding new capabilities while preserving pre-training knowledge, using data augmentation techniques to maintain generalizability across their research ontology.
- ✓ Asynchronous Orchestration Platform: Google built a new async engine that lets users close their computers and receive a notification when research completes. The system maintains state, retries on failures, and manages hundreds of LLM calls reliably. This differs from earlier synchronous chat interactions and resembles workflow systems like Temporal or Apache Airflow, but optimized for multi-minute agent jobs.
- ✓ Context Over RAG Strategy: Deep Research keeps all browsed websites in the full context window (up to two million tokens) rather than using retrieval-augmented generation. RAG struggles when queries have multiple attributes, since cosine similarity over a single embedding matches them poorly. The team falls back to RAG only when context exceeds the limit or for conversation history more than 10 turns old, prioritizing recent research for complex follow-up questions.
- ✓ Ontology-Based Evaluation: Instead of vertical-specific benchmarks, the team developed an ontology of research behaviors spanning broad-shallow queries (like finding summer camps) to narrow-deep investigations. They combine automated metrics (plan length, iteration steps, time distribution) with human evaluation of comprehensiveness and groundedness. Standard benchmarks don't translate to product experience, since the high entropy of free-text output makes verification hard.
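The orchestration behavior in the takeaways above — durable state, retries on transient failures, many sequential LLM calls — can be sketched as a minimal loop. Everything here (the step names, the `call_llm` stub, the retry count) is hypothetical, not Google's implementation:

```python
import time

MAX_RETRIES = 3

def call_llm(prompt: str) -> str:
    """Stub for a model call; a real engine would hit an LLM API."""
    return f"result for: {prompt}"

def run_with_retries(step: str) -> str:
    """Run one research step, retrying transient failures with backoff."""
    for attempt in range(MAX_RETRIES):
        try:
            return call_llm(step)
        except Exception:
            if attempt == MAX_RETRIES - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff

def run_research(plan: list[str]) -> dict:
    """Execute a research plan step by step, checkpointing results so a
    crashed or resumed job skips work it already finished."""
    state: dict = {"completed": {}}
    for step in plan:
        if step in state["completed"]:
            continue  # already done on a previous run
        state["completed"][step] = run_with_retries(step)
        # a durable engine would persist `state` here (DB or workflow log)
    return state
```

A production system like Temporal externalizes exactly this bookkeeping, which is why the episode compares the engine to such workflow systems.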
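The context-over-RAG policy reduces to a small decision rule: keep everything in context, and fall back to retrieval only when the window overflows or the conversation reaches far enough back. The two thresholds come from the summary; the function itself is an illustrative sketch, not the product's code:

```python
CONTEXT_LIMIT_TOKENS = 2_000_000   # long-context window cited in the summary
RECENT_TURN_WINDOW = 10            # turns kept verbatim, per the summary

def choose_strategy(total_tokens: int, turns_ago: int) -> str:
    """Decide whether research material is served from the full context
    window or retrieved on demand (RAG)."""
    if turns_ago > RECENT_TURN_WINDOW:
        return "rag"           # old conversation history: retrieve on demand
    if total_tokens > CONTEXT_LIMIT_TOKENS:
        return "rag"           # window overflow: fall back to retrieval
    return "full_context"      # default: keep everything in context
```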
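The automated half of the evaluation — plan length, iteration count, how time is distributed across steps — is simple to compute from run logs. The log schema here is invented for illustration:

```python
from collections import defaultdict

def summarize_run(log: list[dict]) -> dict:
    """Aggregate behavioral metrics from a research-run log, where each
    entry is assumed to look like {"step": str, "seconds": float}."""
    per_step: dict = defaultdict(float)
    for entry in log:
        per_step[entry["step"]] += entry["seconds"]
    total = sum(per_step.values())
    return {
        "plan_length": len(per_step),   # distinct steps in the plan
        "iterations": len(log),         # total step executions
        # fraction of wall time spent on each step
        "time_share": {s: round(t / total, 2) for s, t in per_step.items()}
        if total else {},
    }
```

Metrics like these are cheap proxies; the summary notes they are paired with human judgment on comprehensiveness and groundedness, which automation cannot capture.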
What It Covers
Aarush Selvan and Mukund Sridhar, the PM and tech lead behind Gemini Deep Research, explain how they built the original deep research agent category. They cover the technical architecture, including custom fine-tuned models, asynchronous orchestration, iterative planning mechanisms, and evaluation strategies for five-minute autonomous research tasks that browse dozens of websites.
Key Questions Answered
- • Counterintuitive Latency Preferences: Users actually value longer research times, contrary to Google product orthodoxy, in which latency improvements have always increased satisfaction and retention. The team initially worried about five-minute waits and built a hard 10-minute limit, but users appreciate seeing visible work done across 30-70 websites, inverting the traditional pattern where faster always performed better.
Notable Moment
The team learned that some users suspected they were artificially inflating wait times when investor Jason Calacanis asked whether they generated answers in 10 seconds and then made users wait. This completely contradicted their assumptions, since every Google product had historically shown that latency improvements drove all other metrics up, which is why they initially built both five-minute and 15-minute hardcoded limits.