The Inventors of Deep Research
Episode: 61 min
Read time: 2 min
Topics: Remote Work, Investing, Fundraising & VC
AI-Generated Summary
Key Takeaways
- ✓Editable Research Plans: Before starting, Deep Research generates a research plan that shows how it will break down the query. Users can edit the plan conversationally or with an edit button, though most simply click Start. Even for users who never edit it, the plan makes the approach transparent. This matters because a misaligned research direction costs the user five minutes of waiting.
- ✓Custom Post-Training Required: Deep Research runs on a version of Gemini 1.5 Pro that was post-trained specifically for research, not on the base model. The post-training teaches iterative planning across domains without overfitting to any single vertical. The difficulty is adding these capabilities while preserving pre-training knowledge, and the team uses data augmentation to keep the model general across their research ontology.
- ✓Asynchronous Orchestration Platform: Google built a new asynchronous engine so users can close their computers and get a notification when research completes. The system maintains state, retries failed steps, and reliably manages hundreds of LLM calls per job. Earlier chat interactions were synchronous. This engine instead resembles workflow systems such as Temporal or Apache Airflow, but is optimized for multi-minute agent jobs.
- ✓Context Over RAG Strategy: Deep Research keeps every browsed website in the context window (up to two million tokens) instead of using retrieval-augmented generation (RAG). RAG struggles with queries that combine multiple attributes, because a cosine-similarity match does not capture them well. The team falls back to RAG only when content exceeds the context limit or comes from more than 10 turns earlier in the conversation. This keeps recent research available in full for complex follow-up questions.
- ✓Ontology-Based Evaluation: Instead of vertical-specific benchmarks, the team developed an ontology of research behaviors. It ranges from broad-and-shallow queries (such as finding summer camps) to narrow-and-deep investigations. They combine automated metrics (plan length, number of iteration steps, distribution of time spent) with human evaluation of comprehensiveness and groundedness. Standard benchmarks do not translate to the product experience, because the high entropy of long-form text output makes it hard to verify.
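The context-over-RAG policy can be sketched as a simple selection rule. This is a minimal illustration under stated assumptions, not the team's implementation. The `Turn`, `ContextPlan`, and `plan_context` names are hypothetical, as are the precomputed token counts and the newest-first packing order. Only the two-million-token window and the 10-turn cutoff come from the summary.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed limits taken from the summary: a window of up to two million tokens,
# and a fallback to RAG for turns more than 10 turns back.
CONTEXT_LIMIT_TOKENS = 2_000_000
RECENT_TURNS = 10


@dataclass
class Turn:
    text: str
    tokens: int  # hypothetical: token count computed ahead of time


@dataclass
class ContextPlan:
    in_context: list[Turn] = field(default_factory=list)    # sent verbatim
    to_retrieval: list[Turn] = field(default_factory=list)  # indexed for RAG


def plan_context(turns: list[Turn],
                 limit: int = CONTEXT_LIMIT_TOKENS,
                 recent: int = RECENT_TURNS) -> ContextPlan:
    """Keep the newest turns in full context and route everything older to RAG."""
    plan = ContextPlan()
    used = 0
    # Walk newest-first so the most recent research is kept when space runs out.
    for age, turn in enumerate(reversed(turns)):
        if age >= recent or used + turn.tokens > limit:
            break  # this turn and everything older falls back to retrieval
        plan.in_context.append(turn)
        used += turn.tokens
    kept = len(plan.in_context)
    plan.in_context.reverse()  # restore chronological order for the prompt
    plan.to_retrieval = turns[: len(turns) - kept]
    return plan


# Usage: 12 turns of 1,000 tokens each. Only the last 10 stay in context.
turns = [Turn(f"turn {i}", 1_000) for i in range(12)]
plan = plan_context(turns)
print(len(plan.in_context), len(plan.to_retrieval))  # 10 2
```

Keeping the in-context window contiguous means the model always sees the latest research verbatim. Older material is reachable only through retrieval.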
What It Covers
Aarush Selvan and Mukund Sridhar, the PM and tech lead behind Gemini Deep Research, explain how they built the agent that created the deep research category. They cover the technical architecture: custom fine-tuned models, asynchronous orchestration systems, iterative planning mechanisms, and evaluation strategies for five-minute autonomous research tasks that browse dozens of websites.
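The asynchronous orchestration described in the takeaways (persisted state, retries on failure, a notification at the end) can be sketched with Python's `asyncio`. This is a hypothetical, minimal sketch. `JobState`, `run_step`, `run_job`, and the backoff parameters are illustrative names. They do not represent Google's engine or the Temporal or Airflow APIs. Durable storage is replaced by an in-memory object.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable

# A step receives the results of earlier steps and returns its own output.
Step = Callable[[dict], Awaitable[object]]


@dataclass
class JobState:
    job_id: str
    results: dict[str, object] = field(default_factory=dict)  # finished steps
    attempts: dict[str, int] = field(default_factory=dict)    # tries per step
    done: bool = False


async def run_step(state: JobState, name: str, step: Step,
                   max_retries: int = 3, base_delay: float = 0.2) -> None:
    """Run one step, retrying with exponential backoff on any exception."""
    if name in state.results:
        return  # finished before a restart: resume instead of redoing work
    for attempt in range(1, max_retries + 1):
        state.attempts[name] = attempt
        try:
            state.results[name] = await step(state.results)
            return
        except Exception:
            if attempt == max_retries:
                raise  # give up; the caller decides how to surface the failure
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


async def run_job(state: JobState, steps: list[tuple[str, Step]],
                  notify: Callable[[JobState], None]) -> JobState:
    """Run steps in order, then notify the user that the job is finished."""
    for name, step in steps:
        await run_step(state, name, step)
        # Hypothetical: a real engine would persist `state` durably here.
    state.done = True
    notify(state)  # e.g. a "your research is ready" push notification
    return state


# Usage: the first browse attempt fails transiently and is retried.
calls = {"browse": 0}


async def browse(results: dict) -> list[str]:
    calls["browse"] += 1
    if calls["browse"] == 1:
        raise ConnectionError("transient fetch failure")
    return ["page A", "page B"]


async def summarize(results: dict) -> str:
    return f"summary of {len(results['browse'])} pages"


state = asyncio.run(run_job(JobState("job-1"),
                            [("browse", browse), ("summarize", summarize)],
                            notify=lambda s: print(f"{s.job_id} finished")))
print(state.attempts, state.results["summarize"])
```

Finished steps are recorded in `state.results`. As a result, running the job again on a saved state skips them. This property would let a job survive a restart once the state is persisted externally. Independent steps, such as fetching many websites, could be fanned out with `asyncio.gather` instead of running one after another.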
Key Questions Answered
- •Counterintuitive Latency Preferences: Users value longer research times. This runs contrary to long-standing Google product orthodoxy, where latency improvements consistently raised satisfaction and retention. The team initially worried about five-minute waits and built a hard 10-minute limit. Users, however, appreciate seeing visible work being done across 30 to 70 websites. This inverts the traditional product assumption that faster always performs better.
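A hard time cap such as the 10-minute limit can be enforced by checking a deadline between research steps. The sketch below relies on several assumptions. `run_plan`, `ResearchRun`, and `research_step` are hypothetical names. The plan is modeled as a plain list of steps that a user could edit before clicking Start. The deadline is checked only between steps, so a single slow step can still overrun the cap. A production system would also need per-call timeouts.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable

HARD_LIMIT_SECONDS = 10 * 60  # assumed to mirror the 10-minute cap


@dataclass
class ResearchRun:
    plan: list[str]
    findings: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)
    timed_out: bool = False


def run_plan(plan: list[str], research_step: Callable[[str], str],
             limit_s: float = HARD_LIMIT_SECONDS,
             clock: Callable[[], float] = time.monotonic) -> ResearchRun:
    """Execute plan steps in order, stopping once the deadline has passed."""
    run = ResearchRun(plan=list(plan))  # copy, so later edits don't leak in
    deadline = clock() + limit_s
    for i, step in enumerate(run.plan):
        # Checked between steps only: one slow step can still overrun the cap.
        if clock() >= deadline:
            run.timed_out = True
            run.skipped = run.plan[i:]
            break
        run.findings.append(research_step(step))
    return run


# Usage with a fake clock that returns preset timestamps and a 10 s cap.
ticks = iter([0, 0, 4, 8, 12])
plan = ["find camps", "check prices", "compare reviews", "summarize"]
result = run_plan(plan, lambda s: f"notes on {s}", limit_s=10,
                  clock=lambda: next(ticks))
print(result.timed_out, result.skipped)  # True ['summarize']
```

Injecting `clock` makes the cap testable without waiting ten minutes. Recording the skipped steps lets the product report what was left undone rather than failing silently.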
Notable Moment
The team learned that some users suspected the wait was artificial when investor Jason Calacanis asked whether they generated answers in 10 seconds and then made users wait. This contradicted the team's assumptions. Across Google products, latency improvements had historically driven every other metric up, and this belief led the team to initially build both five-minute and 15-minute "hardcore" versions.