Latent Space

The Inventors of Deep Research

61 min episode · 2 min read

Topics

Science & Discovery

AI-Generated Summary

Key Takeaways

  • Editable Research Plans: Deep Research generates an upfront research plan showing exactly how it will break down the query before starting. Users can edit the plan conversationally or via a button, though most hit start immediately. Even when users don't engage, this transparency helps them understand the approach, which matters when the agent is about to spend five minutes on a potentially misaligned research direction.
  • Custom Post-Training Required: The team built a specialized fine-tuned version of Gemini 1.5 Pro specifically for deep research, not just the base model. This post-training work teaches iterative planning across domains without overfitting to any one vertical. The challenge is adding new capabilities while preserving pre-training knowledge, using data augmentation techniques to maintain generalizability across their research ontology.
  • Asynchronous Orchestration Platform: Google built a new async engine so users can close their computers and get a notification when research completes. The system maintains state, retries on failures, and manages hundreds of LLM calls reliably. This differs from previous synchronous chat interactions and resembles workflow systems like Temporal or Apache Airflow, but optimized for multi-minute agent jobs.
  • Context Over RAG Strategy: Deep Research keeps all browsed websites in the full context window (up to two million tokens) rather than using retrieval-augmented generation. RAG struggles when queries have multiple attributes, since cosine similarity handles them poorly. The team falls back to RAG only when content exceeds the context limit or for conversation turns more than 10 turns back, prioritizing recent research for complex follow-up questions.
  • Ontology-Based Evaluation: Instead of vertical-specific benchmarks, the team developed a research-behavior ontology spanning broad-shallow queries (like finding summer camps) to narrow-deep investigations. They combine automated metrics (plan length, iteration steps, time distribution) with human evaluation of comprehensiveness and groundedness. Standard benchmarks don't translate to product experience, since the high entropy of free-text output makes verification challenging.
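The orchestration pattern described in the takeaways (durable state, retries, a notification when the user's laptop may already be closed) can be sketched minimally. Everything here is a hypothetical illustration of the general technique, not Google's actual engine: `ResearchState`, `run_step`, the retry budget, and the JSON checkpoint are all assumed names.

```python
import json
import random
import time
from dataclasses import dataclass, field


@dataclass
class ResearchState:
    """Durable record of one async research job (hypothetical)."""
    query: str
    completed_steps: list = field(default_factory=list)
    next_step: int = 0


def call_llm(prompt: str) -> str:
    # Stand-in for a real model call; fails transiently ~30% of the time.
    if random.random() < 0.3:
        raise TimeoutError("transient failure")
    return f"result for: {prompt}"


def run_step(state: ResearchState, prompt: str, max_retries: int = 3) -> None:
    """Run one step with retries, then checkpoint so the job can
    resume after a crash instead of restarting from scratch."""
    for attempt in range(max_retries):
        try:
            result = call_llm(prompt)
            break
        except TimeoutError:
            time.sleep(0.1 * 2 ** attempt)  # exponential backoff
    else:
        raise RuntimeError(f"step failed after {max_retries} retries")
    state.completed_steps.append(result)
    state.next_step += 1
    checkpoint(state)


def checkpoint(state: ResearchState) -> None:
    # Persist progress; a real system would use a workflow store, not a file.
    with open("job_state.json", "w") as f:
        json.dump({"query": state.query,
                   "completed": state.completed_steps,
                   "next": state.next_step}, f)
```

Checkpointing after every step is the key difference from a synchronous chat loop: the client can disconnect, and a worker can pick the job back up at `next_step`.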
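The automated half of the evaluation described above (plan length, iteration steps, time distribution) can be sketched as simple metrics over a job trace. The `JobTrace` structure and the specific metrics are assumptions for illustration; the episode only names the signal categories.

```python
from dataclasses import dataclass


@dataclass
class JobTrace:
    """Hypothetical record of one Deep Research run, for automated eval."""
    plan_steps: list[str]       # the upfront research plan
    iterations: int             # how many search/browse/reason loops ran
    step_seconds: list[float]   # wall-clock time spent per step


def automated_metrics(trace: JobTrace) -> dict:
    """Cheap proxies that flag degenerate behavior before human review;
    comprehensiveness and groundedness still need human raters."""
    total = sum(trace.step_seconds)
    return {
        "plan_length": len(trace.plan_steps),
        "iterations": trace.iterations,
        # Share of time in the single longest step: a value near 1.0
        # suggests the agent stalled rather than iterating.
        "max_step_time_share": max(trace.step_seconds) / total if total else 0.0,
    }
```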

What It Covers

Arash Selvan and Mukund Sridhar, the PM and tech lead behind Gemini Deep Research, explain how they built the original deep research agent category. They cover the technical architecture, including custom fine-tuned models, asynchronous orchestration systems, iterative planning mechanisms, and evaluation strategies for five-minute autonomous research tasks that browse dozens of websites.

Key Questions Answered

  • Counterintuitive Latency Preferences: Users actually value longer research times, contrary to Google product orthodoxy, where latency improvements have always increased satisfaction and retention. The team initially worried about five-minute waits and built a hard 10-minute limit, but users appreciate seeing visible work done across 30-70 websites. This inverts the traditional product metric that faster always performs better.
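The context-over-RAG routing described above can be sketched as a single decision: inline everything until the window is exhausted, and only then retrieve. The two-million-token figure and 10-turn window come from the summary; `count_tokens` and the `build_context` helper are hypothetical.

```python
# Hypothetical sketch of the full-context-first strategy: keep browsed
# pages in the model's window and fall back to retrieval only when forced.
CONTEXT_LIMIT_TOKENS = 2_000_000   # long-context window cited for Gemini 1.5 Pro
RECENT_TURN_WINDOW = 10            # turns older than this go to retrieval


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)


def build_context(turns: list[str], browsed_pages: list[str]) -> dict:
    """Decide what goes into the prompt verbatim vs. via retrieval."""
    recent = turns[-RECENT_TURN_WINDOW:]   # always inline recent turns
    older = turns[:-RECENT_TURN_WINDOW]    # candidates for RAG fallback
    inline = recent + browsed_pages
    total = sum(count_tokens(t) for t in inline)
    if total > CONTEXT_LIMIT_TOKENS:
        # Over budget: retrieve over browsed pages instead of inlining them.
        return {"inline": recent, "retrieve_over": older + browsed_pages}
    return {"inline": inline, "retrieve_over": older}
```

The design choice this encodes: cosine similarity degrades on multi-attribute queries, so retrieval is a fallback for overflow and stale turns, never the default path.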

Notable Moment

The team discovered that users suspected them of artificially inflating wait times when investor Jason Calacanis asked whether the answer was really generated in 10 seconds and users were then made to wait. This contradicted their assumptions: in every previous Google product, latency improvements had driven all other metrics up, which is why they initially built both hard-capped five-minute and 15-minute versions.
