DeepMind’s RAG System with Animesh Chatterji and Ivan Solovyev
Episode
37 min
Read time
2 min
Topics
Relationships, Investing, Fundraising & VC
AI-Generated Summary
Key Takeaways
- ✓ Embedding model quality drives 80% of RAG outcomes: Configuration tuning (chunk size, overlap, re-ranking) accounts for roughly 20% of retrieval quality; the embedding model itself determines the remaining 80% or so. DeepMind recommends that developers first swap Gemini's embedding model into their existing pipeline before migrating fully to FileSearch, because this single change yields the largest quality improvement.
- ✓ Simplified pricing model: pay only for indexing and tokens: FileSearch eliminates separate charges for storage, infrastructure, and inference. Developers pay once at upload time for embedding and indexing, then pay only for tokens during queries. For large enterprise datasets, this structure reduces costs substantially compared to multi-component billing models common across competing RAG pipeline providers.
- ✓ Retrieval works best with a small number of chunks: FileSearch returns roughly five chunks per query by default, a threshold validated with partners across legal document search and code completion use cases. Instead of re-ranking, the system applies a quality score cutoff to filter out low-relevance chunks, because re-ranking added complexity without a measurable improvement in retrieval quality in internal evaluations.
- ✓ Fine-tuned embedding models become obsolete within six months: DeepMind generally discourages fine-tuning because base model improvements (typically 15% gains across benchmarks) arrive faster than custom fine-tuning cycles complete. The only justified exception is a highly niche dataset that major labs are unlikely to address. Developers should factor this deprecation timeline into any fine-tuning investment decision.
- ✓ Matryoshka embedding representation enables storage-quality trade-offs: Gemini's latest embedding model encodes vectors so that the leading dimensions carry the highest semantic density. Developers can truncate a 3,000-dimension vector at any point and retain a usable representation, reducing storage costs at a controlled quality cost. FileSearch may expose this truncation threshold as a configurable API parameter in a future release.
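To make the Matryoshka trade-off concrete, here is a minimal sketch of truncating an embedding to its leading dimensions and rescaling it to unit length before comparison. It uses toy 8-dimension vectors and only the Python standard library. The function names `truncate_embedding` and `cosine_similarity` are hypothetical helpers for illustration, not part of the Gemini API or FileSearch, and the re-normalization step is an assumption about how truncated vectors would typically be compared.

```python
import math


def truncate_embedding(vector, dims):
    """Keep the first `dims` components and rescale the result to unit length.

    Hypothetical helper: with a Matryoshka-style embedding, the leading
    components are expected to carry most of the meaning.
    """
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to rescale for an all-zero prefix
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (made up for illustration) whose leading dimensions dominate.
doc = [0.9, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.01]
query = [0.8, 0.4, 0.1, 0.2, 0.01, 0.03, 0.02, 0.00]

full_score = cosine_similarity(doc, query)
short_score = cosine_similarity(
    truncate_embedding(doc, 4), truncate_embedding(query, 4)
)
print(f"full: {full_score:.4f}  truncated to 4 dims: {short_score:.4f}")
```

In this toy case, halving the vector length halves storage while the similarity score barely moves. Real embeddings will lose more or less quality depending on where the cut is made.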
What It Covers
Google DeepMind engineers Animesh Chatterji and Ivan Solovyev explain how FileSearch, a fully managed RAG tool built into the Gemini API, abstracts away vector databases, chunking, and indexing infrastructure. They also cover embedding model improvements, pricing simplification, retrieval quality trade-offs, and the road toward multimodal retrieval support.
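The retrieval behavior described in the takeaways (a small default chunk count combined with a quality score cutoff in place of re-ranking) can be sketched as follows. This is an illustrative approximation, not FileSearch's actual implementation: the `ScoredChunk` type, the `select_chunks` function, and the `min_score` value of 0.5 are all hypothetical. Only the default of roughly five chunks comes from the episode summary.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    text: str
    score: float  # similarity to the query; higher means more relevant


def select_chunks(candidates, max_chunks=5, min_score=0.5):
    """Return at most `max_chunks` chunks whose score clears `min_score`.

    max_chunks=5 mirrors the "roughly five chunks" default in the summary.
    min_score=0.5 is an assumed placeholder, not a documented threshold.
    """
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    return [c for c in ranked[:max_chunks] if c.score >= min_score]


# Made-up candidates for illustration.
candidates = [
    ScoredChunk("termination clause", 0.91),
    ScoredChunk("table of contents", 0.22),
    ScoredChunk("notice period", 0.74),
]
for chunk in select_chunks(candidates):
    print(chunk.score, chunk.text)
```

A single threshold keeps the pipeline to one similarity search per query, whereas a re-ranker would add a second scoring pass over the candidates.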
Notable Moment
The team pushed back on the widespread claim that agents make RAG obsolete, arguing the framing misunderstands what RAG is. They noted RAG is itself a tool given to agents, and that personalization features at DeepMind are now being implemented using RAG-style chunk retrieval mechanisms.