Software Engineering Daily

DeepMind’s RAG System with Animesh Chatterji and Ivan Solovyev

37 min episode · 2 min read

AI-Generated Summary

Key Takeaways

  • Embedding model quality drives 80% of RAG outcomes: Configuration tuning (chunk size, overlap, re-ranking) accounts for roughly 20% of retrieval quality; the embedding model itself determines the remaining 80%. DeepMind recommends developers first swap Gemini's embedding model into their existing pipeline before migrating fully to FileSearch, since this single change yields the largest quality improvement (see the first sketch after this list).
  • Simplified pricing model: pay only for indexing and tokens: FileSearch eliminates separate charges for storage, infrastructure, and inference. Developers pay once at upload time for embedding and indexing, then pay only for tokens during queries. For large enterprise datasets, this structure reduces costs substantially compared to multi-component billing models common across competing RAG pipeline providers.
  • Optimal retrieval sits at a handful of chunks per query: FileSearch returns roughly five chunks per query by default, a threshold validated with partners across legal document search and code completion use cases. Rather than re-ranking, the system applies a quality-score cutoff to filter low-relevance chunks, since re-ranking added complexity without measurable retrieval-quality gains in internal evaluations (see the second sketch after this list).
  • Fine-tuned embedding models become obsolete within six months: DeepMind generally discourages fine-tuning because base-model improvements (typically 15% gains across benchmarks) arrive faster than custom fine-tuning cycles complete. The only justified exception is a highly niche dataset that major labs are unlikely to address. Developers should factor this obsolescence horizon into any fine-tuning investment decision.
  • Matryoshka embedding representation enables storage-quality trade-offs: Gemini's latest embedding model encodes vectors so that the leading dimensions carry the highest semantic density. Developers can truncate the 3,072-dimension vector at any point and retain a usable representation, trading a controlled quality loss for lower storage costs (see the third sketch after this list). FileSearch may expose this truncation threshold as a configurable API parameter in a future release.
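
The first takeaway recommends swapping the embedding model before anything else. Here is a minimal sketch of that swap using the google-genai Python SDK, assuming you keep your existing vector store and replace only the embedding call; the model name and task types follow the public Gemini API docs, and the function names are placeholders for wherever your pipeline computes embeddings:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

def embed_documents(chunks: list[str]) -> list[list[float]]:
    """Embed document chunks for indexing in your existing vector store."""
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=chunks,
        config=types.EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT"),
    )
    return [e.values for e in result.embeddings]

def embed_query(query: str) -> list[float]:
    """Embed a query with the matching retrieval task type."""
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=query,
        config=types.EmbedContentConfig(task_type="RETRIEVAL_QUERY"),
    )
    return result.embeddings[0].values
```

Note that documents and queries use paired task types; mixing them is a common source of degraded retrieval quality after a model swap.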
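
The score-cutoff retrieval described in the third takeaway is easy to emulate in a self-managed pipeline. A minimal sketch, assuming your vector store returns (chunk, similarity) pairs; the 0.7 threshold and five-chunk cap are illustrative values, not FileSearch's actual internals:

```python
def filter_chunks(
    scored_chunks: list[tuple[str, float]],
    min_score: float = 0.7,   # hypothetical quality cutoff
    max_chunks: int = 5,      # matches FileSearch's rough default count
) -> list[str]:
    """Keep top chunks above a relevance threshold instead of re-ranking."""
    # Sort by similarity, descending, then apply the cap and the cutoff.
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, score in ranked[:max_chunks] if score >= min_score]
```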
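
Matryoshka truncation (fifth takeaway) needs no special API support in a self-managed store: slice off the leading dimensions and re-normalize. A minimal sketch; the 768-dimension cut is chosen purely for illustration:

```python
import numpy as np

def truncate_embedding(vector: list[float], dims: int = 768) -> np.ndarray:
    """Keep the leading dims of a Matryoshka embedding and re-normalize."""
    truncated = np.asarray(vector[:dims], dtype=np.float32)
    return truncated / np.linalg.norm(truncated)  # unit length for cosine search
```

Truncated vectors remain comparable to each other, but index and query vectors must be cut at the same length.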

What It Covers

Google DeepMind engineers Animesh Chatterji and Ivan Solovyev explain how FileSearch, a fully managed RAG tool built into the Gemini API, abstracts away vector databases, chunking, and indexing infrastructure, while covering embedding model improvements, pricing simplification, retrieval quality trade-offs, and the road toward multimodal retrieval support.
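
For orientation, here is a minimal sketch of the FileSearch flow with the google-genai Python SDK: create a managed store, pay the one-time upload/indexing cost, then query with FileSearch attached as a tool. The call shapes follow the File Search API as documented at launch; treat the names as assumptions and check the current docs, since the surface may have shifted:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Create a managed store; chunking, embedding, and indexing happen server-side.
store = client.file_search_stores.create(config={"display_name": "my-docs"})

# Upload and index a document (hypothetical local file). This one-time step
# is where the indexing cost described above is incurred. Indexing is
# asynchronous; in real code, poll the returned operation until it is done.
operation = client.file_search_stores.upload_to_file_search_store(
    file="handbook.pdf",
    file_search_store_name=store.name,
)

# Query with FileSearch as a tool; retrieval runs inside the API call.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What does the handbook say about remote work?",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[store.name]
                )
            )
        ]
    ),
)
print(response.text)
```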

Notable Moment

The team pushed back on the widespread claim that agents make RAG obsolete, arguing the framing misunderstands what RAG is. They noted RAG is itself a tool given to agents, and that personalization features at DeepMind are now being implemented using RAG-style chunk retrieval mechanisms.
