Software Engineering Daily

CodeRabbit and RAG for Code Review with Harjot Gill

48 min episode · 2 min read

AI-Generated Summary

Key Takeaways

  • Multi-model architecture: CodeRabbit runs seven to eight different LLMs simultaneously, matching each workload to a model's strengths (GPT-4o-mini for summarization, o3-mini for deep reasoning) rather than letting users choose models, and achieves better price-to-performance than single-model approaches (first sketch below).
  • Sandboxed code navigation: instead of tool calls or MCPs, CodeRabbit clones repositories into cloud sandboxes where agents execute CLI commands, run AST queries, and perform web searches to validate bugs; it pioneered this approach two years before similar tools emerged (second sketch below).
  • Dynamic task decomposition: a root agent breaks each code review into subtasks delegated to specialized sub-agents, while judge LLMs score the quality of each inference's supporting context, so that multi-layer validation catches hallucinations before insights are surfaced to users (third sketch below).
  • Context preparation strategy: reasoning models like Sonnet 3.7 need cleaned, re-ranked context rather than raw RAG stuffing, because they overthink and derail on unfiltered data; CodeRabbit therefore spends significant compute on context cleanup before each expensive reasoning-model call (fourth sketch below).
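
To make the routing idea concrete, here is a minimal sketch of workload-to-model matching, assuming a plain lookup table. The model names come from the episode; the `Task` type, the route table entries, and the `route` function are illustrative, not CodeRabbit's actual code.

```python
# Hypothetical sketch of workload-to-model routing (not CodeRabbit's code).
from dataclasses import dataclass

# Map each review workload to the cheapest model that handles it well,
# instead of exposing a single user-selected model for everything.
MODEL_ROUTES = {
    "summarize_diff": "gpt-4o-mini",  # high volume, shallow reasoning
    "triage_comment": "gpt-4o-mini",
    "deep_review":    "o3-mini",      # rare, reasoning-heavy
}

@dataclass
class Task:
    kind: str
    payload: str

def route(task: Task) -> str:
    """Pick a model by workload type; fall back to the cheap default."""
    return MODEL_ROUTES.get(task.kind, "gpt-4o-mini")

if __name__ == "__main__":
    for t in (Task("summarize_diff", "..."), Task("deep_review", "...")):
        print(t.kind, "->", route(t))
```

The design point is that the price-to-performance win comes from the router, not from any single model being best at everything.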
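The sandboxed navigation flow might look roughly like this, assuming `git` and `grep` are available inside the sandbox (an AST query via a tool such as `ast-grep` is shown commented out). `run_in_sandbox` and `review_repo` are hypothetical helpers; a real sandbox would also isolate the process itself.

```python
# Hypothetical sketch of agent-driven repo navigation in a sandbox.
import subprocess
import tempfile

def run_in_sandbox(workdir: str, cmd: list[str]) -> str:
    """Run one CLI command inside the sandboxed checkout and capture output."""
    result = subprocess.run(cmd, cwd=workdir, capture_output=True,
                            text=True, timeout=60)
    return result.stdout

def review_repo(repo_url: str) -> None:
    with tempfile.TemporaryDirectory() as workdir:
        # 1. Clone the repository into the sandbox.
        subprocess.run(["git", "clone", "--depth=1", repo_url, workdir],
                       check=True)
        # 2. The agent explores the code with ordinary CLI commands,
        #    e.g. grepping for call sites of a suspect function...
        hits = run_in_sandbox(workdir, ["grep", "-rn", "parse_config", "."])
        # 3. ...or running AST-aware queries the same way:
        # run_in_sandbox(workdir, ["ast-grep", "--pattern", "parse_config($ARG)"])
        print(hits)

if __name__ == "__main__":
    review_repo("https://github.com/example/repo.git")
```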
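Next, a bare-bones version of the decompose-delegate-judge loop, with every LLM call stubbed out. `decompose`, `run_sub_agent`, and `judge` are placeholder names, and the 0.7 context-quality threshold is an arbitrary assumption.

```python
# Hypothetical sketch of root-agent task decomposition with a judge filter.
from dataclasses import dataclass

@dataclass
class Finding:
    claim: str
    context_quality: float  # judge's 0-1 score of the supporting context

def decompose(diff: str) -> list[str]:
    """Root agent: split the review into independent subtasks (stubbed)."""
    return [f"review hunk {i}" for i, _ in enumerate(diff.split("@@")) if i > 0]

def run_sub_agent(subtask: str) -> Finding:
    """Sub-agent: investigate one subtask, return a candidate finding (stubbed)."""
    return Finding(claim=f"possible issue in {subtask}", context_quality=0.4)

def judge(finding: Finding, threshold: float = 0.7) -> bool:
    """Judge LLM (stubbed as a score check): drop inferences whose
    supporting context is too weak, before any user sees them."""
    return finding.context_quality >= threshold

def review(diff: str) -> list[Finding]:
    candidates = [run_sub_agent(t) for t in decompose(diff)]
    return [f for f in candidates if judge(f)]  # the filtering layer

if __name__ == "__main__":
    print(review("@@ hunk one @@ hunk two"))  # weak findings are filtered out
```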
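Finally, a sketch of the clean-then-re-rank-then-truncate step that would run before an expensive reasoning-model call. The term-overlap `relevance` scorer and the character budget are stand-ins for whatever re-ranking model and token budget a production system would actually use.

```python
# Hypothetical sketch of context cleanup before a reasoning-model call.
def clean(chunk: str) -> str:
    """Strip noise that derails reasoning models (stubbed as a whitespace trim;
    a real pipeline would drop generated files, boilerplate, etc.)."""
    return chunk.strip()

def relevance(chunk: str, query: str) -> float:
    """Placeholder re-ranker: fraction of query terms found in the chunk."""
    terms = set(query.lower().split())
    words = set(chunk.lower().split())
    return len(terms & words) / max(len(terms), 1)

def prepare_context(chunks: list[str], query: str, budget: int = 2000) -> str:
    """Re-rank cleaned chunks and keep the best ones under a size budget,
    instead of stuffing raw RAG results into the prompt."""
    ranked = sorted((clean(c) for c in chunks),
                    key=lambda c: relevance(c, query), reverse=True)
    kept, used = [], 0
    for c in ranked:
        if used + len(c) > budget:
            break
        kept.append(c)
        used += len(c)
    return "\n\n".join(kept)

if __name__ == "__main__":
    chunks = ["parse_config reads the YAML file", "autogenerated file, do not edit"]
    print(prepare_context(chunks, "bug in parse_config", budget=40))
```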

What It Covers

CodeRabbit CEO Harjot Gill explains how his AI code review platform uses a multi-model LLM architecture, sandboxed CLI environments, and dynamic task graphs to review the code of 100,000 developers daily with reasoning models like o3-mini.

Notable Moment

Gill reveals that CodeRabbit deliberately avoids building features where model capabilities fall short. Rather than lower its quality bar to meet market demand, the team prioritizes reliability over feature expansion until the technology advances enough to uphold their reputation for accuracy.
