
AI Summary
→ WHAT IT COVERS

Simon Eskildsen, founder of Turbopuffer, explains how his search database achieves cost reductions of up to 95% by building entirely on object storage and NVMe SSDs, an architecture made possible only after S3 gained strong consistency in December 2020 and compare-and-swap support in late 2024, while serving customers like Cursor and Notion at billion-vector scale.

→ KEY INSIGHTS

- **Object Storage Architecture:** Build databases entirely on S3 with no consensus layer by leveraging S3's strong consistency (December 2020) and compare-and-swap (late 2024). This eliminates ZooKeeper-style coordination overhead entirely. Data survives a complete server shutdown because durability lives in object storage rather than on local disks. The tradeoff is roughly 100ms write latency, but read performance scales cheaply by caching only 5-10% of the data on NVMe and even less in DRAM.
- **Database Company Prerequisites:** Three conditions must align to build a generational database company: a new workload that forces every company to adopt your system, a new storage architecture that legacy vendors cannot retrofit, and a commitment to implementing every query plan over time. The AI-to-data connection workload, NVMe SSDs in the cloud (2017), and S3 consistency (2020) opened this window for Turbopuffer specifically.
- **Agent Search Patterns:** RAG has shifted from single context-window queries to agents firing massive parallel searches simultaneously; Notion executes a high volume of concurrent queries per round trip. This changes pricing strategy: Turbopuffer is cutting query pricing by 5x to accommodate workloads where one agent session generates hundreds of search calls rather than a handful.
- **Hybrid Search Is Non-Negotiable:** All production search workloads require combining vector, full-text, and regex search simultaneously. A pure embedding search misreads "SI" as Spanish for "yes" rather than as a document prefix, and Cursor supplements semantic search with grep. Turbopuffer now beats Lucene on long-query benchmarks, the query type most common when LLMs generate or augment search strings against web-scale datasets.
- **Vibe Pricing From First Principles:** Calculate infrastructure cost from raw hardware napkin math (bandwidth, IOPS, storage per terabyte), then add margin. Turbopuffer's early pricing was set this way before the margin actually existed, forcing aggressive optimization when Cursor's bill grew faster than revenue. Running on a personal credit card created direct pressure to make the unit economics work before raising outside capital.
- **P99 Engineer Hiring Filter:** After every interview, default to rejection and require at least one team member to actively advocate for the candidate. Screen for candidates who have bent software to match theoretical hardware limits, measured by how far they closed the gap between napkin-math QPS ceilings and observed system performance. Turbopuffer's ANN v3 searches 100 billion vectors at a p50 of 40ms and a p99 of 200ms, a concrete benchmark of this standard.

→ NOTABLE MOMENT

Before closing Turbopuffer's seed round, Simon Eskildsen told his only investor prospect that he would return all the money if the product lacked clear market fit by year-end. The investor responded that no founder had ever said that to him, and that transparency became the primary reason Eskildsen chose him over database-specialist investors.

💼 SPONSORS

None detected

🏷️ Vector Search, Hybrid Search, Database Architecture, AI Agents, RAG, Object Storage
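The zero-consensus commit path described under Object Storage Architecture can be sketched with a compare-and-swap retry loop. This is a minimal in-memory stand-in, not Turbopuffer's code: `FakeObjectStore`, `cas_append`, and all names here are hypothetical, and the conditional-write shape (`If-Match` / `If-None-Match` on PUT) mirrors what S3 added in late 2024.

```python
import hashlib

class PreconditionFailed(Exception):
    """Raised when a conditional-write precondition fails (like HTTP 412)."""

class FakeObjectStore:
    """In-memory stand-in for an object store with conditional writes."""
    def __init__(self):
        self._objects = {}  # key -> (etag, body)

    def get(self, key):
        return self._objects.get(key)  # (etag, body) or None

    def put(self, key, body, if_match=None, if_none_match=None):
        current = self._objects.get(key)
        if if_none_match == "*" and current is not None:
            raise PreconditionFailed(key)  # object already exists
        if if_match is not None and (current is None or current[0] != if_match):
            raise PreconditionFailed(key)  # someone else wrote first
        etag = hashlib.md5(body).hexdigest()
        self._objects[key] = (etag, body)
        return etag

def cas_append(store, key, entry, max_retries=10):
    """Append `entry` to a log object using compare-and-swap.

    Read the current object, append locally, and write back only if the
    object is unchanged (matching ETag). On a lost race, re-read and retry.
    The object store's conditional write is the only arbiter; no
    ZooKeeper-style coordination service is involved.
    """
    for _ in range(max_retries):
        current = store.get(key)
        if current is None:
            try:
                store.put(key, entry, if_none_match="*")
                return
            except PreconditionFailed:
                continue  # created concurrently; retry as an append
        etag, body = current
        try:
            store.put(key, body + entry, if_match=etag)
            return
        except PreconditionFailed:
            continue  # concurrent writer won; re-read and retry
    raise RuntimeError("CAS failed after retries")
```

The ~100ms write latency mentioned above is the price of this loop: every durable write is a round trip to object storage rather than a local fsync.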
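One common way to combine the vector, full-text, and regex legs of a hybrid search is reciprocal rank fusion plus a regex pass. The episode does not say this is Turbopuffer's method; the sketch below just illustrates the pattern, with the rankings and documents invented for the example.

```python
import re
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).

    Documents ranked highly by several retrievers (vector, BM25, ...) float
    to the top without needing to normalize incomparable scores.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def regex_filter(docs, pattern):
    """Keep only docs matching a regex, e.g. a literal prefix like "SI-"
    that an embedding model would mis-read as natural language."""
    rx = re.compile(pattern)
    return [doc_id for doc_id, text in docs.items() if rx.search(text)]

# Hypothetical corpus: "SI" as a document prefix vs. Spanish "si".
docs = {
    "d1": "SI-1042 incident report",
    "d2": "si means yes in Spanish",
    "d3": "SI-7 follow-up",
}
vector_rank = ["d2", "d1", "d3"]  # pure embeddings favor the Spanish doc
text_rank = ["d1", "d3", "d2"]    # full-text search favors the prefix docs
fused = rrf_fuse([vector_rank, text_rank])
keep = set(regex_filter(docs, r"\bSI-\d+"))
hybrid = [d for d in fused if d in keep]
```

Here the fusion step alone already demotes the false "yes"-in-Spanish hit, and the regex leg removes it outright, which is the failure mode the summary attributes to pure embedding search.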
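The "napkin math, then add margin" pricing approach can be made concrete with a few lines of arithmetic. Every number and parameter name below is an illustrative placeholder (not Turbopuffer's actual cost model); the point is the shape of the calculation: hardware cost per terabyte, plus a cached fraction on NVMe, times a margin.

```python
def napkin_price_per_tb_month(
    storage_cost_per_gb_month,  # raw object-storage cost (placeholder)
    cache_fraction,             # share of data kept hot on NVMe (~5-10% per the summary)
    nvme_cost_per_gb_month,     # amortized NVMe cost (placeholder)
    margin,                     # markup on top of infrastructure cost
):
    """Price a terabyte-month from raw hardware numbers, then add margin."""
    infra = 1024 * (storage_cost_per_gb_month
                    + cache_fraction * nvme_cost_per_gb_month)
    return infra * (1.0 + margin)

# All inputs hypothetical: $0.023/GB-month storage, 10% NVMe cache at
# $0.08/GB-month, 50% margin.
price = napkin_price_per_tb_month(0.023, 0.10, 0.08, 0.5)
```

Because only a small cache fraction sits on expensive NVMe, the cost per terabyte stays close to raw object-storage pricing, which is where the up-to-95% savings claim comes from.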
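The "napkin-math QPS ceiling" used as a hiring filter can be sketched the same way: a single node's throughput is bounded by whichever resource each query exhausts first. The formula and all numbers here are my illustrative assumptions, not figures from the episode.

```python
def napkin_qps_ceiling(random_read_iops, ios_per_query,
                       read_bandwidth_bytes, bytes_per_query):
    """Theoretical upper bound on QPS for a storage-bound search node.

    The node is limited by either random-read IOPS or read bandwidth,
    so the ceiling is the lower of the two per-query budgets.
    """
    return min(random_read_iops / ios_per_query,
               read_bandwidth_bytes / bytes_per_query)

# Hypothetical NVMe: 1M random-read IOPS, 4 GB/s; each query touches
# 100 random reads and 1 MB of data.
ceiling = napkin_qps_ceiling(1_000_000, 100, 4_000_000_000, 1_000_000)
```

Comparing observed QPS against this kind of ceiling is the measurement the summary describes: candidates who have closed that gap have "bent software to match theoretical hardware limits".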