Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer
Episode: 60 min
Read time: 3 min
Topics: Career Growth, Health & Wellness, Startups
AI-Generated Summary
Key Takeaways
- Object Storage Architecture: Build databases entirely on S3 with zero consensus layer by leveraging S3's strong consistency (available December 2020) and compare-and-swap (late 2024). This eliminates Zookeeper-style coordination overhead entirely. Data survives complete server shutdown because nothing persists locally. The tradeoff is ~100ms write latency, but read performance scales cheaply by inflating only 5-10% of data into NVMe and less into DRAM.
- Database Company Prerequisites: Three conditions must align to build a generational database company: a new workload forcing every company to adopt your system, a new storage architecture legacy vendors cannot retrofit, and a commitment to implementing every query plan over time. The AI-to-data connection workload, NVMe SSDs in cloud (2017), and S3 consistency (2020) created this window for Turbopuffer specifically.
- Agent Search Patterns: RAG has shifted from single context-window queries to agents firing massive parallel searches simultaneously; Notion, for example, executes a high volume of concurrent queries per round trip. This changes pricing strategy: Turbopuffer is reducing query pricing by 5x to accommodate workloads where one agent session generates hundreds of search calls rather than a handful.
- Hybrid Search Is Non-Negotiable: All production search workloads require combining vector, full-text, and regex search simultaneously. A pure embedding search misidentifies "SI" as Spanish for "yes" rather than a document prefix. Cursor supplements semantic search with grep. Turbopuffer now beats Lucene on long-query benchmarks, the query type most common when LLMs generate or augment search strings against web-scale datasets.
- Vibe Pricing From First Principles: Calculate infrastructure cost from raw hardware napkin math (bandwidth, IOPS, storage per terabyte), then add margin. Turbopuffer's early pricing was set this way before achieving margin, forcing aggressive optimization when Cursor's bill grew faster than revenue. Running on a personal credit card created direct pressure to make the unit economics work before raising outside capital.
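To make the hybrid search takeaway concrete, here is a minimal sketch of combining a vector ranking, a full-text ranking, and a regex filter. It is not Turbopuffer's implementation: the toy corpus, the 2-d embeddings, every function name, and the use of reciprocal rank fusion to merge the rankings are assumptions chosen for illustration, since the episode summary does not say how results are combined.

```python
import math
import re
from collections import defaultdict

# Hypothetical toy corpus; IDs and text are illustrative only.
DOCS = {
    "a": "SI-1042 onboarding checklist for new hires",
    "b": "si, claro: respuestas comunes en español",
    "c": "quarterly planning notes",
}

# Hypothetical 2-d embeddings standing in for a real embedding model.
EMBEDDINGS = {"a": (0.2, 0.9), "b": (0.9, 0.1), "c": (0.5, 0.5)}


def vector_rank(query_vec, embeddings):
    """Rank doc IDs by cosine similarity to the query vector, best first."""
    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        return dot / (math.hypot(*u) * math.hypot(*v))
    return sorted(embeddings, key=lambda d: cosine(query_vec, embeddings[d]), reverse=True)


def keyword_rank(query, docs):
    """Rank doc IDs by raw term counts (a crude stand-in for a full-text scorer)."""
    terms = query.lower().split()
    scores = {d: sum(text.lower().count(t) for t in terms) for d, text in docs.items()}
    return [d for d in sorted(scores, key=scores.get, reverse=True) if scores[d] > 0]


def regex_filter(pattern, docs):
    """Return the set of doc IDs whose text matches the pattern (case-sensitive)."""
    compiled = re.compile(pattern)
    return {d for d, text in docs.items() if compiled.search(text)}


def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over every ranking a doc appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties are broken by doc ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(query, query_vec, pattern=None):
    """Fuse vector and keyword rankings, then optionally keep only regex matches."""
    fused = rrf([vector_rank(query_vec, EMBEDDINGS), keyword_rank(query, DOCS)])
    if pattern is not None:
        allowed = regex_filter(pattern, DOCS)
        fused = [d for d in fused if d in allowed]
    return fused
```

With these toy inputs, `hybrid_search("si", (0.9, 0.1))` ranks the Spanish document first, while passing `pattern=r"\bSI-\d+"` leaves only the document that carries the prefix. That mirrors the "SI" failure mode described above: the exact-match signal corrects what the embedding alone gets wrong.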
What It Covers
Simon Hørup Eskildsen, founder of Turbopuffer, explains how his search database achieves cost reductions of up to 95% by building entirely on object storage and NVMe SSDs while serving customers like Cursor and Notion at billion-vector scale. The architecture became possible only after S3 gained strong consistency in December 2020 and compare-and-swap support in late 2024.
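As a rough illustration of why compare-and-swap on the object store can stand in for a separate consensus layer, the sketch below has several writers append to a shared manifest using optimistic retries. Everything here is an assumption for illustration: `FakeObjectStore` is an in-memory stand-in rather than the S3 API, and the manifest layout, key name, and `append_segment` helper are hypothetical, not a description of how Turbopuffer stores data.

```python
import json
import threading


class PreconditionFailed(Exception):
    """Raised when the stored version no longer matches the expected one."""


class FakeObjectStore:
    """In-memory stand-in for an object store with conditional writes (not the S3 API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}  # key -> (version, bytes)

    def get(self, key):
        """Return (version, data); a missing key reads as version 0 with empty data."""
        with self._lock:
            return self._objects.get(key, (0, b""))

    def put_if_version(self, key, data, expected_version):
        """Write only if the object is still at expected_version (compare-and-swap)."""
        with self._lock:
            current_version, _ = self._objects.get(key, (0, b""))
            if current_version != expected_version:
                raise PreconditionFailed(key)
            self._objects[key] = (current_version + 1, data)
            return current_version + 1


def append_segment(store, manifest_key, segment_name, max_retries=50):
    """Add a segment name to the manifest, retrying when another writer wins the race."""
    for _ in range(max_retries):
        version, raw = store.get(manifest_key)
        segments = json.loads(raw) if raw else []
        segments.append(segment_name)
        try:
            return store.put_if_version(
                manifest_key, json.dumps(segments).encode(), version
            )
        except PreconditionFailed:
            continue  # Someone else committed first: re-read and try again.
    raise RuntimeError("too much contention on manifest")
```

The design point is that the store itself arbitrates between writers: a stale write is rejected rather than silently overwriting a newer manifest, so no lock service or leader election is needed for this step.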
Key Questions Answered
- P99 Engineer Hiring Filter: After every interview, default to rejection and require at least one team member to actively advocate for the candidate. Screen for candidates who have bent software to match theoretical hardware limits, measured by closing the gap between napkin-math QPS ceilings and observed system performance. Turbopuffer's ANN v3 searches 100 billion vectors at p50 of 40ms and p99 of 200ms as a concrete benchmark of this standard.
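The "napkin-math QPS ceiling" idea can be sketched as a two-line calculation: divide the bandwidth of the bottleneck device by the bytes each query must read, then compare the result with what the system actually delivers. The function names and all numbers in the usage note are hypothetical; the episode summary does not give the inputs behind Turbopuffer's own estimates.

```python
def qps_ceiling(bandwidth_bytes_per_s: float, bytes_read_per_query: float) -> float:
    """Upper bound on queries per second if bandwidth is the only bottleneck."""
    if bytes_read_per_query <= 0:
        raise ValueError("bytes_read_per_query must be positive")
    return bandwidth_bytes_per_s / bytes_read_per_query


def efficiency(observed_qps: float, ceiling_qps: float) -> float:
    """Fraction of the theoretical ceiling that the running system achieves."""
    return observed_qps / ceiling_qps
```

For example, assuming a device that sustains 10 GB/s and queries that each read 2 MB, `qps_ceiling(10e9, 2e6)` gives 5,000 QPS. A system observed at 1,000 QPS is then running at 20% of its ceiling, and the remaining 80% is the gap the hiring filter asks candidates to have closed.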
Notable Moment
Before closing Turbopuffer's seed round, Simon Hørup Eskildsen told his only investor prospect that he would return all the money if the product lacked clear market fit by year-end. The investor responded that no founder had ever said that to him, and that transparency became the primary reason Eskildsen chose him over database-specialist investors.