Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer
Episode: 60 min · Read time: 3 min
Topics: Design & UX, Software Development, Science & Discovery
AI-Generated Summary
Key Takeaways
- ✓Object Storage Architecture: Build databases entirely on S3 with zero consensus layer by leveraging S3's strong consistency (available December 2020) and compare-and-swap (late 2024). This eliminates Zookeeper-style coordination overhead entirely. Data survives complete server shutdown because nothing persists locally. The tradeoff is ~100ms write latency, but read performance scales cheaply by inflating only 5-10% of data into NVMe and less into DRAM.
- ✓Database Company Prerequisites: Three conditions must align to build a generational database company: a new workload forcing every company to adopt your system, a new storage architecture legacy vendors cannot retrofit, and a commitment to implementing every query plan over time. The AI-to-data connection workload, NVMe SSDs in cloud (2017), and S3 consistency (2020) created this window for Turbopuffer specifically.
- ✓Agent Search Patterns: RAG has shifted from single context-window queries to agents firing large batches of parallel searches — Notion executes a high volume of concurrent queries per round trip. This changes pricing strategy: Turbopuffer is cutting query pricing by 5x to accommodate workloads where one agent session generates hundreds of search calls rather than a handful.
- ✓Hybrid Search Is Non-Negotiable: All production search workloads require combining vector, full-text, and regex search simultaneously. A pure embedding search misidentifies "SI" as Spanish for "yes" rather than a document prefix. Cursor supplements semantic search with grep. Turbopuffer now beats Lucene on long-query benchmarks — the query type most common when LLMs generate or augment search strings against web-scale datasets.
- ✓Vibe Pricing From First Principles: Calculate infrastructure cost from raw hardware napkin math — bandwidth, IOPS, storage per terabyte — then add margin. Turbopuffer's early pricing was set this way before achieving margin, forcing aggressive optimization when Cursor's bill grew faster than revenue. Running on a personal credit card created direct pressure to make the unit economics work before raising outside capital.
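The object-storage takeaway above hinges on compare-and-swap replacing a consensus layer: writers race to commit, and the store itself arbitrates. A minimal in-memory sketch of that commit protocol follows. It is illustrative only — the `CASObjectStore` class and `append_committed` helper are hypothetical stand-ins modeling S3-style conditional writes, not Turbopuffer's implementation.

```python
import threading

class CASObjectStore:
    """Toy in-memory stand-in for an object store with compare-and-swap,
    modeled loosely on S3-style conditional writes (match on a version/ETag)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}  # key -> (version, value)

    def get(self, key):
        with self._lock:
            return self._objects.get(key, (None, None))

    def put_if_match(self, key, expected_version, value):
        """Write only if the stored version matches; return new version, else None."""
        with self._lock:
            current_version, _ = self._objects.get(key, (None, None))
            if current_version != expected_version:
                return None  # lost the race; caller must re-read and retry
            new_version = (current_version or 0) + 1
            self._objects[key] = (new_version, value)
            return new_version

def append_committed(store, key, entry):
    """Append an entry to a shared log object, retrying on CAS conflicts.
    No coordinator needed: the store's CAS decides which writer wins each round."""
    while True:
        version, log = store.get(key)
        new_log = (log or []) + [entry]
        if store.put_if_match(key, version, new_log) is not None:
            return new_log

store = CASObjectStore()
append_committed(store, "wal/manifest", "write-1")
append_committed(store, "wal/manifest", "write-2")
```

Because every writer retries against the latest version, concurrent writers serialize without any Zookeeper-style coordination service — the pattern the takeaway describes, at the cost of a round trip per retry.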
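The hybrid-search point above requires merging ranked results from vector, full-text, and regex retrievers into one list. One common way to do that is reciprocal rank fusion; the episode does not specify Turbopuffer's fusion method, so this is an illustrative sketch with made-up document IDs, using the "SI" ambiguity from the takeaway as the example.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each retriever contributes 1/(k + rank + 1)
    per document; documents found by several retrievers accumulate score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists for the query "SI":
vector_hits = ["doc_yes_es", "doc_si_prefix", "doc_misc"]  # embedding reads "SI" as Spanish "yes"
fulltext_hits = ["doc_si_prefix", "doc_si_units"]          # exact token match finds the prefix doc
regex_hits = ["doc_si_prefix"]                             # grep-style match agrees

fused = rrf_fuse([vector_hits, fulltext_hits, regex_hits])
```

The document that two exact-match retrievers agree on outranks the embedding's top pick, which is exactly why pure vector search alone fails on queries like this.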
What It Covers
Simon Hørup Eskildsen, founder of Turbopuffer, explains how his search database achieves cost reductions of up to 95% by building entirely on object storage and NVMe SSDs — an architecture made possible only after S3 gained strong consistency in December 2020 and compare-and-swap support in late 2024 — while serving customers like Cursor and Notion at billion-vector scale.
Key Questions Answered
- •P99 Engineer Hiring Filter: After every interview, default to rejection and require at least one team member to actively advocate for the candidate. Screen for candidates who have bent software to match theoretical hardware limits — measured by closing the gap between napkin-math QPS ceilings and observed system performance. Turbopuffer's ANN v3 searches 100 billion vectors at p50 of 40ms and p99 of 200ms as a concrete benchmark of this standard.
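The napkin-math standard above — comparing observed throughput to a theoretical hardware ceiling — can be sketched as a back-of-envelope calculation. All numbers below are illustrative placeholders, not Turbopuffer's actuals.

```python
def qps_ceiling(bytes_scanned_per_query, drive_bandwidth_gbps, drives=1):
    """Bandwidth-bound QPS ceiling: total sustainable bytes/sec across
    drives divided by bytes each query must scan."""
    bytes_per_sec = drive_bandwidth_gbps * 1e9 * drives
    return bytes_per_sec / bytes_scanned_per_query

# Hypothetical box: one NVMe drive sustaining 2 GB/s,
# each query scanning 10 MB of index.
ceiling = qps_ceiling(bytes_scanned_per_query=10e6, drive_bandwidth_gbps=2)
```

If a system measures far below this ceiling, the gap is software overhead — the gap the hiring filter asks candidates to have closed before.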
Notable Moment
Before closing Turbopuffer's seed round, Simon Hørup Eskildsen told his only investor prospect that he would return all the money if the product lacked clear market fit by year-end. The investor responded that no founder had ever said that to him — and that transparency became the primary reason Eskildsen chose him over database-specialist investors.