Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer
Episode: 60 min · Read time: 3 min
Topics: Design & UX, Software Development, Science & Discovery
AI-Generated Summary
Key Takeaways
- ✓Object Storage Architecture: Build databases entirely on S3 with zero consensus layer by leveraging S3's strong consistency (available December 2020) and compare-and-swap (late 2024). This eliminates Zookeeper-style coordination overhead entirely. Data survives complete server shutdown because nothing persists locally. The tradeoff is ~100ms write latency, but read performance scales cheaply by inflating only 5-10% of data into NVMe and less into DRAM.
- ✓Database Company Prerequisites: Three conditions must align to build a generational database company: a new workload forcing every company to adopt your system, a new storage architecture legacy vendors cannot retrofit, and a commitment to implementing every query plan over time. The AI-to-data connection workload, NVMe SSDs in cloud (2017), and S3 consistency (2020) created this window for Turbopuffer specifically.
- ✓Agent Search Patterns: RAG has shifted from single context-window queries to agents firing large batches of parallel searches — Notion executes a high volume of concurrent queries per round trip. This changes pricing strategy: Turbopuffer is cutting query pricing by 5x to accommodate workloads where one agent session generates hundreds of search calls rather than a handful.
- ✓Hybrid Search Is Non-Negotiable: All production search workloads require combining vector, full-text, and regex search simultaneously. A pure embedding search misidentifies "SI" as Spanish for "yes" rather than a document prefix. Cursor supplements semantic search with grep. Turbopuffer now beats Lucene on long-query benchmarks — the query type most common when LLMs generate or augment search strings against web-scale datasets.
- ✓Vibe Pricing From First Principles: Calculate infrastructure cost from raw hardware napkin math — bandwidth, IOPS, storage per terabyte — then add margin. Turbopuffer's early pricing was set this way before achieving margin, forcing aggressive optimization when Cursor's bill grew faster than revenue. Running on a personal credit card created direct pressure to make the unit economics work before raising outside capital.
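The object-storage takeaway above hinges on compare-and-swap replacing a consensus layer: writers race to commit, and the store itself arbitrates. A minimal in-memory sketch of that commit protocol follows. It is illustrative only — the `CASObjectStore` class and `append_committed` helper are hypothetical stand-ins modeling S3-style conditional writes, not Turbopuffer's implementation.

```python
import threading

class CASObjectStore:
    """Toy in-memory stand-in for an object store with compare-and-swap,
    modeled loosely on S3-style conditional writes (match on a version/ETag)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}  # key -> (version, value)

    def get(self, key):
        with self._lock:
            return self._objects.get(key, (None, None))

    def put_if_match(self, key, expected_version, value):
        """Write only if the stored version matches; return new version, else None."""
        with self._lock:
            current_version, _ = self._objects.get(key, (None, None))
            if current_version != expected_version:
                return None  # lost the race; caller must re-read and retry
            new_version = (current_version or 0) + 1
            self._objects[key] = (new_version, value)
            return new_version

def append_committed(store, key, entry):
    """Append an entry to a shared log object, retrying on CAS conflicts.
    No coordinator needed: the store's CAS decides which writer wins each round."""
    while True:
        version, log = store.get(key)
        new_log = (log or []) + [entry]
        if store.put_if_match(key, version, new_log) is not None:
            return new_log

store = CASObjectStore()
append_committed(store, "wal/manifest", "write-1")
append_committed(store, "wal/manifest", "write-2")
```

Because every writer retries against the latest version, concurrent writers serialize without any Zookeeper-style coordination service — the pattern the takeaway describes, at the cost of a round trip per retry.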
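The hybrid-search point above requires merging ranked results from vector, full-text, and regex retrievers into one list. One common way to do that is reciprocal rank fusion; the episode does not specify Turbopuffer's fusion method, so this is an illustrative sketch with made-up document IDs, using the "SI" ambiguity from the takeaway as the example.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each retriever contributes 1/(k + rank + 1)
    per document; documents found by several retrievers accumulate score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists for the query "SI":
vector_hits = ["doc_yes_es", "doc_si_prefix", "doc_misc"]  # embedding reads "SI" as Spanish "yes"
fulltext_hits = ["doc_si_prefix", "doc_si_units"]          # exact token match finds the prefix doc
regex_hits = ["doc_si_prefix"]                             # grep-style match agrees

fused = rrf_fuse([vector_hits, fulltext_hits, regex_hits])
```

The document that two exact-match retrievers agree on outranks the embedding's top pick, which is exactly why pure vector search alone fails on queries like this.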
What It Covers
Simon Hørup Eskildsen, founder of Turbopuffer, explains how his search database achieves cost reductions of up to 95% by building entirely on object storage and NVMe SSDs — an architecture made possible only after S3 gained strong consistency in December 2020 and compare-and-swap support in late 2024 — while serving customers like Cursor and Notion at billion-vector scale.
Key Questions Answered
- •P99 Engineer Hiring Filter: After every interview, default to rejection and require at least one team member to actively advocate for the candidate. Screen for candidates who have bent software to match theoretical hardware limits — measured by closing the gap between napkin-math QPS ceilings and observed system performance. Turbopuffer's ANN v3 searches 100 billion vectors at p50 of 40ms and p99 of 200ms as a concrete benchmark of this standard.
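The napkin-math standard above — comparing observed throughput to a theoretical hardware ceiling — can be sketched as a back-of-envelope calculation. All numbers below are illustrative placeholders, not Turbopuffer's actuals.

```python
def qps_ceiling(bytes_scanned_per_query, drive_bandwidth_gbps, drives=1):
    """Bandwidth-bound QPS ceiling: total sustainable bytes/sec across
    drives divided by bytes each query must scan."""
    bytes_per_sec = drive_bandwidth_gbps * 1e9 * drives
    return bytes_per_sec / bytes_scanned_per_query

# Hypothetical box: one NVMe drive sustaining 2 GB/s,
# each query scanning 10 MB of index.
ceiling = qps_ceiling(bytes_scanned_per_query=10e6, drive_bandwidth_gbps=2)
```

If a system measures far below this ceiling, the gap is software overhead — the gap the hiring filter asks candidates to have closed before.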
Notable Moment
Before closing Turbopuffer's seed round, Simon Hørup Eskildsen told his only investor prospect that he would return all the money if the product lacked clear market fit by year-end. The investor responded that no founder had ever said that to him — and that transparency became the primary reason Eskildsen chose him over database-specialist investors.