Latent Space

Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer

60 min episode · 3 min read

Topics

Design & UX, Software Development, Science & Discovery

AI-Generated Summary

Key Takeaways

  • Object Storage Architecture: Build databases entirely on S3 with zero consensus layer by leveraging S3's strong consistency (added December 2020) and compare-and-swap support (late 2024). This eliminates Zookeeper-style coordination overhead entirely, and data survives a complete server shutdown because nothing persists locally. The tradeoff is ~100ms write latency, but reads scale cheaply because only 5-10% of data needs to be cached on NVMe, and even less in DRAM.
  • Database Company Prerequisites: Three conditions must align to build a generational database company: a new workload forcing every company to adopt your system, a new storage architecture legacy vendors cannot retrofit, and a commitment to implementing every query plan over time. The AI-to-data connection workload, NVMe SSDs in cloud (2017), and S3 consistency (2020) created this window for Turbopuffer specifically.
  • Agent Search Patterns: RAG has shifted from a single query per context window to agents firing many searches in parallel — Notion executes a high volume of concurrent queries per round trip. This changes pricing strategy: Turbopuffer is cutting query pricing by 5x to accommodate workloads where one agent session generates hundreds of search calls rather than a handful.
  • Hybrid Search Is Non-Negotiable: All production search workloads require combining vector, full-text, and regex search simultaneously. A pure embedding search misidentifies "SI" as Spanish for "yes" rather than a document prefix. Cursor supplements semantic search with grep. Turbopuffer now beats Lucene on long-query benchmarks — the query type most common when LLMs generate or augment search strings against web-scale datasets.
  • Vibe Pricing From First Principles: Calculate infrastructure cost from raw hardware napkin math — bandwidth, IOPS, storage per terabyte — then add margin. Turbopuffer's early pricing was set this way before achieving margin, forcing aggressive optimization when Cursor's bill grew faster than revenue. Running on a personal credit card created direct pressure to make the unit economics work before raising outside capital.
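The zero-consensus design in the first takeaway hinges on compare-and-swap over a metadata object: the etag check, not a lock service, decides which writer wins. A minimal in-memory sketch of that pattern — the `ObjectStore` class and `commit` helper are illustrative stand-ins, not Turbopuffer's API or S3's client library:

```python
import threading

class ObjectStore:
    """Toy in-memory model of an object store with S3-style conditional writes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}    # key -> bytes
        self._etags = {}   # key -> int version counter

    def get(self, key):
        with self._lock:
            return self._data.get(key), self._etags.get(key)

    def put_if_match(self, key, value, expected_etag):
        """Write only if the stored etag matches the caller's (compare-and-swap)."""
        with self._lock:
            if self._etags.get(key) != expected_etag:
                return False  # another writer won the race; caller retries
            self._data[key] = value
            self._etags[key] = (expected_etag or 0) + 1
            return True

def commit(store, key, update):
    """Read-modify-CAS loop: no Zookeeper, the etag check is the sole arbiter."""
    while True:
        current, etag = store.get(key)
        if store.put_if_match(key, update(current), etag):
            return
```

A writer that loses the race simply re-reads and retries; with each attempt costing an object-storage round trip, this is where the ~100ms write latency in the takeaway comes from.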
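The shift from one retrieval call to hundreds per agent session (the agent-search bullet above) is easy to sketch as an async fan-out, where a round trip costs roughly one query's latency rather than the sum of all. A hedged illustration; `search` here is a hypothetical stand-in for any search API:

```python
import asyncio

async def search(query: str) -> list[str]:
    # Stand-in for one search API call; the sleep models network latency.
    await asyncio.sleep(0.01)
    return [f"doc for {query!r}"]

async def agent_round_trip(queries: list[str]) -> list[list[str]]:
    # One agent step fans out all sub-queries concurrently, so the round
    # trip takes roughly one query's latency instead of the sum of all.
    return await asyncio.gather(*(search(q) for q in queries))

results = asyncio.run(agent_round_trip([f"query {i}" for i in range(50)]))
```

For a database, fifty cheap concurrent point lookups bill very differently from fifty interactive queries, which is the pressure behind the 5x query-price cut.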

What It Covers

Simon Hørup Eskildsen, founder of Turbopuffer, explains how his search database achieves cost reductions of up to 95% by building entirely on object storage and NVMe SSDs — an architecture made possible only after S3 gained strong consistency in December 2020 and compare-and-swap support in late 2024 — while serving customers like Cursor and Notion at billion-vector scale.
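The cost-reduction claim rests on the gap between triply replicated hot storage and object storage fronted by a thin NVMe cache. A rough sketch of that napkin math — every price, the replica count, and the 10% cache fraction are generic illustrations, not figures from the episode:

```python
# Illustrative napkin math: object storage + cache vs. replicated SSD.
# All numbers are rough public list prices, not Turbopuffer's actuals.
TB = 1024                        # GB per TB

s3_per_gb_month = 0.023          # object storage
ssd_per_gb_month = 0.08          # network SSD volume
replicas = 3                     # typical replicated hot-storage setup

hot_cost = ssd_per_gb_month * TB * replicas
cold_cost = s3_per_gb_month * TB

# Cache only ~10% of the data on SSD in front of object storage.
cache_fraction = 0.10
hybrid_cost = cold_cost + ssd_per_gb_month * TB * cache_fraction

savings = 1 - hybrid_cost / hot_cost
print(f"hot: ${hot_cost:.0f}/TB-mo, hybrid: ${hybrid_cost:.0f}/TB-mo, "
      f"savings: {savings:.0%}")
```

Even with these conservative placeholder numbers the hybrid layout lands in the ~85-90% savings range; larger cold fractions and cheaper storage tiers push it toward the 95% quoted above.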

Key Questions Answered

  • P99 Engineer Hiring Filter: After every interview, default to rejection and require at least one team member to actively advocate for the candidate. Screen for candidates who have bent software to match theoretical hardware limits — measured by closing the gap between napkin-math QPS ceilings and observed system performance. Turbopuffer's ANN v3 searches 100 billion vectors at p50 of 40ms and p99 of 200ms as a concrete benchmark of this standard.
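The "napkin-math QPS ceiling" in the hiring bullet can be made concrete: bound queries per second by how many vector bytes a node can stream from memory, then compare observed throughput against that limit. All hardware numbers below are generic assumptions for illustration, not Turbopuffer's actuals:

```python
# Napkin math: theoretical query ceiling from memory bandwidth alone.
dims = 1024                      # embedding dimensions
bytes_per_vector = dims * 4      # float32 components
mem_bandwidth = 100e9            # bytes/sec one node can stream (assumed)
vectors_scanned_per_query = 1_000_000  # candidates an ANN probe touches

bytes_per_query = vectors_scanned_per_query * bytes_per_vector
ceiling_qps = mem_bandwidth / bytes_per_query

observed_qps = 15.0              # hypothetical measured throughput
efficiency = observed_qps / ceiling_qps
print(f"ceiling ≈ {ceiling_qps:.1f} QPS/node, observed {observed_qps} QPS "
      f"({efficiency:.0%} of the hardware limit)")
```

The interview filter described above is essentially asking: when this ratio is far below 100%, has the candidate ever hunted down where the missing throughput went?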

Notable Moment

Before closing Turbopuffer's seed round, Simon Hørup Eskildsen told his only investor prospect that he would return all the money if the product lacked clear market fit by year-end. The investor responded that no founder had ever said that to him — and that transparency became the primary reason Eskildsen chose him over database-specialist investors.
