Software Engineering Daily

Turbopuffer with Simon Hørup Eskildsen

50 min episode · 2 min read

AI-Generated Summary

Key Takeaways

  • Storage architecture economics: TurboPuffer uses S3 object storage at 2¢ per gigabyte versus traditional in-memory vector databases at $2-5 per gigabyte, achieving 100x cost reduction while maintaining sub-second query performance through strategic caching layers.
  • Cluster-based indexing for disk: Graph-based vector indexes require hundreds of milliseconds per jump on S3, making them impractical. Cluster-based indexes fetch centroids and clusters in just three round trips, enabling cold queries under one second on object storage.
  • Production recall monitoring: TurboPuffer samples 1% of production queries to measure recall accuracy against exact results, maintaining 90-95% recall across real-world datasets. This catches edge cases that academic benchmarks miss, ensuring consistent search quality at scale.
  • Namespace sharding primitive: TurboPuffer maps each namespace to one shard with separate S3 prefixes, supporting over 100 million namespaces. Each namespace can use customer-managed encryption keys, providing isolation equivalent to separate buckets without coordination overhead.
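The cluster-based querying and recall monitoring described above can be sketched together in plain Python. The centroid fetch and cluster fetches stand in for the S3 round trips, and comparing the approximate results against exact brute-force search yields the recall number that production sampling measures. All parameters here (k-means setup, cosine scoring, `nprobe`) are illustrative, not turbopuffer's actual implementation:

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / ((na * nb) or 1.0)

def build_clusters(vectors, k, iters=10):
    """Naive k-means. Returns (centroids, clusters), where clusters[c]
    holds the ids of vectors assigned to centroid c."""
    centroids = random.sample(vectors, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for vid, v in enumerate(vectors):
            best = max(range(k), key=lambda c: cosine(v, centroids[c]))
            clusters[best].append(vid)
        dim = len(vectors[0])
        for c in range(k):
            if clusters[c]:  # recompute centroid as the cluster mean
                centroids[c] = [
                    sum(vectors[vid][d] for vid in clusters[c]) / len(clusters[c])
                    for d in range(dim)
                ]
    return centroids, clusters

def ann_search(query, vectors, centroids, clusters, nprobe=2, top_k=5):
    """Score centroids first (one round trip), then fetch and exactly
    score only the nprobe closest clusters (the remaining round trips)."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: -cosine(query, centroids[c]))[:nprobe]
    candidates = [vid for c in probe for vid in clusters[c]]
    return sorted(candidates, key=lambda vid: -cosine(query, vectors[vid]))[:top_k]

def exact_search(query, vectors, top_k=5):
    """Brute-force ground truth, as used when sampling queries for recall."""
    return sorted(range(len(vectors)),
                  key=lambda vid: -cosine(query, vectors[vid]))[:top_k]

def recall_at_k(approx_ids, exact_ids):
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)
```

In production the recall check would run on a 1% sample of live queries; here it can be exercised on random data to see how `nprobe` trades round trips against recall.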

What It Covers

Simon Eskildsen explains how TurboPuffer reduces vector database costs by roughly 100x using object storage instead of memory, enabling companies like Cursor and Notion to scale AI search economically at 2¢ per gigabyte.
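The namespace-sharding primitive from the takeaways amounts to a deterministic mapping from namespace to its own S3 key prefix, so one namespace's reads and writes never touch another's objects. The bucket name and hashing scheme below are hypothetical, purely to show the shape of the idea; per-namespace encryption keys would hang off the same mapping:

```python
import hashlib

def namespace_prefix(bucket: str, namespace: str) -> str:
    """One shard per namespace, under its own S3 key prefix.
    A short hash fans namespaces out across the keyspace so no
    single prefix becomes a hot spot (layout is illustrative)."""
    digest = hashlib.sha256(namespace.encode("utf-8")).hexdigest()[:8]
    return f"s3://{bucket}/{digest}/{namespace}/"

print(namespace_prefix("tpuf-data", "customer-42"))
```

Because the mapping is pure and deterministic, no coordination service is needed to route a request: any node can compute the prefix for any of 100M+ namespaces.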


Notable Moment

Eskildsen discovered the vector database cost problem when calculating that storing Readwise article embeddings would cost $30,000 monthly versus $3,000 for their entire Postgres database, revealing a 10x cost amplification blocking AI feature adoption.
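The back-of-envelope math behind that discovery is easy to reproduce. The vector count and dimension below are assumed for illustration; the $2/GB and 2¢/GB prices are the episode's figures:

```python
def monthly_storage_cost(num_vectors: int, dim: int, price_per_gb: float,
                         bytes_per_float: int = 4) -> float:
    """Cost of holding float32 embeddings at a given per-GB monthly price."""
    gigabytes = num_vectors * dim * bytes_per_float / 1e9
    return gigabytes * price_per_gb

# Assumed workload: 100M embeddings at 1536 dims ≈ 614 GB of float32.
in_memory = monthly_storage_cost(100_000_000, 1536, 2.00)  # $2/GB memory tier
on_s3 = monthly_storage_cost(100_000_000, 1536, 0.02)      # 2¢/GB object storage
print(f"memory: ${in_memory:,.0f}/mo, S3: ${on_s3:,.0f}/mo, "
      f"ratio: {in_memory / on_s3:.0f}x")
# → memory: $1,229/mo, S3: $12/mo, ratio: 100x
```

The ratio is price-per-GB divided by price-per-GB, so it holds at any scale — which is why embedding storage can dwarf the cost of the primary database it sits beside.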

