What are the key takeaways from this Gradient Dissent episode?

Key insights include:

• **Company pivoting:** Stay lean during market shifts. Baseten stayed at 18 people from 2019 to 2023, which let it pivot quickly, without organizational weight, when ChatGPT and Stable Diffusion created new opportunities.
• **Inference differentiation:** Focus on dedicated capacity over shared endpoints. 99% of Baseten's business is serving custom models on dedicated infrastructure, which keeps it out of the commoditized market for shared model serving.
• **Technical optimization:** Modern LLM inference requires both infrastructure scaling across thousands of GPUs and runtime optimization with frameworks such as vLLM, TensorRT-LLM, and SGLang.

What did Tuhin Srivastava discuss on Gradient Dissent?

Baseten CEO Tuhin Srivastava explains how his AI inference company pivoted from serving data scientists with small models to become the fastest-growing inference provider for production applications. Key topics include **company pivoting** (staying lean during market shifts: Baseten stayed at 18 people from 2019 to 2023, which let it pivot quickly when ChatGPT and Stable Diffusion created new opportunities) and **inference differentiation** (dedicated capacity over shared endpoints: 99% of Baseten's business is serving custom models on dedicated infrastructure rather than competing in commoditized shared model serving).

How long is this episode of Gradient Dissent?

This episode is 59 minutes long. SignalCast provides an AI-generated summary so you can get the key insights in about 3 minutes.

Gradient Dissent

The CEO Behind the Fastest-Growing AI Inference Company | Tuhin Srivastava

November 18, 2025

59 min episode · 2 min read

Tuhin Srivastava

Topics

Productivity, Startups, Leadership

AI-Generated Summary

Published Dec 21, 2025

Key Takeaways

✓ Company pivoting: Stay lean during market shifts. Baseten stayed at 18 people from 2019 to 2023, which let it pivot quickly, without organizational weight, when ChatGPT and Stable Diffusion created new opportunities.
✓ Inference differentiation: Focus on dedicated capacity over shared endpoints. 99% of Baseten's business is serving custom models on dedicated infrastructure, which keeps it out of the commoditized market for shared model serving.
✓ Technical optimization: Modern LLM inference requires both infrastructure scaling across thousands of GPUs and runtime optimization with frameworks such as vLLM, TensorRT-LLM, and SGLang.
✓ Market positioning: Open source adoption follows a predictable pattern. Companies start with Anthropic or OpenAI, then switch to open source models for cost control, reliability, and data privacy.

What It Covers

Baseten CEO Tuhin Srivastava explains how his AI inference company pivoted from serving data scientists with small models to become the fastest-growing inference provider for production applications.

Notable Moment

Srivastava reveals that Baseten killed three of its four products in 2022, including an application builder that had occupied two dozen employees for 2.5 years of development work.

Books, tools, and gear mentioned in this episode

Tools

TensorRT-LLM
SGLang
vLLM

More from Gradient Dissent

He's Building an AI That Can't Lie | Dan Klein

He Raised $70M to Cure Every Disease With AI

Uber, Nissan, and Mercedes Chose This Self-Driving Startup | Alex Kendall, Wayve

Why Netflix, Uber, and Spotify Never Lag: The Database Nobody Talks About | Aaron Katz

The $64M Bet on an AI That Has to Be Right | Carina Hong, CEO of Axiom

Similar Episodes

Baseten CEO Tuhin Srivastava on the AI Inference Crunch, Custom Models, and Building the Inference Cloud

The quiet reinvention of a $42b business, with Canva’s Cameron Adams

Why AI Infrastructure must evolve for Agent Experience — Akshat Bubna, Modal CTO

Intelligence on the Edge: Liquid AI's Ramin Hasani on the Search for Device-Native Foundation Models

The hottest running app has nothing to do with speed | E2303
