Odd Lots

Why Cerebras CEO Andrew Feldman Built The World's Largest Computer Chip

May 21, 2026

51 min episode · 2 min read

Andrew Feldman

Topics

Fundraising & VC, Leadership, Artificial Intelligence

AI-Generated Summary

Published May 21, 2026

Key Takeaways

✓ Wafer-Scale Memory Architecture: Cerebras achieves 15x faster inference than GPUs, and up to 1,000x faster on specific workloads, by using fast on-chip SRAM instead of slower HBM. The tradeoff is that SRAM stores less per square millimeter. Cerebras compensates by building a chip that covers an entire silicon wafer, roughly the size of a dinner plate, and filling it with high-speed memory.
✓ Speed Premium Pricing: Anthropic's 2x-faster inference tier sold out at 6x the standard price, showing that enterprise buyers will pay a significant premium for speed. Cerebras runs 15x faster than that tier, which suggests substantial pricing power. Slow tokens are cheaper to produce on GPUs, but GPU cost per token rises sharply as speed requirements increase.
✓ Supply Chain Differentiation: Cerebras avoids three major AI chip bottlenecks at once: HBM shortages, TSMC's constrained CoWoS packaging process, and TSMC's oversubscribed 3nm node. By using 5nm fabrication and on-chip SRAM, Cerebras sidesteps the constraints choking NVIDIA and other GPU vendors, leaving data center availability as its primary growth limiter.
✓ CUDA Moat Erosion: CUDA has zero role in inference workloads; migrating a model from GPU to Cerebras requires roughly 10 configuration changes. In training, two of the three leading frontier models (Gemini on TPUs, Claude on Trainium) now train without CUDA, representing a roughly 70% market share loss for NVIDIA's software ecosystem compared to three years ago.
✓ Open vs. Closed Source Economics: Open-source models like Kimi K2 (1 trillion parameters) run on Cerebras today at a cost that reflects only compute and power, not training amortization. Closed-source models outperform open-source ones by roughly 4–5% on quality benchmarks but cost significantly more per token, a cost-versus-capability tradeoff that enterprises must actively evaluate.
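
To make the memory-speed point concrete, here is a minimal sketch of the usual back-of-envelope bound for bandwidth-limited inference: if generating each token requires reading the model weights from memory once, single-stream speed cannot exceed memory bandwidth divided by bytes read per token. The function name and every number below are illustrative assumptions, not figures from the episode and not Cerebras or GPU specifications.

```python
def max_decode_tokens_per_second(bandwidth_bytes_per_s: float,
                                 bytes_read_per_token: float) -> float:
    """Upper bound on single-stream decode speed when memory bandwidth
    is the bottleneck (assumes the weights are read once per token)."""
    if bandwidth_bytes_per_s <= 0 or bytes_read_per_token <= 0:
        raise ValueError("bandwidth and bytes per token must be positive")
    return bandwidth_bytes_per_s / bytes_read_per_token


# Hypothetical placeholder values, chosen only to illustrate the ratio.
MODEL_BYTES = 70e9 * 2       # assumed 70B parameters at 2 bytes each
SLOWER_MEMORY_BW = 3e12      # assumed bytes/s for an off-chip memory system
FASTER_MEMORY_BW = 45e12     # assumed bytes/s for an on-chip memory system

slow = max_decode_tokens_per_second(SLOWER_MEMORY_BW, MODEL_BYTES)
fast = max_decode_tokens_per_second(FASTER_MEMORY_BW, MODEL_BYTES)

print(f"slower memory bound: {slow:.1f} tokens/s")
print(f"faster memory bound: {fast:.1f} tokens/s")
print(f"ratio: {fast / slow:.1f}x")
```

Under this simplified model the speedup equals the bandwidth ratio, which is why the choice of memory technology matters so much for inference speed. Real systems also depend on batching, compute limits, and interconnect, which this sketch ignores.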

What It Covers

Cerebras CEO Andrew Feldman explains how his company built a chip 58 times larger than any competitor, achieving inference speeds 15 times faster than leading GPUs. The episode covers wafer-scale engineering breakthroughs, inference economics, CUDA's declining relevance, open vs. closed source AI models, and semiconductor supply chain constraints.

Notable Moment

Feldman reveals that, despite solving a 75-year-old engineering problem long considered unsolvable and building the world's fastest inference chip, Cerebras' primary growth constraint today is neither manufacturing capacity nor software. It is simply the availability of powered data center buildings, a limitation expected to persist for at least 15–18 months.
