Why Cerebras CEO Andrew Feldman Built The World's Largest Computer Chip
Episode: 51 min
Read time: 2 min
Topics: Leadership
AI-Generated Summary
Key Takeaways
- ✓ Wafer-Scale Memory Architecture: Cerebras achieves 15x faster inference than GPUs, and up to 1,000x faster on specific workloads, by using fast SRAM instead of slower HBM. The tradeoff is that SRAM stores less per square millimeter; Cerebras compensates by building a chip that covers an entire silicon wafer, roughly the size of a dinner plate, packed with high-speed memory.
- ✓ Speed Premium Pricing: Anthropic's 2x-faster inference tier sold out at 6x the standard price, which suggests enterprise buyers will pay significant premiums for speed. Cerebras runs 15x faster than that tier, suggesting substantial pricing power. Slow tokens are cheap to produce on GPUs, but GPU cost per token rises sharply as speed requirements increase.
- ✓ Supply Chain Differentiation: Cerebras avoids three major AI chip bottlenecks at once: HBM shortages, TSMC's constrained CoWoS packaging process, and TSMC's oversubscribed 3nm node. By using 5nm fabrication and on-chip SRAM, Cerebras sidesteps the constraints squeezing NVIDIA and other GPU vendors, leaving data center availability as its primary growth limiter.
- ✓ CUDA Moat Erosion: CUDA plays no role in inference workloads: migrating a model from GPU to Cerebras takes roughly 10 configuration changes. In training, two of three leading frontier models (Gemini on TPUs, Claude on Trainium) now train without CUDA, roughly a 70% share loss for NVIDIA's software ecosystem compared with three years ago.
- ✓ Open vs. Closed Source Economics: Open-source models like Kimi K2 (1 trillion parameters) run on Cerebras today at a cost that reflects only compute and power, not training amortization. Closed-source models outperform open-source ones by roughly 4–5% on quality benchmarks but cost significantly more per token, creating a cost-versus-capability tradeoff that enterprises must actively evaluate.
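The memory-architecture takeaway can be made concrete with a rough back-of-envelope sketch. It rests on a common simplification that is not taken from the episode: when generating one token at a time, a dense model must read all of its weights from memory for every token, so memory bandwidth divided by model size gives an upper bound on tokens per second. The function name and every number below are illustrative placeholders, not vendor specifications.

```python
def decode_tokens_per_second(bandwidth_bytes_per_s: float,
                             params: float,
                             bytes_per_param: float = 2.0) -> float:
    """Upper bound on single-stream decode speed.

    Assumes every weight is read from memory once per generated token
    and that memory bandwidth, not compute, is the limiting factor.
    """
    model_bytes = params * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes


# Hypothetical figures chosen only to show the shape of the argument.
PARAMS = 70e9            # assumed 70-billion-parameter dense model
SLOW_BANDWIDTH = 3e12    # assumed off-chip memory: 3 TB/s
FAST_BANDWIDTH = 3e14    # assumed on-chip memory: 100x the bandwidth

slow = decode_tokens_per_second(SLOW_BANDWIDTH, PARAMS)
fast = decode_tokens_per_second(FAST_BANDWIDTH, PARAMS)

print(f"slower memory bound: {slow:.1f} tokens/s")
print(f"faster memory bound: {fast:.1f} tokens/s")
print(f"speedup: {fast / slow:.0f}x")
```

Under this simplification the speedup tracks the bandwidth ratio directly, which is why a design built around faster memory can trade storage density for speed. Real systems also depend on batching, caching, and interconnect, which this sketch ignores.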
What It Covers
Cerebras CEO Andrew Feldman explains how his company built a chip 58 times larger than any competitor's and achieved inference speeds 15 times faster than leading GPUs. The episode covers wafer-scale engineering breakthroughs, inference economics, CUDA's declining relevance, open vs. closed source AI models, and semiconductor supply chain constraints.
Notable Moment
Feldman reveals that despite solving a 75-year-old engineering problem long considered unsolvable and building the world's fastest inference chip, Cerebras' primary growth constraint today is not manufacturing capacity or software. It is simply the availability of powered data center buildings, a limitation expected to persist for at least 15–18 months.