Odd Lots

Why Cerebras CEO Andrew Feldman Built The World's Largest Computer Chip

51 min episode · 2 min read

Topics

Leadership

AI-Generated Summary

Key Takeaways

  • Wafer-Scale Memory Architecture: Cerebras achieves 15x faster inference than GPUs, and up to 1,000x faster on specific workloads, by keeping model data in fast on-chip SRAM rather than in slower off-chip HBM (high-bandwidth memory). The tradeoff is that SRAM stores less per square millimeter. Cerebras compensates by building a single chip that covers an entire silicon wafer, roughly the size of a dinner plate, packed with high-speed memory.
  • Speed Premium Pricing: Anthropic's 2x-faster inference tier sold out at 6x the standard price, which shows that enterprise buyers will pay a significant premium for speed. Cerebras says it runs about 15x faster than that tier, suggesting substantial pricing power. On GPUs, slow tokens are cheap to produce, but cost per token rises sharply as speed requirements increase.
  • Supply Chain Differentiation: Cerebras sidesteps three major AI chip bottlenecks at once: HBM shortages, TSMC's capacity-constrained CoWoS (Chip-on-Wafer-on-Substrate) packaging, and TSMC's oversubscribed 3nm node. By fabricating on 5nm and relying on on-chip SRAM, Cerebras avoids the constraints choking NVIDIA and other GPU vendors, leaving data center availability as its primary growth limiter.
  • CUDA Moat Erosion: According to Feldman, CUDA plays no role in inference workloads; moving a model from GPUs to Cerebras takes roughly 10 configuration changes. In training, two of the three leading frontier models (Gemini on TPUs, Claude on Trainium) now train without CUDA, which he characterizes as roughly a 70% loss of share for NVIDIA's software ecosystem compared with three years ago.
  • Open- vs. Closed-Source Economics: Open-source models such as Kimi K2 (1 trillion parameters) run on Cerebras today at a cost that reflects only compute and power, with no training costs to amortize. Closed-source models outperform open-source ones by roughly 4–5% on quality benchmarks but cost significantly more per token, so enterprises must actively weigh cost against capability.
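
The speed-versus-cost claims above follow from how token generation works. Each decode step streams the model's weights out of memory, so a single user's speed is bounded by memory bandwidth, and GPU operators lower cost per token by batching many users into each step. The sketch below is a simplified roofline-style model under stated assumptions. The device bandwidths, the hourly price, and the model and KV-cache sizes are hypothetical round numbers chosen for illustration, not Cerebras or NVIDIA specifications. It also ignores prefill, interconnect, and scheduling overheads.

```python
"""Toy roofline model of LLM decode speed vs. cost per token.

Every number here is a hypothetical placeholder, not a vendor spec.
Simplifying assumptions:
- Each decode step reads all weights once, plus each active sequence's KV cache.
- A step lasts as long as the slower of memory traffic and arithmetic.
- Prefill, interconnect, and scheduling overheads are ignored.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    mem_bandwidth: float  # bytes/s from the memory that holds the weights
    peak_flops: float     # sustained FLOP/s


@dataclass(frozen=True)
class Model:
    weight_bytes: float      # bytes of weights streamed per decode step
    kv_bytes_per_seq: float  # KV-cache bytes read per sequence per step
    flops_per_token: float   # roughly 2 x parameter count for a dense model


def step_seconds(hw: Accelerator, m: Model, batch: int) -> float:
    """Duration of one decode step, which emits one token for each of `batch` sequences."""
    memory_time = (m.weight_bytes + batch * m.kv_bytes_per_seq) / hw.mem_bandwidth
    compute_time = batch * m.flops_per_token / hw.peak_flops
    return max(memory_time, compute_time)


def per_user_tokens_per_s(hw: Accelerator, m: Model, batch: int) -> float:
    """Speed each individual user sees: one token per step."""
    return 1.0 / step_seconds(hw, m, batch)


def cost_per_million_tokens(hw: Accelerator, m: Model, batch: int, cost_per_hour: float) -> float:
    """Hourly hardware cost spread over every token produced in that hour."""
    tokens_per_hour = batch * 3600.0 / step_seconds(hw, m, batch)
    return cost_per_hour / tokens_per_hour * 1e6


# Hypothetical 8B-parameter dense model with 16-bit weights; KV size is a placeholder.
EXAMPLE_MODEL = Model(weight_bytes=16e9, kv_bytes_per_seq=0.5e9, flops_per_token=1.6e10)
# Two hypothetical devices that differ only in weight-memory bandwidth.
HBM_DEVICE = Accelerator("off-chip HBM (hypothetical)", mem_bandwidth=3e12, peak_flops=1e15)
SRAM_DEVICE = Accelerator("on-chip SRAM (hypothetical)", mem_bandwidth=3e14, peak_flops=1e15)
HBM_COST_PER_HOUR = 4.0  # placeholder dollars/hour, not a market price

if __name__ == "__main__":
    # Batching trades per-user speed for cheaper tokens on the HBM device.
    for batch in (1, 8, 64):
        speed = per_user_tokens_per_s(HBM_DEVICE, EXAMPLE_MODEL, batch)
        cost = cost_per_million_tokens(HBM_DEVICE, EXAMPLE_MODEL, batch, HBM_COST_PER_HOUR)
        print(f"HBM  batch={batch:>2}: {speed:8.1f} tok/s per user, ${cost:.3f} per 1M tokens")
    # Single-user speed with 100x the memory bandwidth (only valid if the weights fit on chip).
    sram_speed = per_user_tokens_per_s(SRAM_DEVICE, EXAMPLE_MODEL, 1)
    print(f"SRAM batch= 1: {sram_speed:8.1f} tok/s per user")
```

With these placeholders, raising the HBM device's batch from 1 to 64 cuts the cost per token by 22x, while each user's speed falls to about a third (from roughly 182 to 62.5 tokens/s). The SRAM device reaches 100x the single-user speed, but only if the weights fit in its fast memory. That is the density problem the wafer-scale design is meant to solve.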

What It Covers

Cerebras CEO Andrew Feldman explains how his company built a chip 58 times larger than any competitor's, achieving inference speeds 15 times faster than leading GPUs. The episode covers wafer-scale engineering breakthroughs, inference economics, CUDA's declining relevance, open- vs. closed-source AI models, and semiconductor supply chain constraints.

Notable Moment

Feldman reveals that despite solving a 75-year-old engineering problem long considered unsolvable and building the world's fastest inference chip, Cerebras's primary growth constraint today is neither manufacturing capacity nor software. It is simply the availability of powered data center buildings, a limitation he expects to persist for at least 15–18 months.