Etched - Building AI Hardware to Make Inference Faster and Cheaper - [Invest Like the Best, EP.480]
Episode: 87 min
Read time: 3 min
Topics: Investing, Startups, Fundraising & VC
AI-Generated Summary
Key Takeaways
- ✓Low-Voltage Inference Architecture: GPUs thermal-throttle because dynamic power scales with the square of voltage, so doubling voltage quadruples power draw. Etched runs at under half the voltage of any competing AI chip by redesigning power delivery planes entirely. This unlocks significantly higher flop density without thermal throttling, enabling more compute per watt. Bitcoin miners already proved sub-quarter-voltage operation was physically possible; the question was whether transformer workloads could be restructured to support it.
- ✓Cluster-Scale Memory as Decode Advantage: The correct metric for decode performance is not single-chip memory bandwidth but full cluster memory bandwidth. NVIDIA Blackwell chip-to-chip latency runs approximately 4,000 nanoseconds point-to-point, meaning 8-chip tensor-parallel setups deliver far less than 8x throughput gains. Etched built a fully custom interconnect stack above Layer 2, cutting latency by more than 5x, enabling the entire cluster's SRAM and HBM to function as a unified memory pool for token generation.
- ✓Prefetch Everything Before Silicon Returns: Etched compressed post-silicon bring-up from the industry benchmark of 10 months down to 40 days by completing all parallel work before chips arrived. This included deploying 700 FPGAs running full inference stacks, shipping racks to customer data centers pre-chip for software validation, building thermal mock chips to validate cold plates, and standing up full production lines. Every workstream that did not require physical silicon was finished in advance.
- ✓Project-Based Legend Recruiting: Map every hard technical problem in the target domain, identify who specifically did the zero-to-one work (not who managed it), then pursue those individuals across 20+ conversations over months. Etched recruited Brian Leuler, who built NVIDIA's HGX and DGX rack systems representing the majority of NVIDIA's revenue, by identifying him as one of three people globally who fit the exact profile needed, then converting the other two candidates into investors.
- ✓Bimodal Talent Philosophy — Legends Plus First-Principles Thinkers: Pair domain legends who know what scaled success looks like with young engineers who have no inherited constraints. Legends prevent billion-dollar mistakes; first-principles thinkers take aggressive risks legends would avoid. Etched pairs figures like Leuler with robotics world-record holders like Sanford, who built a functional cold plate prototype in one week — a task conventional thermal engineers would estimate at months — by simply not knowing it was considered impossible.
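The voltage argument in the first takeaway can be made concrete with the textbook CMOS dynamic-power approximation, P = C · V² · f. The sketch below is a minimal illustration under that assumption only: it ignores static leakage and the fact that clock frequency usually has to drop as voltage drops, and the function names are hypothetical rather than anything described in the episode.

```python
def dynamic_power(voltage, capacitance=1.0, frequency=1.0):
    """Textbook CMOS dynamic-power approximation: P = C * V^2 * f.

    Units are arbitrary here; only ratios between two operating
    points are meaningful in this sketch.
    """
    return capacitance * voltage ** 2 * frequency


def relative_power(voltage_ratio):
    """Power at `voltage_ratio` times a baseline voltage, relative to
    the baseline, holding capacitance and frequency fixed (assumption)."""
    return dynamic_power(voltage_ratio) / dynamic_power(1.0)


if __name__ == "__main__":
    # Doubling voltage quadruples dynamic power.
    print(relative_power(2.0))  # 4.0
    # Halving voltage cuts dynamic power to a quarter.
    print(relative_power(0.5))  # 0.25
```

Under this simplified model, running at half the voltage leaves roughly four times the power budget per switching event, which is the headroom the summary attributes to higher flop density.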
What It Covers
Etched founders Gavin Uberti and Robert Wachen explain how they built a transformer-specific AI inference chip on the first tape-out attempt, raising $800M with over $1B in customer demand. They detail their low-voltage inference architecture, cluster-scale memory interconnects, and the vertical integration strategy behind their full rack-scale inference product launched in 2023.
Key Questions Answered
- •Vertical Integration Bounded by Economies of Scale: Integrate vertically only where doing so adds token capacity or removes a binding constraint — not as a default strategy. Etched builds chips, boards, cold plates, interconnects, and production lines in-house because each was a bottleneck. They do not build data centers because customers are already moving power infrastructure to accommodate Etched hardware. The natural integration boundaries sit at chip fabrication on one end and model architecture on the other, with full-stack ownership between.
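The Cluster-Scale Memory point, that 8-chip tensor-parallel setups deliver far less than 8x throughput when each chip-to-chip hop costs about 4,000 ns, can be illustrated with a toy latency model. This is a sketch under stated assumptions, not a description of Etched's or NVIDIA's systems: it treats per-token work as perfectly divisible across chips and charges one link latency per synchronization point. The 2 ms single-chip token time and the 160 synchronization points per token are hypothetical inputs; only the 4,000 ns figure and the "more than 5x" reduction come from the summary.

```python
def tensor_parallel_speedup(n_chips, single_chip_ns, sync_points, link_latency_ns):
    """Toy model of decode speedup from tensor parallelism.

    Per-token time = (single-chip time / n_chips) + sync_points * link latency.
    Real collectives involve more than one hop and overlap with compute;
    this deliberately ignores both.
    """
    parallel_ns = single_chip_ns / n_chips + sync_points * link_latency_ns
    return single_chip_ns / parallel_ns


if __name__ == "__main__":
    SINGLE_CHIP_NS = 2_000_000  # hypothetical: 2 ms per token on one chip
    SYNC_POINTS = 160           # hypothetical: synchronizations per token

    # Latency quoted in the summary for Blackwell chip-to-chip links.
    baseline = tensor_parallel_speedup(8, SINGLE_CHIP_NS, SYNC_POINTS, 4_000)
    # "More than 5x" lower latency, taken here as exactly 5x (800 ns).
    improved = tensor_parallel_speedup(8, SINGLE_CHIP_NS, SYNC_POINTS, 800)

    print(f"8 chips at 4000 ns/hop: {baseline:.2f}x")  # about 2.25x
    print(f"8 chips at  800 ns/hop: {improved:.2f}x")  # about 5.29x
```

With these made-up inputs the fixed latency term dominates once compute is split eight ways, so cutting link latency moves the speedup much closer to the ideal 8x. Different assumed inputs would shift the numbers but not the shape of the trade-off.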
Notable Moment
Robert Wachen described uploading a pre-diagnosis photo of his back tumor, taken before his stage-four bone cancer diagnosis at age 16, to GPT-4V, which immediately flagged it as a potential tumor requiring an urgent MRI. A process that took six months of medical evaluation in 2015 took seconds in 2023, motivating his decision to build inference infrastructure.