Invest Like the Best with Patrick O'Shaughnessy

Etched - Building AI Hardware to Make Inference Faster and Cheaper - [Invest Like the Best, EP.480]

June 30, 2026

87 min episode · 3 min read

Gavin Huberti, Rob Walken

Topics

Investing, Startups, Fundraising & VC

AI-Generated Summary

Published Jul 1, 2026

Key Takeaways

✓Low-Voltage Inference Architecture: GPUs thermal-throttle because dynamic power scales with the square of voltage, so doubling voltage quadruples power draw. Etched runs at under half the voltage of any competing AI chip by redesigning power delivery planes entirely. This unlocks significantly higher FLOP density without thermal throttling, enabling more compute per watt. Bitcoin miners already proved sub-quarter-voltage operation was physically possible; the question was whether transformer workloads could be restructured to support it.
✓Cluster-Scale Memory as Decode Advantage: The correct metric for decode performance is not single-chip memory bandwidth but full cluster memory bandwidth. NVIDIA Blackwell chip-to-chip latency runs approximately 4,000 nanoseconds point-to-point, meaning 8-chip tensor-parallel setups deliver far less than 8x throughput gains. Etched built a fully custom interconnect stack above Layer 2, cutting latency by more than 5x, enabling the entire cluster's SRAM and HBM to function as a unified memory pool for token generation.
✓Prefetch Everything Before Silicon Returns: Etched compressed post-silicon bring-up from the industry benchmark of 10 months down to 40 days by completing all parallel work before chips arrived. This included deploying 700 FPGAs running full inference stacks, shipping racks to customer data centers pre-chip for software validation, building thermal mock chips to validate cold plates, and standing up full production lines. Every workstream that did not require physical silicon was finished in advance.
✓Project-Based Legend Recruiting: Map every hard technical problem in the target domain, identify who specifically did the zero-to-one work (not who managed it), then pursue those individuals across 20+ conversations over months. Etched recruited Brian Leuler, who built NVIDIA's HGX and DGX rack systems representing the majority of NVIDIA's revenue, by identifying him as one of three people globally who fit the exact profile needed, then converting the other two candidates into investors.
✓Bimodal Talent Philosophy — Legends Plus First-Principles Thinkers: Pair domain legends who know what scaled success looks like with young engineers who have no inherited constraints. Legends prevent billion-dollar mistakes; first-principles thinkers take aggressive risks legends would avoid. Etched pairs figures like Leuler with robotics world-record holders like Sanford, who built a functional cold plate prototype in one week — a task conventional thermal engineers would estimate at months — by simply not knowing it was considered impossible.
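
The voltage claim in the first takeaway can be checked with a little arithmetic. The sketch below assumes the standard CMOS approximation for dynamic switching power, P = a · C · V² · f, and holds capacitance, clock frequency, and activity factor fixed. It ignores leakage and the fact that lower voltage usually forces a lower clock, and the function names are illustrative, not anything described in the episode.

```python
def dynamic_power(voltage, capacitance=1.0, frequency=1.0, activity=1.0):
    """Dynamic switching power under the approximation P = a * C * V^2 * f.

    All units are arbitrary; only ratios between two operating points matter here.
    """
    return activity * capacitance * voltage ** 2 * frequency


def relative_power(voltage_ratio):
    """Power at `voltage_ratio` times the baseline voltage, relative to baseline power."""
    return dynamic_power(voltage_ratio) / dynamic_power(1.0)


if __name__ == "__main__":
    print(relative_power(2.0))  # doubling voltage -> 4.0x power
    print(relative_power(0.5))  # halving voltage  -> 0.25x power
```

Under these assumptions, a chip running at half the voltage spends roughly a quarter of the switching power per operation, which is the headroom the summary describes as higher FLOP density without thermal throttling.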

What It Covers

Etched founders Gavin Huberti and Rob Walken explain how they built a transformer-specific AI inference chip on the first tape-out attempt, raising $800M with over $1B in customer demand. They detail their low-voltage inference architecture, cluster-scale memory interconnects, and the vertical integration strategy behind their full rack-scale inference product launched in 2023.
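
To see why chip-to-chip latency limits the cluster-scale memory advantage, here is a rough model of one decode step under tensor parallelism. It is a sketch under loud assumptions: each chip streams only its shard of the weights from memory, every layer pays a fixed number of synchronizations, and each synchronization costs a single point-to-point hop. The model size, bandwidth, layer count, and syncs per layer are hypothetical; only the roughly 4,000 ns latency and the "more than 5x" reduction come from the summary.

```python
def decode_step_seconds(model_bytes, chips, mem_bw_bytes_per_s,
                        layers, syncs_per_layer, hop_latency_s):
    """Estimated time to generate one token.

    read_time: each chip reads its 1/chips share of the weights.
    sync_time: per-layer synchronizations, each modeled as one hop (a simplification).
    """
    read_time = (model_bytes / chips) / mem_bw_bytes_per_s
    sync_time = 0.0 if chips == 1 else layers * syncs_per_layer * hop_latency_s
    return read_time + sync_time


def speedup(chips, **kw):
    """Throughput gain of `chips`-way tensor parallelism over a single chip."""
    return decode_step_seconds(chips=1, **kw) / decode_step_seconds(chips=chips, **kw)


# Hypothetical workload: 8 GB of weights, 8 TB/s per-chip memory bandwidth,
# 80 layers with 2 synchronizations each.
workload = dict(model_bytes=8e9, mem_bw_bytes_per_s=8e12, layers=80, syncs_per_layer=2)

slow_link = speedup(8, hop_latency_s=4e-6, **workload)    # ~4,000 ns per hop
fast_link = speedup(8, hop_latency_s=0.8e-6, **workload)  # 5x lower latency

if __name__ == "__main__":
    print(f"8 chips, 4000 ns hops: {slow_link:.2f}x")
    print(f"8 chips,  800 ns hops: {fast_link:.2f}x")
```

With these made-up numbers the eight-chip setup stays well short of 8x in both cases, and the lower-latency link recovers a much larger share of the ideal gain, which matches the direction of the argument in the summary. The absolute figures depend entirely on the assumed workload.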

Key Questions Answered

•Vertical Integration Bounded by Economies of Scale: Integrate vertically only where doing so adds token capacity or removes a binding constraint — not as a default strategy. Etched builds chips, boards, cold plates, interconnects, and production lines in-house because each was a bottleneck. They do not build data centers because customers are already moving power infrastructure to accommodate Etched hardware. The natural integration boundaries sit at chip fabrication on one end and model architecture on the other, with full-stack ownership between.

Notable Moment

Rob Walken described uploading a pre-diagnosis photo of his back tumor — taken before his stage-four bone cancer diagnosis at age 16 — to GPT-4V, which immediately flagged it as a potential tumor requiring urgent MRI. A process that took six months of medical evaluation in 2015 took seconds in 2023, motivating his decision to build inference infrastructure.

Books, tools, and gear mentioned in this episode

Tools

GPT-4V
by OpenAI
“Rob Walken described uploading a pre-diagnosis photo of his back tumor — taken before his stage-four bone cancer diagnosis at age 16 — to GPT-4V, which immediately flagged it as a potential tumor requiring urgent MRI.”

Gear

NVIDIA Blackwell
by NVIDIA
“NVIDIA Blackwell chip-to-chip latency runs approximately 4,000 nanoseconds point-to-point, meaning 8-chip tensor-parallel setups deliver far less than 8x throughput gains.”
NVIDIA HGX
by NVIDIA
“Etched recruited Brian Leuler, who built NVIDIA's HGX and DGX rack systems representing the majority of NVIDIA's revenue”
NVIDIA DGX
by NVIDIA
“Etched recruited Brian Leuler, who built NVIDIA's HGX and DGX rack systems representing the majority of NVIDIA's revenue”
