Invest Like the Best with Patrick O'Shaughnessy

Etched - Building AI Hardware to Make Inference Faster and Cheaper - [Invest Like the Best, EP.480]

87 min episode · 3 min read
Gavin Huberti, Rob Walken

Topics

Investing, Startups, Fundraising & VC

AI-Generated Summary

Key Takeaways

  • Low-Voltage Inference Architecture: GPUs thermal-throttle because power scales quadratically with voltage: doubling voltage quadruples power draw. Etched runs at under half the voltage of any competing AI chip by redesigning its power delivery planes entirely. This unlocks significantly higher FLOP density without thermal throttling, enabling more compute per watt. Bitcoin miners already proved sub-quarter-voltage operation was physically possible; the question was whether transformer workloads could be restructured to support it.
  • Cluster-Scale Memory as Decode Advantage: The correct metric for decode performance is not single-chip memory bandwidth but full cluster memory bandwidth. NVIDIA Blackwell chip-to-chip latency runs approximately 4,000 nanoseconds point-to-point, meaning 8-chip tensor-parallel setups deliver far less than 8x throughput gains. Etched built a fully custom interconnect stack above Layer 2, cutting latency by more than 5x, enabling the entire cluster's SRAM and HBM to function as a unified memory pool for token generation.
  • Prefetch Everything Before Silicon Returns: Etched compressed post-silicon bring-up from the industry benchmark of 10 months down to 40 days by completing all parallel work before chips arrived. This included deploying 700 FPGAs running full inference stacks, shipping racks to customer data centers pre-chip for software validation, building thermal mock chips to validate cold plates, and standing up full production lines. Every workstream that did not require physical silicon was finished in advance.
  • Project-Based Legend Recruiting: Map every hard technical problem in the target domain, identify who specifically did the zero-to-one work (not who managed it), then pursue those individuals across 20+ conversations over months. Etched recruited Brian Leuler, who built NVIDIA's HGX and DGX rack systems, which represent the majority of NVIDIA's revenue, by identifying him as one of three people globally who fit the exact profile needed, then converting the other two candidates into investors.
  • Bimodal Talent Philosophy — Legends Plus First-Principles Thinkers: Pair domain legends who know what scaled success looks like with young engineers who have no inherited constraints. Legends prevent billion-dollar mistakes; first-principles thinkers take aggressive risks legends would avoid. Etched pairs figures like Leuler with robotics world-record holders like Sanford, who built a functional cold plate prototype in one week — a task conventional thermal engineers would estimate at months — by simply not knowing it was considered impossible.
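The voltage argument in the first takeaway follows from the standard CMOS switching-power estimate, P = a · C · V² · f. The sketch below is only an illustration of that relationship: the voltages, clock rate, and capacitance are hypothetical placeholders, not Etched or GPU figures, and the model ignores leakage power and the fact that achievable clock speed usually falls as voltage drops.

```python
def dynamic_power(voltage, frequency_hz, capacitance_f, activity=1.0):
    """Classic CMOS switching-power estimate: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage ** 2 * frequency_hz


def power_ratio(v_new, v_old):
    """Relative switching power when only the supply voltage changes."""
    return (v_new / v_old) ** 2


# Hypothetical operating points, chosen only to show the scaling.
baseline = dynamic_power(voltage=0.8, frequency_hz=1.5e9, capacitance_f=1e-9)
halved = dynamic_power(voltage=0.4, frequency_hz=1.5e9, capacitance_f=1e-9)

print(f"doubling voltage multiplies power by {power_ratio(2.0, 1.0):.1f}")
print(f"halving voltage leaves {halved / baseline:.2%} of baseline power")
```

Under these assumptions, halving the supply voltage cuts switching power to about a quarter, which is the headroom the summary says Etched spends on higher compute density.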

What It Covers

Etched founders Gavin Huberti and Rob Walken explain how they built a transformer-specific AI inference chip on the first tape-out attempt, raising $800M with over $1B in customer demand. They detail their low-voltage inference architecture, cluster-scale memory interconnects, and the vertical integration strategy behind their full rack-scale inference product launched in 2023.

Key Questions Answered

  • Vertical Integration Bounded by Economies of Scale: Integrate vertically only where doing so adds token capacity or removes a binding constraint — not as a default strategy. Etched builds chips, boards, cold plates, interconnects, and production lines in-house because each was a bottleneck. They do not build data centers because customers are already moving power infrastructure to accommodate Etched hardware. The natural integration boundaries sit at chip fabrication on one end and model architecture on the other, with full-stack ownership between.
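The cluster-scale decode point, that 8-chip tensor parallelism delivers far less than 8x when chip-to-chip latency is high, can be illustrated with a toy model. Everything here is an assumption except the roughly 4,000 ns latency and the "more than 5x" reduction quoted in the summary: the single-chip step time and the number of synchronization points per token are hypothetical, and real collectives also involve bandwidth terms and multiple hops that this sketch leaves out.

```python
def tp_speedup(single_chip_ns, n_chips, sync_points, link_latency_ns):
    """Toy tensor-parallel model: compute divides across chips,
    but every synchronization point pays one full link latency."""
    parallel_ns = single_chip_ns / n_chips + sync_points * link_latency_ns
    return single_chip_ns / parallel_ns


SINGLE_CHIP_NS = 1_000_000  # hypothetical: 1 ms per decode step on one chip
SYNC_POINTS = 160           # hypothetical: 2 collectives x 80 layers
N_CHIPS = 8

baseline = tp_speedup(SINGLE_CHIP_NS, N_CHIPS, SYNC_POINTS, link_latency_ns=4_000)
# 800 ns is 4,000 ns divided by 5, i.e. the edge of the "more than 5x" claim.
faster = tp_speedup(SINGLE_CHIP_NS, N_CHIPS, SYNC_POINTS, link_latency_ns=800)

print(f"speedup at 4,000 ns per sync: {baseline:.2f}x")
print(f"speedup at   800 ns per sync: {faster:.2f}x")
```

With these made-up workload numbers the 8-chip speedup moves from roughly 1.3x to roughly 4x as latency drops, which shows why latency, not just per-chip bandwidth, bounds how much of a cluster's memory can be used per token.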

Notable Moment

Rob Walken described uploading a pre-diagnosis photo of his back tumor — taken before his stage-four bone cancer diagnosis at age 16 — to GPT-4V, which immediately flagged it as a potential tumor requiring urgent MRI. A process that took six months of medical evaluation in 2015 took seconds in 2023, motivating his decision to build inference infrastructure.
