
Nvidia's Ian Buck

1 episode · 1 podcast

We have 1 summarized appearance for Nvidia's Ian Buck so far. Browse all podcasts to discover more episodes.

All Appearances

AI Summary

→ WHAT IT COVERS

Ian Buck explains how the Mixture of Experts (MoE) architecture powers leading AI models by activating only 3-10% of a neural network's parameters per query, cutting token costs by 10x while raising intelligence scores from 28 to 61.

→ KEY INSIGHTS

- **MoE Cost Reduction:** OpenAI's GPT-OSS model uses 120 billion total parameters but activates only about 5 billion per query, versus Llama's 405 billion fully active parameters, reducing benchmark costs from $200 to $75 while doubling intelligence scores through selective expert activation.
- **NVLink Communication Architecture:** GB200 NVL72 connects 72 GPUs with non-blocking, terabytes-per-second bandwidth over copper links running at 200 gigabits per second, enabling a 15x performance improvement over 8-GPU Hopper systems while adding only about 50% in cost, for a roughly 10x token-cost reduction to 10 cents per million tokens.
- **Expert Parallelization Strategy:** Modern MoE models deploy 300-400 experts across multiple layers, with router networks directing each query to 2-8 relevant experts simultaneously and combining their responses. The experts are not assigned knowledge domains by hand; training naturally clusters information into specialized pockets through patterns in the data rather than manual categorization.
- **Extreme Co-Design Process:** NVIDIA's software engineers outnumber its hardware engineers, optimizing end-to-end performance through kernel fusions and overlapping NVLink communication with compute. They recently achieved a 2x performance gain on a customer's model within two weeks, directly halving token costs through software optimization alone, with no hardware changes.

→ NOTABLE MOMENT

Buck reveals that the NVLink copper links use PAM4 signaling, which encodes two bits per symbol using four voltage levels instead of binary on-off, pushing physics limits at millimeter wavelengths to enable trillion-parameter models like QwenMax-2 that activate only 32 billion parameters per query.

💼 SPONSORS: None detected

🏷️ Mixture of Experts, AI Infrastructure, Token Economics, NVLink Architecture
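The summary above describes MoE routing in prose: a router scores all experts, activates only the top few for each token, and combines their outputs. Here is a minimal, self-contained sketch of that top-k routing idea; all sizes, names, and weights are toy assumptions for illustration, not NVIDIA's or any production model's code:

```python
# Toy sketch of Mixture-of-Experts top-k routing (illustrative only).
import math
import random

random.seed(0)

NUM_EXPERTS = 8  # production MoE models use hundreds (300-400 per the episode)
TOP_K = 2        # the router activates only a few experts per token
DIM = 4          # toy hidden dimension

# Hypothetical experts: each is a tiny linear layer (a DIM x DIM weight matrix).
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
# Router: one score vector per expert.
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x):
    # 1. The router scores every expert for this token.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    probs = softmax(logits)
    # 2. Keep only the top-k experts; all other expert weights stay inactive.
    topk = sorted(range(NUM_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    norm = sum(probs[i] for i in topk)
    # 3. Weighted combination of the selected experts' outputs.
    out = [0.0] * DIM
    for i in topk:
        y = [sum(w * xi for w, xi in zip(row, x)) for row in experts[i]]
        out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
    return out, topk

out, active = moe_forward([1.0, 0.5, -0.3, 0.2])
print("active experts:", active)
print("fraction of expert params used:", TOP_K / NUM_EXPERTS)  # 0.25 in this toy
```

In this toy setup 2 of 8 experts run (25% of expert parameters); at the scales quoted in the summary, e.g. roughly 5 billion active of 120 billion total parameters, the fraction drops to about 4%, inside the 3-10% range Buck describes, which is where the token-cost savings come from.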
