#324 Sharon Zhou: Inside AMD's Plan to Build Self-Improving AI
Episode: 46 min · Read time: 2 min · Topics: Artificial Intelligence

AI-Generated Summary
Key Takeaways
- ✓Catastrophic Forgetting Prevention: Fine-tuning tends to overwrite a model's earlier capabilities, especially when the team doing the fine-tuning has no access to the original pre-training data. Zhou notes that mixing as little as 1% of pre-training data back into post-training significantly reduces this catastrophic forgetting, letting the model reconnect with its earlier representations. Developers running heavy post-training workloads should monitor for it actively, since even small fine-tuning tasks can compound into larger degradation over time.
- ✓Kernel Optimization Economics: Speeding up a single matrix multiplication kernel — which executes billions or trillions of times inside one model — can translate to hundreds of billions of dollars in savings at frontier scale. Even for smaller deployments, a 10x kernel speed improvement is economically equivalent to purchasing 10x more GPU compute hardware.
- ✓Verifiable Rewards for RL Training: AMD's kernel-generation pipeline uses GPU profiler output as a verifiable reward signal for reinforcement learning, much as math correctness served as a verifiable reward in training reasoning models like those behind ChatGPT. Because profiler speed metrics are objective, they feed directly back into post-training loops without requiring costly human preference labeling.
- ✓AMD's Open-Source Advantage: AMD's ROCm software stack is open-source, unlike NVIDIA's CUDA. This means language models can train directly on ROCm documentation and code, enabling AI agents to learn AMD-specific kernel writing more effectively. Developers building on AMD hardware benefit from this transparency when using AI coding tools like Cursor to generate optimized kernels.
- ✓Kernel Engineering Skill Gap: Writing optimized GPU kernels requires simultaneous expertise in GPU architecture specifics and model mathematics — a combination rare enough to bottleneck even frontier labs. A complex kernel can take a non-expert months and an expert several weeks to write manually, making AI-assisted kernel generation a high-leverage productivity multiplier across the entire AI development stack.
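The 1% replay idea in the first takeaway can be sketched as a batch-construction routine. This is a toy illustration, not AMD's pipeline: the function name `mixed_batch`, the string-list "datasets", and the default `replay_frac=0.01` are all assumptions chosen to mirror the 1% figure from the episode.

```python
import random

def mixed_batch(finetune_data, pretrain_data, batch_size=100, replay_frac=0.01, rng=None):
    """Build one training batch that is mostly fine-tuning examples,
    plus a small replay slice of pre-training data to curb forgetting."""
    rng = rng or random.Random(0)
    # With batch_size=100 and replay_frac=0.01 this reserves 1 slot for replay.
    n_replay = max(1, round(batch_size * replay_frac))
    n_task = batch_size - n_replay
    batch = rng.sample(finetune_data, n_task) + rng.sample(pretrain_data, n_replay)
    rng.shuffle(batch)  # interleave so replay examples are not clustered
    return batch
```

In a real training loop the two pools would be token sequences and the mixing would happen at the data-loader level, but the ratio logic is the same.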
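The economics takeaway ("a 10x kernel speedup is worth 10x the hardware") can be made concrete with a back-of-envelope calculation. All numbers and the `time_share` refinement are illustrative assumptions, not figures from the episode: the 10x equivalence holds only for the fraction of runtime actually spent in the accelerated kernel (Amdahl's law), so `time_share=1.0` reproduces the episode's framing.

```python
def equivalent_fleet_value(gpu_count, gpu_unit_cost, kernel_speedup, time_share):
    """Rough dollar value of a kernel speedup, expressed as the cost of the
    extra GPUs you would otherwise need to buy for the same throughput."""
    # Amdahl's law: only the time_share fraction of runtime gets faster.
    overall_speedup = 1.0 / ((1.0 - time_share) + time_share / kernel_speedup)
    return gpu_count * (overall_speedup - 1.0) * gpu_unit_cost

# Illustrative: a 1,000-GPU fleet at $30k/GPU, a 10x kernel speedup,
# assuming the kernel dominates runtime entirely.
value = equivalent_fleet_value(1000, 30_000.0, 10.0, time_share=1.0)
```

At frontier scale (hundreds of thousands of GPUs, kernels that dominate runtime), the same arithmetic lands in the range Zhou describes.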
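The verifiable-reward takeaway can also be sketched: gate the reward on output correctness, then score by measured speedup over a baseline. This is a minimal stand-in, not AMD's system; `profile` uses wall-clock timing of Python callables where the real pipeline reads a GPU profiler, and all function names here are hypothetical.

```python
import time

def profile(fn, *args, repeats=5):
    """Median wall-clock time of fn(*args) in seconds (stand-in for a GPU profiler)."""
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    times.sort()
    return times[len(times) // 2]

def kernel_reward(candidate, baseline, args, rtol=1e-6):
    """RL reward for a generated kernel: 0 if results diverge, else measured speedup."""
    ref, out = baseline(*args), candidate(*args)
    if any(abs(a - b) > rtol * max(abs(a), abs(b), 1.0) for a, b in zip(ref, out)):
        return 0.0  # correctness gate: a fast-but-wrong kernel earns nothing
    return profile(baseline, *args) / profile(candidate, *args)
```

The key property from the episode is that both signals (numerical correctness and profiler timing) are objective, so the reward needs no human labeling.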
What It Covers
Sharon Zhou, VP of AI at AMD and Stanford PhD graduate, explains how AMD uses AI agents and reinforcement learning to autonomously generate and optimize low-level GPU kernel code, enabling language models to run faster on AMD hardware while reducing the rare human expertise bottleneck in kernel engineering.
Notable Moment
When asked whether faster kernel optimization might relieve the global chip shortage pressure, Zhou flatly dismissed the idea — stating that demand for compute is effectively infinite and no organization has reached a point where efficiency gains reduce their appetite for more hardware.