#324 Sharon Zhou: Inside AMD's Plan to Build Self-Improving AI
Episode: 46 min
Read time: 2 min
Topics: Productivity, Fundraising & VC, Artificial Intelligence
AI-Generated Summary
Key Takeaways
- ✓ Catastrophic Forgetting Prevention: When fine-tuning models without access to original pre-training data, reintroducing as little as 1% of pre-training data during post-training significantly reduces catastrophic forgetting, because it lets the model reconnect with earlier representations. Developers running heavy post-training workloads should monitor for forgetting actively, since even small fine-tuning tasks can compound into larger degradation over time.
- ✓ Kernel Optimization Economics: A single matrix multiplication kernel executes billions or trillions of times inside one model, so speeding it up can translate to hundreds of billions of dollars in savings at frontier scale. Even for smaller deployments, a 10x kernel speedup is, for the work that kernel does, economically comparable to buying 10x more GPU hardware.
- ✓ Verifiable Rewards for RL Training: AMD's kernel generation pipeline uses GPU profiler output as a verifiable reward signal for reinforcement learning, much as answer correctness on math problems provided a verifiable signal for ChatGPT's reasoning training. Because profiler speed metrics are objective, they feed directly back into post-training loops without costly human preference labeling.
- ✓ AMD's Open-Source Advantage: AMD's ROCm software stack is open-source, unlike NVIDIA's CUDA. Language models can therefore train directly on ROCm documentation and code, which helps AI agents learn AMD-specific kernel writing more effectively. Developers building on AMD hardware benefit from this transparency when using AI coding tools like Cursor to generate optimized kernels.
- ✓ Kernel Engineering Skill Gap: Writing optimized GPU kernels requires simultaneous expertise in GPU architecture and in the mathematics of the model, a combination rare enough to bottleneck even frontier labs. A complex kernel can take a non-expert months and an expert several weeks to write by hand, which makes AI-assisted kernel generation a high-leverage productivity multiplier across the AI development stack.
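The data-mixing idea in the first takeaway can be sketched in a few lines. This is a minimal illustration, not the method described in the episode: the function name `mix_with_replay`, the `replay_fraction` parameter, and the use of plain Python lists as datasets are all assumptions made for the example. The "pre-training" pool could be the original corpus or a stand-in for it.

```python
import random


def mix_with_replay(finetune_examples, pretrain_examples,
                    replay_fraction=0.01, seed=0):
    """Return a shuffled training list in which about `replay_fraction`
    of the examples come from the pre-training-style pool.

    Hypothetical helper for illustration only.
    """
    if not 0 <= replay_fraction < 1:
        raise ValueError("replay_fraction must be in [0, 1)")
    rng = random.Random(seed)
    n_ft = len(finetune_examples)
    # Solve n_replay / (n_ft + n_replay) = replay_fraction for n_replay.
    n_replay = round(n_ft * replay_fraction / (1 - replay_fraction))
    # Never ask for more replay examples than the pool contains.
    n_replay = min(n_replay, len(pretrain_examples))
    replay = rng.sample(pretrain_examples, n_replay)
    mixed = list(finetune_examples) + replay
    rng.shuffle(mixed)
    return mixed


finetune = [("ft", i) for i in range(990)]
pretrain = [("pt", i) for i in range(500)]
mixed = mix_with_replay(finetune, pretrain, replay_fraction=0.01)
n_replayed = sum(1 for source, _ in mixed if source == "pt")
```

With 990 fine-tuning examples and a 1% target, the mix contains 10 replayed examples out of 1,000 in total. Whether 1% is enough for a given model and task is something to measure rather than assume.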
What It Covers
Sharon Zhou, VP of AI at AMD and a Stanford PhD graduate, explains how AMD uses AI agents and reinforcement learning to autonomously generate and optimize low-level GPU kernel code. The goal is to make language models run faster on AMD hardware while easing the bottleneck created by scarce human expertise in kernel engineering.
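To make the idea of a verifiable reward concrete, here is a minimal sketch of how a measured kernel runtime could be turned into a scalar reward for reinforcement learning. It does not reflect AMD's actual pipeline: the function `kernel_reward`, its parameters, and the rule that an incorrect kernel earns zero reward are assumptions made for illustration. The runtimes would come from a GPU profiler; here they are plain numbers.

```python
def kernel_reward(runtime_ms, baseline_ms, correct):
    """Reward = speedup over a baseline kernel, or 0.0 if the candidate
    produced wrong output or an unusable timing.

    Hypothetical reward shape for illustration only.
    """
    # Assumption: a kernel that fails the correctness check earns nothing,
    # so the policy cannot be rewarded for fast but wrong code.
    if not correct or runtime_ms <= 0:
        return 0.0
    return baseline_ms / runtime_ms


# A candidate that runs in 2 ms against a 4 ms baseline earns a reward of 2.0.
faster = kernel_reward(runtime_ms=2.0, baseline_ms=4.0, correct=True)
# A slower candidate earns less than 1.0.
slower = kernel_reward(runtime_ms=8.0, baseline_ms=4.0, correct=True)
# A fast but incorrect candidate earns 0.0.
wrong = kernel_reward(runtime_ms=1.0, baseline_ms=4.0, correct=False)
```

The point of the design is that every input is measured rather than judged, so no human labeling is needed to score a candidate kernel.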
Notable Moment
When asked whether faster kernel optimization might relieve pressure from the global chip shortage, Zhou flatly dismissed the idea, stating that demand for compute is effectively infinite and that no organization has reached a point where efficiency gains reduce its appetite for more hardware.