Sharon Zhou

AI Summary

→ WHAT IT COVERS

Sharon Zhou, VP of AI at AMD and a Stanford PhD graduate, explains how AMD uses AI agents and reinforcement learning to autonomously generate and optimize low-level GPU kernel code, letting language models run faster on AMD hardware while easing the bottleneck of scarce human kernel-engineering expertise.

→ KEY INSIGHTS

- **Catastrophic Forgetting Prevention:** When fine-tuning a model without access to its original pre-training data, reintroducing as little as 1% of pre-training data during post-training significantly reduces catastrophic forgetting by letting the model reconnect with earlier representations. Developers running heavy post-training workloads should monitor for this actively, since even small fine-tuning tasks can compound into larger degradation over time (a data-mixing sketch follows after this list).
- **Kernel Optimization Economics:** Speeding up a single matrix-multiplication kernel, which executes billions or trillions of times inside one model, can translate to hundreds of billions of dollars in savings at frontier scale. Even for smaller deployments, a 10x kernel speedup is economically equivalent to purchasing 10x more GPU hardware (see the back-of-envelope calculation below).
- **Verifiable Rewards for RL Training:** AMD's kernel-generation pipeline uses GPU profiler output as a verifiable reward signal for reinforcement learning, much as verified math correctness guided ChatGPT's reasoning training. Because profiler speed metrics are objective, they feed directly back into post-training loops without costly human preference labeling (a reward-function sketch follows below).
- **AMD's Open-Source Advantage:** AMD's ROCm software stack is open source, unlike NVIDIA's CUDA. Language models can therefore train directly on ROCm documentation and code, helping AI agents learn AMD-specific kernel writing more effectively. Developers building on AMD hardware benefit from this transparency when using AI coding tools like Cursor to generate optimized kernels.
- **Kernel Engineering Skill Gap:** Writing optimized GPU kernels requires simultaneous expertise in GPU architecture specifics and model mathematics, a combination rare enough to bottleneck even frontier labs. A complex kernel can take a non-expert months, and an expert several weeks, to write by hand, making AI-assisted kernel generation a high-leverage productivity multiplier across the entire AI development stack.

→ NOTABLE MOMENT

When asked whether faster kernel optimization might relieve global chip-shortage pressure, Zhou flatly dismissed the idea, stating that demand for compute is effectively infinite and that no organization has reached a point where efficiency gains reduce its appetite for more hardware.

💼 SPONSORS

None detected

🏷️ GPU Kernel Optimization, Self-Improving AI, Reinforcement Learning, AMD ROCm, Post-Training Techniques
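→ ILLUSTRATIVE SKETCHES

The episode describes these techniques in prose only; the sketches below are illustrative reconstructions, not AMD's actual code. First, the 1% replay idea for catastrophic forgetting: a minimal PyTorch-style sketch, assuming `finetune_ds` and `pretrain_ds` are ordinary map-style datasets (all names here are hypothetical).

```python
# Hypothetical sketch: replaying ~1% of pre-training data during
# fine-tuning to reduce catastrophic forgetting, as described in the episode.
import random
from torch.utils.data import ConcatDataset, Subset

def build_replay_dataset(finetune_ds, pretrain_ds, replay_fraction=0.01):
    """Mix a small slice of pre-training data into the fine-tuning set.

    replay_fraction is the amount of replayed pre-training data,
    expressed as a fraction of the fine-tuning set size (~1% here).
    """
    n_replay = max(1, int(len(finetune_ds) * replay_fraction))
    n_replay = min(n_replay, len(pretrain_ds))
    replay_indices = random.sample(range(len(pretrain_ds)), n_replay)
    replay_ds = Subset(pretrain_ds, replay_indices)
    return ConcatDataset([finetune_ds, replay_ds])

# Usage: shuffle so replay examples interleave with fine-tuning examples.
# loader = torch.utils.data.DataLoader(
#     build_replay_dataset(ft_ds, pt_ds), batch_size=8, shuffle=True)
```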
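Second, the compute-economics claim: a back-of-envelope calculation under the simplifying assumption that the kernel dominates runtime. The call count and timings are made-up illustrative numbers, not figures from the episode.

```python
# Hypothetical numbers: a kernel invoked 1e12 times per training run,
# taking 2 microseconds per call, sped up 10x by an optimized rewrite.
calls = 1e12
baseline_us = 2.0          # per-call runtime before optimization
speedup = 10.0             # optimized kernel is 10x faster

saved_us = calls * baseline_us * (1 - 1 / speedup)
print(f"GPU-seconds saved per run: {saved_us / 1e6:,.0f}")  # 1,800,000

# 1.8 million GPU-seconds is ~500 GPU-hours saved on a single run;
# freeing that time is equivalent to adding that much hardware capacity.
```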
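Finally, the profiler-as-reward idea: a sketch of a verifiable reward function for RL over generated kernels, where a failed build or run scores zero and a successful run is rewarded by its measured speedup. Every file name, command, and output format below is an assumption for illustration; only the ROCm compiler driver `hipcc` is a real tool.

```python
# Hypothetical sketch: GPU profiler output as a verifiable RL reward.
# Unlike human preference labels, measured runtime is objective, so it
# can feed directly into a post-training loop.
import subprocess

def kernel_reward(candidate_src: str, baseline_runtime_us: float) -> float:
    """Return speedup over baseline as the reward, or 0.0 if invalid."""
    with open("candidate_kernel.cpp", "w") as f:
        f.write(candidate_src)

    # Compile the generated kernel; a build failure is an automatic zero.
    build = subprocess.run(
        ["hipcc", "candidate_kernel.cpp", "-o", "candidate"],
        capture_output=True,
    )
    if build.returncode != 0:
        return 0.0

    # Run the benchmark binary; assume it prints its runtime in
    # microseconds (a stand-in for parsing real profiler output).
    run = subprocess.run(["./candidate"], capture_output=True, text=True)
    if run.returncode != 0:
        return 0.0
    measured_us = float(run.stdout.strip())

    # Verified speedup over the baseline kernel becomes the reward.
    return baseline_runtime_us / measured_us
```

In a real pipeline the reward would presumably also verify the kernel's numerical output against a reference implementation before scoring speed, so that fast-but-wrong kernels cannot be rewarded.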
