This Week in Cognitive Revolution explored the practical mechanics of reinforcement learning fine-tuning with CoreWeave's Kyle Corbitt, who broke down the GRPO framework, rubric design, and environment setup that teams use to steer model behavior at scale. The episode surfaced a central tension in modern AI development: specifying what models should optimize for without accidentally incentivizing them to game the metrics built to measure it. Corbitt's discussion of reward hacking and its mitigation shows how practitioners are approaching alignment problems in real production systems.
Episodes This Week
Other Weeks
This Week in Cognitive Revolution (Jun 8 - Jun 14, 2026)
2 episodes · Jun 8 – Jun 14
This Week in Cognitive Revolution (Jun 1 - Jun 7, 2026)
2 episodes · Jun 1 – Jun 7
This Week in Cognitive Revolution (May 25 - May 31, 2026)
2 episodes · May 25 – May 31
This Week in Cognitive Revolution (May 18 - May 24, 2026)
2 episodes · May 18 – May 24
This Week in Cognitive Revolution (May 11 - May 17, 2026)
1 episode · May 11 – May 17
This Week in Cognitive Revolution (May 4 - May 10, 2026)
2 episodes · May 4 – May 10
This Week in Cognitive Revolution (Apr 20 - Apr 26, 2026)
2 episodes · Apr 20 – Apr 26
This Week in Cognitive Revolution (Apr 13 - Apr 19, 2026)
2 episodes · Apr 13 – Apr 19
This Week in Cognitive Revolution (Apr 6 - Apr 12, 2026)
2 episodes · Apr 6 – Apr 12
This Week in Cognitive Revolution (Mar 30 - Apr 5, 2026)
1 episode · Mar 30 – Apr 5
This Week in Cognitive Revolution (Mar 23 - Mar 29, 2026)
1 episode · Mar 23 – Mar 29