Rethinking Pre-Training for Agentic AI with Aakanksha Chowdhery - #759
Episode: 52 min
Read time: 2 min
Topics: Health & Wellness, Fundraising & VC, Artificial Intelligence
AI-Generated Summary
Key Takeaways
- ✓Pretraining for agents: Current models are tuned and evaluated against static benchmarks like GLUE or GSM8K, but agentic tasks require acting in interactive environments. Achieving multistep reasoning and tool use means changing pretraining itself (attention mechanisms, loss objectives, and data composition) rather than relying on post-training fixes alone.
- ✓Long context reasoning: Models need attention mechanisms that enable reasoning over millions of tokens while maintaining retrieval and synthesis capabilities. Even with long context windows, current transformers struggle on multi-hop reasoning benchmarks like MRCR v2 and LOFT, which suggests that agent workflows need architectural modifications.
- ✓Training data augmentation: Dominant pretraining sources like internet articles must be augmented with reasoning traces at comparable token volumes. Masking specific portions during training, similar to fill-in-the-middle for code models, teaches models which tools to use and how to plan across multiple steps.
- ✓Failure recovery capability: Models must learn to recognize failed trajectory steps in their context and choose different actions rather than repeating the same mistakes. This requires both reinforcement learning objectives and pretraining formats that help models notice and recover from earlier errors during multistep problem solving.
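The masking idea in the training data augmentation takeaway can be illustrated with a minimal sketch. It assumes a trajectory is a list of labeled steps and that the tool-call step is the one being masked. The `Step` class, the step labels, and the `<PRE>`, `<SUF>`, `<MID>` sentinel strings are hypothetical names chosen for illustration; the episode summary does not specify a data format, and real tokenizers define their own sentinel tokens.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kind: str  # hypothetical labels, e.g. "observation" or "tool_call"
    text: str


# Hypothetical sentinel strings marking prefix, suffix, and the masked middle.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def mask_step(steps: list[Step], index: int) -> tuple[str, str]:
    """Build a fill-in-the-middle style (prompt, target) pair.

    The step at `index` is removed from the trajectory and becomes the
    target; the surrounding steps are presented as prefix and suffix.
    """
    prefix = "\n".join(s.text for s in steps[:index])
    suffix = "\n".join(s.text for s in steps[index + 1:])
    prompt = f"{PRE}{prefix}{SUF}{suffix}{MID}"
    return prompt, steps[index].text


def tool_call_examples(steps: list[Step]) -> list[tuple[str, str]]:
    """Create one masked example per tool-call step in the trajectory."""
    return [
        mask_step(steps, i)
        for i, s in enumerate(steps)
        if s.kind == "tool_call"
    ]


# Toy trajectory: the model must infer the tool call from its context.
steps = [
    Step("observation", "User asks for weather in Paris"),
    Step("tool_call", "call get_weather(city='Paris')"),
    Step("observation", "Result: 18C"),
]
examples = tool_call_examples(steps)
```

Here each example asks the model to reconstruct the tool call given what came before and after it, which is one plausible way to expose tool selection as a prediction target. Which spans to mask, and at what rate, is a design choice the summary leaves open.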
What It Covers
Aakanksha Chowdhery from Reflection explains why pretraining language models for agentic capabilities requires rethinking attention mechanisms, loss objectives, and training data composition, going beyond current post-training approaches that optimize for static benchmarks.
Notable Moment
Chowdhery recounts that, halfway through training PaLM, the team evaluated the model on a crowdsourced reasoning benchmark and saw a sudden step change in performance. The jump pointed to emergent reasoning capabilities that standard metrics alone would not have revealed, which is why diverse evaluation tasks matter.