Rethinking Pre-Training for Agentic AI with Aakanksha Chowdhery - #759
Episode · 52 min · Read time: 2 min
Topics
Artificial Intelligence
AI-Generated Summary
Key Takeaways
- Pretraining for agents: Current models are trained against static benchmarks like GLUE or GSM8K, but agentic tasks require operating in interactive environments. Achieving multistep reasoning and tool use demands fundamental changes to attention mechanisms, loss objectives, and data composition during pretraining, not just post-training fixes.
- Long-context reasoning: Models need attention mechanisms that support reasoning over millions of tokens while retaining retrieval and synthesis capabilities. Even with long context windows, current transformers struggle on multi-hop reasoning benchmarks like MRCR v2 and LOFT, so agent workflows require architectural modifications.
- Training data augmentation: Dominant pretraining sources such as internet articles must be augmented with reasoning traces at comparable token volumes. Masking specific portions during training, much like fill-in-the-middle training for code models, teaches models which tools to use and how to plan across multiple steps.
- Failure recovery: Models must learn to recognize failed trajectory steps in their context and choose different actions rather than repeating probable mistakes. This requires both reinforcement learning objectives and pretraining formats that help models notice and correct previous errors during multistep problem solving.
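The masking idea in the takeaways above can be sketched concretely. This is an illustrative adaptation, not Reflection's actual pipeline: it applies the fill-in-the-middle (FIM) transform, popularized for code models, to an agent trajectory so that the training objective becomes reconstructing a masked tool call from the surrounding reasoning context. The sentinel token names and the trace format are hypothetical.

```python
# Hypothetical FIM-style masking of a tool call inside an agent trace.
# Sentinel tokens below are illustrative, not from any specific tokenizer.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def mask_tool_call(trace: str, start: int, end: int) -> str:
    """Move the span trace[start:end] (e.g. a tool invocation) to the end,
    so the model must predict it from both the preceding plan (prefix)
    and the subsequent result (suffix)."""
    prefix, middle, suffix = trace[:start], trace[start:end], trace[end:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

# A toy trajectory: plan -> tool call -> observation -> answer.
trace = "Plan: look up the population. CALL search('population of France') -> 68M. Answer: 68M."
start = trace.index("CALL")
end = trace.index(" -> ")
example = mask_tool_call(trace, start, end)
```

With this transform, the next-token loss at the end of the sequence directly supervises *which* tool to call given the plan and the eventual outcome, which is the kind of signal the episode argues ordinary left-to-right pretraining on articles does not provide.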
What It Covers
Aakanksha Chowdhery from Reflection explains why pretraining language models specifically for agentic capabilities requires rethinking attention mechanisms, loss objectives, and training data composition beyond current post-training approaches that optimize static benchmarks.
Key Questions Answered
- How must pretraining change to support agentic tasks, beyond optimizing static benchmarks like GLUE or GSM8K?
- What attention-mechanism changes are needed for reasoning over million-token contexts on benchmarks like MRCR v2 and LOFT?
- How can reasoning traces and fill-in-the-middle-style masking augment internet-scale pretraining data to teach tool use and planning?
- How do models learn to recognize failed trajectory steps and recover, rather than repeating the same mistakes?
Notable Moment
Chowdhery reveals that halfway through training PaLM, the team tested it on a crowdsourced reasoning benchmark and discovered a sudden step change in performance, indicating emergent reasoning capabilities that would not have been detected without diverse evaluation tasks beyond standard metrics.
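The evaluation practice behind this anecdote can be sketched in a few lines. This is a hypothetical illustration, not the PaLM team's tooling: score a series of training checkpoints on a held-out benchmark and flag abrupt jumps in accuracy that suggest an emergent capability. The scores and threshold are invented.

```python
# Hypothetical emergent-capability detector: flag checkpoints where
# benchmark accuracy jumps sharply relative to the previous checkpoint.
def find_step_changes(scores, threshold=0.15):
    """Return indices of checkpoints whose accuracy exceeds the
    previous checkpoint's by more than `threshold`."""
    return [i for i in range(1, len(scores))
            if scores[i] - scores[i - 1] > threshold]

# Invented accuracies across five checkpoints on a reasoning benchmark.
checkpoint_scores = [0.04, 0.05, 0.06, 0.31, 0.35]
print(find_step_changes(checkpoint_scores))  # -> [3]
```

The point of the anecdote is that such a jump is only visible if diverse evaluation tasks are run mid-training; a single aggregate metric would have smoothed it away.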
More from The TWIML AI Podcast
- How Capital One Delivers Multi-Agent Systems with Rashmi Shetty - #765 (Apr 16 · 54 min)
- The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764 (Mar 26 · 63 min)
- Agent Swarms and Knowledge Graphs for Autonomous Software Development with Siddhant Pardeshi - #763
- AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More with Sebastian Raschka - #762
- The Evolution of Reasoning in Small Language Models with Yejin Choi - #761
Similar Episodes
Related episodes from other podcasts:
- Masters of Scale (Apr 25): Possible: Netflix co-founder Reed Hastings: stories, schools, superpowers
- This Week in Startups (Apr 25): The Defense Tech Startup YC Kicked Out of a Meeting is Now Arming America | E2280
- Marketplace (Apr 24): When does AI become a spending suck?
- My First Million (Apr 24): This guy built a $1B+ brand in 3 years. The product? You'd never guess
- Eye on AI (Apr 24): #338 Amith Singhee: Can India Catch Up in AI? IBM's Amith Singhee on What It Will Take