AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More with Sebastian Raschka - #762
Episode: 78 min
Read time: 3 min
Topics: Productivity, Investing, Startups
AI-Generated Summary
Key Takeaways
- ✓ Post-training vs. pre-training R&D shift: Research teams are now concentrating resources on post-training techniques rather than pre-training because low-hanging fruit remains in reinforcement learning and reasoning pipelines. Pre-training is already highly optimized, so more data and better data mixes yield diminishing returns, while post-training algorithms like GRPO still have significant room for improvement through relatively accessible algorithmic tweaks.
- ✓ Verifiable rewards as the reasoning engine: DeepSeek R1's breakthrough relied on training models on math and code problems where correctness can be verified deterministically, using tools like SymPy for symbolic math comparison or a code compiler. This eliminates the need for human evaluators, so tens of thousands of answers can be generated and scored cheaply. Extending verifiable rewards to domains like drug design or protein structure modeling is the next frontier.
- ✓ Inference-time scaling via self-consistency and self-refinement: Two concrete techniques boost model accuracy without retraining. Self-consistency samples multiple answers at varied temperatures and selects the most common one by majority vote (a best-of-N-style approach). Self-refinement feeds a model's output back to itself or to another model together with a rubric, prompting iterative correction. DeepSeek Math 3.2 demonstrated that scaling up both techniques yields gold-level competition math performance from the same base model.
- ✓ LLMs as tool-builders, not just task-doers: The highest-leverage use of LLMs for technical users is building deterministic workflow tools (native apps, scripts, custom web tools) rather than using LLMs for every task directly. Raschka built macOS apps for podcast chapter-mark insertion and for metadata extraction from arXiv links. Charrington built a podcast analytics pipeline. Using LLMs to create deterministic tools avoids hallucination risk on repetitive structured tasks.
- ✓ Agentic systems require model fine-tuning for multi-agent environments: Current agentic tools like OpenClaw and Claude Code use standard LLMs not specifically trained for multi-agent interaction. OpenAI's Codex backend is a fork of GPT-5.3 fine-tuned specifically for agentic coding tasks. Raschka predicts major labs will fine-tune dedicated agent models for multi-agent settings, similar to how Codex diverged from the base model, improving reliability in looped, tool-using pipelines.
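The verifiable-reward and self-consistency takeaways above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline discussed in the episode: `sample_answers` is a hypothetical stand-in for sampling a model several times (its answers are hard-coded), and exact rational comparison with the standard library's `fractions.Fraction` stands in for the SymPy-style symbolic check.

```python
from collections import Counter
from fractions import Fraction
from typing import List, Optional


def normalize(answer: str) -> Fraction:
    """Parse a numeric answer such as '0.5', '1/2', or '2/4' into an exact fraction."""
    return Fraction(answer.strip())


def verifiable_reward(answer: str, reference: str) -> float:
    """Return 1.0 if the answer equals the reference exactly, else 0.0.

    Answers that cannot be parsed get zero reward instead of raising.
    """
    try:
        return 1.0 if normalize(answer) == normalize(reference) else 0.0
    except (ValueError, ZeroDivisionError):
        return 0.0


def sample_answers(prompt: str) -> List[str]:
    """Hypothetical stand-in for N model samples at varied temperatures."""
    return ["1/2", "0.5", "2/4", "0.25", "one half"]


def majority_vote(answers: List[str]) -> Optional[Fraction]:
    """Self-consistency: pick the most common answer after normalization."""
    parsed = []
    for a in answers:
        try:
            parsed.append(normalize(a))
        except (ValueError, ZeroDivisionError):
            continue  # drop answers that cannot be parsed
    if not parsed:
        return None
    return Counter(parsed).most_common(1)[0][0]


answers = sample_answers("What is 1/4 + 1/4?")
rewards = [verifiable_reward(a, "1/2") for a in answers]
voted = majority_vote(answers)
print(rewards)  # one deterministic 0/1 score per sampled answer
print(voted)    # the answer most samples agree on
```

Note that the two mechanisms are independent: the reward needs a known reference answer and is used during training, while the vote needs no reference and is applied at inference time.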
What It Covers
Sebastian Raschka, independent LLM researcher, joins Sam Charrington to assess the LLM landscape in early 2026. They cover reasoning model advances, inference-time scaling techniques, the rise of agentic tools like OpenClaw, practical workflow automation using LLMs, and what to expect from post-training research through the rest of 2026.
Key Questions Answered
- • Mixture-of-experts and multi-head latent attention define 2025-2026 architecture trends: DeepSeek V3's architecture, which combines mixture-of-experts with multi-head latent attention (MLA), became the dominant template, adopted by Kimi (scaled to 1 trillion parameters) and Mistral AI. MLA compresses the key-value cache via low-rank projection (similar to LoRA), trading compute for memory efficiency. DeepSeek's sparse attention mechanism further reduces quadratic scaling costs, making these the practical, production-proven architectural choices to watch.
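To make the memory trade-off concrete, the arithmetic below compares the per-token cache size of standard multi-head attention with a low-rank latent cache in the spirit of MLA. All dimensions are hypothetical, chosen only for illustration, and the sketch ignores details of the real design such as the separate positional key component.

```python
def mha_cache_values_per_token(n_layers: int, n_heads: int, head_dim: int) -> int:
    """Standard attention caches one key and one value vector per head, per layer."""
    return n_layers * 2 * n_heads * head_dim


def latent_cache_values_per_token(n_layers: int, latent_dim: int) -> int:
    """A latent cache stores one compressed vector per layer; keys and values
    are re-projected from it at attention time (extra compute, less memory)."""
    return n_layers * latent_dim


# Hypothetical model shape, not taken from any published configuration.
N_LAYERS, N_HEADS, HEAD_DIM, LATENT_DIM = 32, 32, 128, 512
BYTES_PER_VALUE = 2  # assuming 16-bit storage

full = mha_cache_values_per_token(N_LAYERS, N_HEADS, HEAD_DIM)
latent = latent_cache_values_per_token(N_LAYERS, LATENT_DIM)

print(full * BYTES_PER_VALUE)    # bytes per token, full key-value cache
print(latent * BYTES_PER_VALUE)  # bytes per token, latent cache
print(full / latent)             # compression ratio under these assumptions
```

The ratio depends entirely on the assumed shapes; the point is only that the cache scales with the latent width rather than with the number of heads times the head dimension.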
Notable Moment
Raschka recounts attempting to add a dark mode to his personal website using an LLM, only to find that editing the CSS file by hand was faster than repeatedly prompting the model to reposition a misaligned button. The anecdote illustrates that retained technical knowledge can still outperform LLM delegation on precise, structured tasks.