#490 – State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI
AI-Generated Summary
Key Takeaways
- ✓ Chinese Open Model Strategy: DeepSeek trained its model for approximately $5 million at cloud rental rates, while Olmo 3 cost around $2 million in cluster rental, including time lost to engineering issues. Chinese companies release open-weight models primarily to gain international distribution: many users outside China will not pay for API subscriptions to Chinese services because of security concerns, so free access builds influence where direct revenue is out of reach.
- ✓ Pretraining Cost Economics: Training costs are a small fraction of the cost of serving hundreds of millions of users. Renting a thousand GPUs costs roughly $100 daily, while frontier labs operate millions of GPUs. Because recurring serving costs reach billions of dollars, companies now optimize for smaller, more efficient models: reducing model size is worth more than the raw capability gained from larger pretraining runs.
- ✓ Reinforcement Learning Scaling: Post-training with reinforcement learning on verifiable rewards unlocked major capability gains in 2025, enabling tool use, multi-step reasoning, and better code generation. AI2's November model used five days of RL training. The team then ran RL for another 3.5 weeks in December and saw notable improvements, suggesting that RL scaling yields more cost-effective capability gains than expanding pretraining compute at current model sizes.
- ✓ Data Quality Over Quantity: Olmo 3 outperformed its predecessors with less training data by focusing on data quality and mixing ratios. Labs train classifiers on samples from different sources such as GitHub, Stack Exchange, and Wikipedia, then use linear regression to determine the dataset composition that best predicts performance on target evaluations. Synthetic data is not limited to AI-generated content: it also includes text extracted from PDFs via OCR, which yields trillions of tokens.
- ✓ Architecture Convergence: Modern frontier models remain fundamentally similar to the GPT-2 architecture, with incremental tweaks such as mixture of experts, multi-head latent attention, and grouped-query attention. The differentiation comes from systems optimization (FP8 and FP4 training, managing distributed compute across 10,000 to 100,000 GPUs) and from post-training algorithms, rather than from novel architectural paradigms. Converting one model architecture into another requires only adding specific components to the base transformer.
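As a rough check on the training-cost figures in the takeaways above, the arithmetic can be sketched as follows. The GPU-hour count and the $2 per GPU-hour rental rate are taken from the DeepSeek-V3 technical report, not from this episode summary, and the function names are illustrative.

```python
# Assumed inputs: figures from the DeepSeek-V3 technical report,
# not numbers stated in this episode summary.
GPU_HOURS = 2_788_000          # reported H800 GPU-hours for the full training run
RATE_USD_PER_GPU_HOUR = 2.0    # rental price assumed in that report


def training_cost_usd(gpu_hours: float, rate: float) -> float:
    """Cost of a training run billed purely by GPU-hour."""
    return gpu_hours * rate


def daily_rental_usd(num_gpus: int, rate: float) -> float:
    """Cost of renting num_gpus for 24 hours at a flat hourly rate."""
    return num_gpus * 24 * rate


deepseek_cost = training_cost_usd(GPU_HOURS, RATE_USD_PER_GPU_HOUR)
thousand_gpu_day = daily_rental_usd(1000, RATE_USD_PER_GPU_HOUR)

print(f"Training run: ${deepseek_cost:,.0f}")
print(f"1,000 GPUs for one day: ${thousand_gpu_day:,.0f}")
```

Under these assumptions the training run comes to about $5.6 million, in line with the "approximately $5 million" quoted above. The same rate puts a thousand GPUs at about $48,000 per day, far above the $100 daily figure in the takeaways, so that number may have lost a unit or multiplier in summarization. The summary itself does not resolve this.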
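The "verifiable rewards" mentioned in the Reinforcement Learning Scaling takeaway above are rewards computed by a program that checks the model's output, rather than by a learned preference model. Below is a minimal sketch for a math-style task. The `\boxed{...}` answer convention and both function names are assumptions for illustration, not details given in the episode.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in the text, or None.

    Assumes the model was prompted to wrap its final answer in \\boxed{}
    (a hypothetical convention chosen for this sketch).
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer equals the reference, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == reference.strip() else 0.0


good = verifiable_reward("6 * 7 = 42, so the answer is \\boxed{42}", "42")
bad = verifiable_reward("I believe the answer is \\boxed{41}", "42")
missing = verifiable_reward("The answer is 42.", "42")
print(good, bad, missing)
```

In an RL loop this score would be computed for each sampled completion and passed to the policy-gradient update. Exact string matching is the simplest possible checker. Practical setups usually normalize answers or, for code, run unit tests instead.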
What It Covers
Sebastian Raschka and Nathan Lambert analyze the 2025 AI landscape following DeepSeek's breakthrough. They compare Chinese and US model development, examine scaling laws across pretraining and inference, discuss open versus closed models, and evaluate how model architecture has evolved from GPT-2 to current frontier models like Claude Opus 4.5 and GPT-5.
Notable Moment
Nathan Lambert reveals that he exclusively uses extended thinking modes across multiple models, running five simultaneous GPT-5 Pro queries for different research tasks such as finding papers or checking equations. He finds that the non-thinking GPT-5 model has higher error rates and poor tone, and he refuses to use it despite its speed advantage, an example of how power users prioritize marginal intelligence gains over convenience.