Anthropic Accidentally Revealed Their Most Powerful Model Ever
Podcast: The AI Breakdown
Episode: 27 min
Read time: 2 min
Topics: Fundraising & VC, Artificial Intelligence, Software Development
AI-Generated Summary
Key Takeaways
- Vertical Model Performance: Intercom's Apex model, built with domain-specific post-training on billions of customer service interactions, achieves a 2.8% higher resolution rate, 65% fewer hallucinations, and lower cost than GPT-4.5 and Claude Opus 4.5. Companies with dense, labeled interaction data may hold untapped fine-tuning assets that let them outperform general-purpose frontier models on narrow tasks.
- Post-Training as the New Moat: Cursor's Composer Two, built on the open-weight Kimi K2.5 with reinforcement learning on proprietary coding interaction data, matched GPT-4.5 and beat Opus 4.6 on coding benchmarks at lower cost. This suggests that 75% of model performance gains can come from post-training rather than pretraining compute alone.
- The Bitter Lesson Reframed: Computer scientist Rich Sutton's 2019 essay argues that brute-force compute beats human-encoded knowledge every time. Sutton himself later clarified, however, that the next phase belongs to systems trained on real-world experience rather than human expert knowledge. That is precisely what Apex and Composer Two demonstrate with their interaction-derived training data.
- Full-Stack AI as Competitive Necessity: Intercom's CPO argues that durable differentiation in AI products will migrate down the stack, from the application layer to the model layer, as app-layer features become easier to clone. Companies with sufficient labeled interaction data should evaluate whether proprietary post-training pipelines can reduce API dependency and improve task-specific performance at the same time.
- Claude Mythos Leak Details: An unsecured Anthropic database exposed a draft blog post describing Claude Mythos as a new tier above Opus, with dramatically higher scores on coding, academic reasoning, and cybersecurity benchmarks. Anthropic confirmed that the model exists, flagged cybersecurity risks requiring extra caution, and noted that it is computationally expensive. No general release timeline has been announced.
What It Covers
Anthropic's accidental leak reveals Claude Mythos, a model that sits above its Opus tier. Meanwhile, Intercom and Cursor demonstrate that domain-specific post-training on proprietary interaction data can outperform frontier models, signaling a structural shift toward vertical AI specialization across enterprise software.
Notable Moment
Decagon revealed that over 80% of its model traffic now runs on internally trained models rather than a single frontier model API. These models are structured as a network of specialized components, each handling a distinct interaction layer (detection, orchestration, response generation, or evaluation) and each optimized independently.
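As a rough illustration of that layered design, the Python sketch below chains four independent stages. It is only a sketch under stated assumptions: the `Turn` type, the stage functions, and the keyword rules are hypothetical stand-ins, not anything Decagon has described. Each function simply marks a place where a separately trained model could sit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Turn:
    """State passed between stages for one customer message (hypothetical)."""
    message: str
    intent: str = ""
    plan: List[str] = field(default_factory=list)
    reply: str = ""
    approved: bool = False


def detect(turn: Turn) -> Turn:
    # Detection layer: a keyword rule standing in for a trained classifier.
    turn.intent = "refund" if "refund" in turn.message.lower() else "general"
    return turn


def orchestrate(turn: Turn) -> Turn:
    # Orchestration layer: map the detected intent to a list of steps.
    routes: Dict[str, List[str]] = {
        "refund": ["look_up_order", "issue_refund"],
        "general": ["answer_from_docs"],
    }
    turn.plan = routes[turn.intent]
    return turn


def generate(turn: Turn) -> Turn:
    # Response layer: a template standing in for a generation model.
    turn.reply = f"Handling '{turn.intent}' request via: {', '.join(turn.plan)}"
    return turn


def evaluate(turn: Turn) -> Turn:
    # Evaluation layer: a stand-in check that approves any non-empty reply.
    turn.approved = bool(turn.reply)
    return turn


# Stages share only the Turn type, so each can be replaced on its own.
PIPELINE: List[Callable[[Turn], Turn]] = [detect, orchestrate, generate, evaluate]


def run(message: str) -> Turn:
    turn = Turn(message=message)
    for stage in PIPELINE:
        turn = stage(turn)
    return turn


if __name__ == "__main__":
    result = run("I would like a refund for my order")
    print(result.intent, result.plan, result.approved)
```

Because the stages communicate only through the shared `Turn` object, any one of them could be retrained or swapped without touching the others, which is the property the independent optimization described above depends on.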