Anthropic Accidentally Revealed Their Most Powerful Model Ever

March 27, 2026

27 min episode · 2 min read

Topics

Artificial Intelligence

AI-Generated Summary

Published Mar 28, 2026

Key Takeaways

✓Vertical Model Performance: Intercom's Apex model, built by domain-specific post-training on billions of customer service interactions, achieves a 2.8% higher resolution rate, 65% fewer hallucinations, and lower cost than GPT-4.5 and Claude Opus 4.5. Companies with dense, labeled interaction data may hold an untapped fine-tuning asset: models post-trained on that data can outperform general-purpose frontier models on narrow tasks.
✓Post-Training as the New Moat: Cursor's Composer Two, built on the open-weight Kimi K2.5 model with reinforcement learning on proprietary coding interaction data, matched GPT-4.5 and beat Opus 4.6 on coding benchmarks at lower cost. The result is presented as evidence that 75% of model performance gains can come from post-training rather than from pretraining compute alone.
✓The Bitter Lesson Reframed: Computer scientist Rich Sutton's 2019 essay "The Bitter Lesson" argues that general methods that leverage computation, such as search and learning, ultimately beat approaches built on human-encoded knowledge. However, Sutton himself later clarified that the next phase belongs to systems trained on real-world experience rather than on human expert knowledge, which is the approach Apex and Composer Two take by training on interaction-derived data.
✓Full-Stack AI as Competitive Necessity: Intercom's CPO argues that as application-layer features become easier to clone, durable differentiation in AI products will migrate down the stack from the application layer to the model layer. Companies with enough labeled interaction data should evaluate whether a proprietary post-training pipeline can both reduce API dependency and improve task-specific performance.
✓Claude Mythos Leak Details: An unsecured Anthropic database exposed a draft blog post describing Claude Mythos as a new tier above Opus, with dramatically higher scores on coding, academic reasoning, and cybersecurity benchmarks. Anthropic confirmed the model exists, flagged cybersecurity risks that require extra caution, and noted that it is computationally expensive. No general release timeline has been announced.

What It Covers

Anthropic's accidental leak reveals Claude Mythos, a model that surpasses its Opus tier. Meanwhile, Intercom and Cursor demonstrate that domain-specific post-training on proprietary interaction data can outperform frontier models, signaling a structural shift toward vertical AI specialization across enterprise software.

Notable Moment

Decagon revealed that over 80% of its model traffic now runs on internally trained models. These are structured as a network of specialized components, each handling a distinct interaction layer (detection, orchestration, response generation, and evaluation) and each optimized independently, instead of routing everything through a single frontier model API.