Claude Opus 4.8 is here. Is it as good as they say?
How I AI · Episode: 13 min · Read time: 2 min
AI-Generated Summary
Key Takeaways
- Greenfield vs. existing code: Opus 4.8 performs well on one-shot, net-new feature builds (it planned and autonomously coded a full prototyping tool in roughly 20 minutes) but degrades significantly when navigating existing codebases, rebasing branches, or resolving edge-case bugs.
- Hallucination risk under confidence: Even in high-effort mode, Opus 4.8 presented unverified hypotheses as conclusions instead of grounding them in validated data, in both coding and strategy contexts. Treat high-confidence outputs with skepticism and explicitly prompt the model to verify its sources before accepting results.
- Strategy work (4.7 outperforms 4.8): In side-by-side testing on a business strategy prompt, Opus 4.7 anchored its responses in specific numbers and structured data, while 4.8 produced vague, hand-wavy roadmaps. For data-driven strategy tasks, 4.7 remains the stronger choice.
- New agentic infrastructure worth testing: Claude Code now supports dynamic workflows that enable hundreds of parallel sub-agents, and Claude.ai and Cowork gain effort control settings ranging from low to max. These harness-level changes may offset model limitations when prompting strategies are tuned appropriately.
What It Covers
Claire Vo shares early hands-on testing of Anthropic's Claude Opus 4.8, a coding-focused agent model priced at $5/$25 per million tokens (input/output). She evaluates its performance across three areas: greenfield coding, work in existing codebases, and business strategy tasks.
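To make the pricing concrete, here is a minimal sketch of how a session's cost could be estimated. It assumes the "$5/$25" figure means USD per million input tokens and per million output tokens respectively; the summary does not spell this out. The function name and the token counts are hypothetical and chosen only for illustration.

```python
# Assumption: "$5/$25 per million tokens" is read as
# $5 per million input tokens and $25 per million output tokens.
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of a request or session (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical session: 2M tokens read, 400k tokens generated.
print(f"${estimate_cost_usd(2_000_000, 400_000):.2f}")  # prints $20.00
```

Under these assumed rates, output tokens cost five times as much as input tokens, so long autonomous runs that generate a lot of code are likely to be dominated by the output side.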
Notable Moment
In a lighter test, Opus 4.8 was asked to build a game and then play it autonomously to tune the difficulty for a nine-year-old. The result was workable but unambitious, and the model repeatedly fell short despite explicit prompts to push further.