How I AI

Claude Opus 4.8 is here. Is it as good as they say?

13 min episode · 2 min read

AI-Generated Summary

Key Takeaways

  • Greenfield vs. existing code: Opus 4.8 performs well on one-shot, net-new feature builds (it planned and autonomously coded a full prototyping tool in roughly 20 minutes), but its performance degrades significantly when it has to navigate existing codebases, rebase branches, or resolve edge-case bugs.
  • Hallucination risk under confidence: Even in high-effort mode, Opus 4.8 presented unvalidated hypotheses as conclusions, in both coding and strategy contexts. Treat high-confidence outputs with skepticism and explicitly prompt the model to verify its sources before accepting results.
  • Strategy work, 4.7 outperforms 4.8: In side-by-side testing on a business strategy prompt, Opus 4.7 anchored its responses in specific numbers and structured data, while 4.8 produced vague, hand-wavy roadmaps. For data-driven strategy tasks, 4.7 remains the stronger choice.
  • New agentic infrastructure worth testing: Claude Code now supports dynamic workflows that can run hundreds of parallel sub-agents, and Claude.ai and CoWork gain effort control settings ranging from low to max. These harness-level changes may offset model limitations if prompting strategies are tuned to take advantage of them.

What It Covers

Claire Vo shares early hands-on testing of Anthropic's Claude Opus 4.8, a coding-focused agent model priced at $5/$25 per million input/output tokens, and evaluates its performance across greenfield coding, existing codebases, and business strategy tasks.
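
To put the quoted pricing in concrete terms, here is a minimal sketch of the arithmetic. It assumes the $5/$25 figure means $5 per million input tokens and $25 per million output tokens, which is the usual convention but is not spelled out in the summary. The function name and the token counts are hypothetical, chosen only for illustration.

```python
# Assumed reading of the summary's "$5/$25 per million tokens":
# $5 per million input tokens, $25 per million output tokens.
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a session from its token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical session: 200k tokens read, 40k tokens generated.
# 200,000 * 5 / 1M = 1.00 and 40,000 * 25 / 1M = 1.00, so this prints $2.00.
print(f"${estimate_cost_usd(200_000, 40_000):.2f}")
```

Because output tokens cost five times as much as input tokens under this reading, long autonomous runs that generate a lot of code are likely to be dominated by output cost.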

Notable Moment

In a playful test, Opus 4.8 was asked to build a game and then play it autonomously to tune the difficulty for a nine-year-old. The result was workable but unambitious, and the model kept falling short even when explicitly prompted to push further.
