How I AI

Claude Opus 4.8 is here. Is it as good as they say?

May 28, 2026

13 min episode · 2 min read

Topics

Productivity, Artificial Intelligence, Software Development

AI-Generated Summary

Published May 29, 2026

Key Takeaways

✓ Greenfield vs. existing code: Opus 4.8 performs well on one-shot, net-new feature builds (it planned and autonomously coded a full prototyping tool in roughly 20 minutes) but degrades significantly when navigating existing codebases, rebasing branches, or resolving edge-case bugs.
✓ Confident hallucinations: Even in high-effort mode, Opus 4.8 presented unvalidated hypotheses as conclusions, in both coding and strategy contexts. Treat high-confidence outputs with skepticism and explicitly prompt the model to verify its sources before accepting results.
✓ Strategy work, where 4.7 outperforms 4.8: Side-by-side testing on a business strategy prompt showed Opus 4.7 anchored responses in specific numbers and structured data, while 4.8 produced vague, hand-wavy roadmaps. For data-driven strategy tasks, 4.7 remains the stronger choice.
✓ New agentic infrastructure worth testing: Claude Code now supports dynamic workflows enabling hundreds of parallel sub-agents. Claude.ai and CoWork gain effort control settings from low to max. These harness-level changes may offset model limitations when prompting strategies are tuned appropriately.

What It Covers

Claire Vo shares early hands-on testing of Anthropic's Claude Opus 4.8, a coding-focused agent model priced at $5/$25 per million tokens, evaluating its performance across greenfield coding, existing codebases, and business strategy tasks.

Notable Moment

In a lighthearted test, Opus 4.8 was asked to build a game and then play it autonomously to tune the difficulty for a nine-year-old. The result was workable but unambitious, and the model repeatedly fell short despite explicit prompts to push further.

Books, tools, and gear mentioned in this episode

Tools

• Claude Code, by Anthropic: “Claude Code now supports dynamic workflows enabling hundreds of parallel sub-agents.”
• Claude AI, by Anthropic: “Claude.ai and CoWork gain effort control settings from low to max.”
• Claude Opus 4.8, by Anthropic: “Claire Vo shares early hands-on testing of Anthropic's Claude Opus 4.8, a coding-focused agent model priced at $5/$25 per million tokens, evaluating its performance across greenfield coding, existing codebases, and business strategy tasks.”
• Claude Opus 4.7, by Anthropic: “Side-by-side testing on a business strategy prompt showed Opus 4.7 anchored responses in specific numbers and structured data, while 4.8 produced vague, hand-wavy roadmaps.”
• CoWork: “Claude.ai and CoWork gain effort control settings from low to max.”
