This Week in Startups

Anthropic’s Mythos is a cyber-weapon, so you can’t have it | E2273

April 9, 2026

76 min episode · 3 min read

Rob May

Topics

Career Growth, Productivity, Relationships

AI-Generated Summary

Published Apr 9, 2026

Key Takeaways

✓ Cyber Weapon Classification: Mythos scores roughly 59% on SWE-bench Multimodal versus 27% for Claude Opus 4.6, more than doubling coding benchmark performance. Its ability to chain three to five independent vulnerabilities into a single sophisticated exploit makes it functionally a cyber weapon. Founders and security teams should treat any AI model with comparable code capabilities as an offensive tool requiring strict access controls, not just a productivity aid.
✓ Project Glasswing Defensive Window: Anthropic allocated a $100 million compute credit fund for partners including AWS, Azure, and NVIDIA to harden critical infrastructure before Mythos is released publicly. Polymarket currently prices only a 28% chance of public release by June 30. Startups that depend on legacy open-source software such as FFmpeg or OpenBSD should prioritize security audits now, using this three-to-five month window before equivalent capability spreads.
✓ SLM Cost Reduction Strategy: AT&T reduced AI infrastructure token costs by 90% by routing 90% of workloads to small language models (SLMs) and reserving frontier models for the remaining 10% of complex tasks. At 8 billion tokens per day, this saved hundreds of thousands of dollars daily. Founders running high-volume, repetitive AI tasks should audit their OpenAI or Anthropic spend and identify which tasks a sub-10 billion parameter model can handle adequately.
✓ Distillation as Cost Arbitrage: Companies can train task-specific small language models by capturing frontier model input-output pairs as a training dataset, a process called distillation. If a task runs thousands of times daily, such as extracting stock symbols from industry reports, distilling a dedicated SLM can cut per-task inference costs by up to 90%. Distillation roughly breaks even once the same prompt-response pattern repeats at scale every day.
✓ Startup Defensibility Scoring Framework: Three categories consistently produce low AI-replaceability scores: physical hardware products, genuine network effects where value scales with user count, and deeply regulated industries requiring human relationships. Software products that function as AI wrappers around frontier models score highest for replaceability. Before raising or scaling, founders should stress-test their product by asking whether a 31-line Claude prompt could replicate its core function.
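
To make the routing arithmetic in the SLM takeaway concrete, here is a minimal sketch of how a blended daily token bill could be estimated. The function names and the per-million-token prices are illustrative assumptions, not figures from the episode; only the 8 billion tokens per day and the 90/10 split come from the summary.

```python
def blended_cost_per_day(tokens_per_day, frontier_price_per_m, slm_price_per_m, slm_share):
    """Daily cost in dollars when `slm_share` of tokens go to the small model.

    Prices are dollars per million tokens (hypothetical values, supplied by the caller).
    """
    millions = tokens_per_day / 1_000_000
    slm_cost = millions * slm_share * slm_price_per_m
    frontier_cost = millions * (1 - slm_share) * frontier_price_per_m
    return slm_cost + frontier_cost


def savings_fraction(frontier_price_per_m, slm_price_per_m, slm_share):
    """Fraction of the all-frontier bill saved by routing `slm_share` to the SLM."""
    blended = slm_share * slm_price_per_m + (1 - slm_share) * frontier_price_per_m
    return 1 - blended / frontier_price_per_m


if __name__ == "__main__":
    # Assumed prices: $10 per million tokens (frontier), $0.50 per million (SLM).
    tokens = 8_000_000_000
    print(round(blended_cost_per_day(tokens, 10.0, 0.5, 0.9), 2))
    print(round(savings_fraction(10.0, 0.5, 0.9), 3))
```

Under this simple model, the savings equal the SLM share times one minus the price ratio, so a 90% routing share approaches a 90% cost reduction only when the small model's per-token cost is negligible next to the frontier model's.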

What It Covers

Anthropic's unreleased model Claude Mythos can autonomously chain multiple software vulnerabilities into sophisticated exploits, discovering more zero-day security flaws than human researchers find over entire careers. The episode covers the national security implications, Project Glasswing's defensive deployment with major tech partners, and the parallel rise of small language models as a cost-cutting alternative to frontier AI spending.
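
The distillation idea from the episode, capturing frontier model input-output pairs as a training dataset, could be sketched roughly as follows. This is an assumption-laden illustration: `capture_pairs`, `load_pairs`, and the `fake_frontier` stand-in are hypothetical names, the JSONL layout is one common choice rather than anything the episode specifies, and a real setup would replace the stub with an actual model call.

```python
import json
import re
from pathlib import Path
from typing import Callable, Iterable, List


def capture_pairs(call_frontier: Callable[[str], str],
                  prompts: Iterable[str],
                  out_path: str) -> int:
    """Call the frontier model on each prompt and append the pair as one JSONL row."""
    count = 0
    with Path(out_path).open("a", encoding="utf-8") as f:
        for prompt in prompts:
            response = call_frontier(prompt)
            f.write(json.dumps({"prompt": prompt, "response": response}) + "\n")
            count += 1
    return count


def load_pairs(path: str) -> List[dict]:
    """Read the captured pairs back, e.g. to feed a fine-tuning job for a small model."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def fake_frontier(prompt: str) -> str:
    # Stand-in for a real frontier model call (hypothetical): it only pulls out
    # tickers written like $AAPL so the sketch runs offline.
    return ",".join(re.findall(r"\$([A-Z]{1,5})\b", prompt))
```

The captured file then serves as the training set for the task-specific small model; the fine-tuning step itself depends on the chosen framework and is left out here.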

Key Questions Answered

• Harness Engineering for SLM Reliability: Small language models lose task focus on complex multi-step workflows, but wrapping them in structured harnesses that require the model to check back against the original objective at each step dramatically improves reliability. Claude Code can generate these harnesses for specific tasks. Teams using SLMs for agentic workflows should build explicit checkpoint logic into their orchestration layer rather than relying on the model's native instruction-following alone.
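
As one possible reading of "explicit checkpoint logic", the sketch below re-states the objective on every step and validates each output before moving on. Everything here is an assumption for illustration: `run_with_checkpoints`, `HarnessResult`, and the `run_step` and `on_track` callables are hypothetical names, and the episode does not describe a specific implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class HarnessResult:
    completed: bool
    outputs: List[str] = field(default_factory=list)
    failed_attempts: int = 0


def run_with_checkpoints(objective: str,
                         steps: List[str],
                         run_step: Callable[[str], str],
                         on_track: Callable[[str, str, str], bool],
                         max_retries: int = 2) -> HarnessResult:
    """Run each step through the model, checking every output against the objective."""
    outputs: List[str] = []
    failed_attempts = 0
    for step in steps:
        for _ in range(max_retries + 1):
            # Re-state the objective in every prompt so drift is caught per step.
            prompt = (f"Objective: {objective}\n"
                      f"Current step: {step}\n"
                      f"Prior outputs: {outputs}")
            output = run_step(prompt)
            if on_track(objective, step, output):
                outputs.append(output)
                break
            failed_attempts += 1
        else:
            # Every attempt for this step failed the checkpoint: stop early.
            return HarnessResult(False, outputs, failed_attempts)
    return HarnessResult(True, outputs, failed_attempts)
```

Here `run_step` would wrap the small model and `on_track` would hold the checkpoint rule, which could be a schema check, a keyword test, or a second model call. Keeping the check outside the model is the point of the design: the orchestration layer, not the model's instruction-following, decides whether a step counts as done.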

Notable Moment

During a live demo of the tool Death by Claude, guest Gianni revealed his own startup scored 92 out of 100 on replaceability, meaning the product he built to survive AI disruption was itself declared nearly dead by AI. The tool then generated a 31-line prompt to replace his entire company.

Books, tools, and gear mentioned in this episode

Tools

Death by Claude
by Gianni

Similar Episodes

Related episodes from other podcasts

Hard Fork · May 15 · A.I. Safety Is So Back + Mythos Mayhem with Nikesh Arora + Hot Mess Express

Hard Fork · Apr 10 · Anthropic’s Cybersecurity Shock Wave + Ronan Farrow and Andrew Marantz on Their Sam Altman Investigation + One Good Thing

The AI Breakdown · Jun 29
