Millions of books died so Claude could live
Episode: 88 min
Read time: 3 min
Topics: Relationships, Fundraising & VC, Sales & Revenue
AI-Generated Summary
Key Takeaways
- ✓ AI Training Data Acquisition: Anthropic's Project Panama used hydraulic cutting machines to destructively scan physical books, after the company had initially downloaded pirated shadow libraries such as LibGen. Anthropic hired Tom Turvey from Google Books, bought hundreds of thousands of used books in bulk from warehouses like Better World Books, sliced off the spines, and rapidly scanned the pages to digitize content for training Claude.
- ✓ Books as Quality Training Material: AI companies prioritize books over other content sources because published works offer higher-quality, vetted material with coherent sentence structure and fact-checking. Anthropic viewed books as a competitive advantage for catching up with larger rivals like OpenAI and Google, and there is evidence that Claude's reputation as the best writing chatbot may stem from this book-heavy training approach.
- ✓ Legal Fair Use Paradox: Two judges ruled that training AI models on books constitutes fair use, but companies still face liability for how they acquired the books in the first place. Anthropic settled for $1.5 billion over pirated books it downloaded but never used in commercial models, while the training process itself was deemed legally acceptable. This creates a counterintuitive situation in which illegal acquisition precedes legal usage.
- ✓ Theatrical Revenue Decline Drivers: CivicScience surveyed 2,000 moviegoers and found that lack of interest in the types of movies available ranked as the top reason people avoid theaters, with cost ranking second. The average moviegoer now sees fewer films per month than in the early 1990s, and the supply of theatrical releases has steadily decreased, so it is unclear whether more films would increase attendance.
- ✓ Nostalgia Screening Strategy: Studios could fill theatrical gaps by reprinting beloved films like Mean Girls or The Nightmare Before Christmas, which perform well during limited runs with minimal reprint costs. This approach mirrors streaming's use of catalog content like Friends to maintain subscriber lifetime value: exhibitors can cover operating costs while studios reserve expensive new productions for proven blockbuster opportunities.
What It Covers
The Vergecast examines how Anthropic and other AI companies train models on millions of books, covering Project Panama, destructive scanning, and shadow libraries. The episode also explores Netflix's theatrical strategy amid the Warner Bros. Discovery acquisition, asking whether movie theaters can survive on nostalgia screenings and alternative programming rather than traditional releases.
Key Questions Answered
- • IKEA Smart Home Thread Problems: IKEA's $6 Bilresa buttons bring Thread to the mass market but expose system failures. Google Home still does not support Matter buttons despite years of requests, and Amazon's Thread networks cannot merge with other Thread border routers. Initial pairing issues and network disconnections plague IKEA's first wave of Thread devices, requiring troubleshooting across multiple platforms.
Notable Moment
Will Oremus discovered internal documents showing an Anthropic executive previously downloaded the entire LibGen pirated book library while at OpenAI, then repeated the same process after cofounding Anthropic. The documents included browser screenshots with torrent sites open and LibGen partially downloaded, demonstrating how AI companies systematically used piracy as their starting point before developing physical book scanning operations.