Millions of books died so Claude could live
Episode: 88 min
Read time: 3 min
Topics: Books & Authors
AI-Generated Summary
Key Takeaways
- ✓AI Training Data Acquisition: After initially downloading pirated shadow libraries like LibGen, Anthropic launched Project Panama, which used hydraulic cutting machines to destructively scan physical books. The company hired Tom Turvey from Google Books, purchased hundreds of thousands of used books in bulk from sellers like Better World Books, sliced off the spines, and rapidly scanned the pages to digitize content for Claude training.
- ✓Books as Quality Training Material: AI companies prioritize books over other content sources because published works provide higher quality, vetted material with coherent sentence structure and fact-checking. Anthropic viewed books as a competitive advantage to catch up with larger rivals like OpenAI and Google, with evidence suggesting Claude's reputation as the best writing chatbot may stem from this book-heavy training approach.
- ✓Legal Fair Use Paradox: Two judges ruled that training AI models on books constitutes fair use, but companies still face liability for how they acquired the books in the first place. Anthropic settled for $1.5 billion over books it downloaded from pirate libraries but says it never used in commercial models, while the training process itself was deemed legally acceptable. The result is counterintuitive: the use is legal, but the acquisition that preceded it was not.
- ✓Theatrical Revenue Decline Drivers: CivicScience surveyed 2,000 moviegoers and found that lack of interest in the types of movies available ranked as the top reason people avoid theaters, with cost ranking second. The average moviegoer now sees fewer films per month than in the early 1990s, while the supply of theatrical releases has steadily decreased, leaving it unclear whether more films would increase attendance.
- ✓Nostalgia Screening Strategy: Studios could fill theatrical gaps by re-releasing beloved films like Mean Girls or Nightmare Before Christmas, which perform well during limited runs at minimal re-release cost. This approach mirrors streaming's use of catalog content like Friends to maintain subscriber lifetime value, letting exhibitors cover operating costs while studios reserve expensive new productions for proven blockbuster opportunities.
What It Covers
The Vergecast examines how Anthropic and other AI companies train models on millions of books, including Anthropic's Project Panama, which involved destructive scanning and shadow libraries. The episode also explores Netflix's theatrical strategy amid the Warner Bros. Discovery acquisition, questioning whether movie theaters can survive through nostalgia screenings and alternative programming rather than traditional releases.
Key Questions Answered
- •IKEA Smart Home Thread Problems: IKEA's $6 BILRESA buttons represent mass-market Thread adoption but expose system failures. Google Home still refuses to support Matter buttons despite years of requests, while Amazon's Thread networks cannot merge with those of other Thread border routers. Initial pairing issues and network disconnections plague IKEA's first wave of Thread devices, requiring troubleshooting across multiple platforms.
Notable Moment
Will Oremus discovered internal documents showing an Anthropic executive previously downloaded the entire LibGen pirated book library while at OpenAI, then repeated the same process after cofounding Anthropic. The documents included browser screenshots with torrent sites open and LibGen partially downloaded, demonstrating how AI companies systematically used piracy as their starting point before developing physical book scanning operations.