How Intercom 2x’d their engineering velocity in 9 months with Claude Code | Brian Scanlan
Episode · 78 min · Read time: 3 min · Topics: Software Development
AI-Generated Summary
Key Takeaways
- ✓Velocity Measurement: Use merged pull requests per R&D head as a leading indicator of AI adoption effectiveness. Intercom tracked this metric from baseline through a 9-month Claude Code rollout, achieving 2x throughput. The raw PR count grew even higher since headcount also increased during this period. A crude metric beats no metric when building organizational accountability around AI tooling adoption.
- ✓Skills Distribution via IT Systems: Deploy Claude Code plugins through internal IT infrastructure rather than relying on Claude's native plugin sync mechanism, which proved unreliable across hundreds of laptops. Pushing skill files directly to disk via IT management tools eliminates version drift, reduces debugging overhead, and ensures every engineer runs identical, current tooling without manual intervention or update failures.
- ✓LLM Judges for Quality Regression Detection: After Claude Code began generating low-quality PR descriptions (summarizing code rather than intent), Intercom built an LLM judge to evaluate months of historical PR description data. The judge confirmed a downward trend, prompting a mandatory "create PR" skill enforced via hooks that block the GitHub CLI. Post-intervention, the LLM judge confirmed quality returned to above-baseline levels.
- ✓Session Telemetry for Org-Level Diagnostics: Collect Claude Code session JSON files, anonymize them, upload to S3, and build user-level dashboards showing session efficiency percentiles, skill invocation patterns, and dropout rates. This surfaces systemic problems—like an MCP never triggering correctly—that are invisible without aggregate data. Honeycomb works well for real-time skill invocation tracking across the engineering organization.
- ✓Self-Improving Skills via Feedback Loops: Build skills that update themselves when they encounter novel solutions. Intercom's flaky spec skill fixes a test, documents the new pattern back into the skill file, then fans out to find all similar failing tests. This compounds from roughly 1x performance at launch to 10x or higher as the skill accumulates domain-specific patterns, without requiring ongoing human maintenance.
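The per-head velocity metric in the first takeaway is straightforward to compute once merged-PR counts and headcount are pulled from source control and HR systems. A minimal sketch, with illustrative field names and numbers (not Intercom's actual schema or data):

```python
from dataclasses import dataclass

@dataclass
class MonthlySnapshot:
    month: str
    merged_prs: int      # merged PRs across the org that month
    rnd_headcount: int   # R&D heads that month

def prs_per_head(snapshots):
    """Merged PRs per R&D head -- the leading indicator described above."""
    return {s.month: s.merged_prs / s.rnd_headcount for s in snapshots}

# Toy data: raw PR count more than doubles while headcount also grows,
# so per-head throughput doubles -- the shape of the result described above.
history = [
    MonthlySnapshot("2025-01", merged_prs=1200, rnd_headcount=150),
    MonthlySnapshot("2025-09", merged_prs=2880, rnd_headcount=180),
]
rates = prs_per_head(history)
print(rates)  # {'2025-01': 8.0, '2025-09': 16.0}
```

Tracking the ratio rather than the raw count is what separates the 2x throughput claim from the effect of hiring.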
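The episode doesn't show how Intercom's IT tooling pushes skill files to laptops, but the core of any such job is an idempotent sync that overwrites only files whose contents drifted. A hedged sketch (directory layout and file extension are assumptions):

```python
import hashlib
import shutil
from pathlib import Path

def _digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def sync_skills(repo_dir: Path, target_dir: Path) -> list[str]:
    """Copy skill files from a vendored checkout of the internal skills
    repo into the local Claude directory, overwriting only files whose
    contents differ. Returns the updated file names so the IT job can
    log per-laptop drift instead of engineers debugging it by hand."""
    target_dir.mkdir(parents=True, exist_ok=True)
    updated = []
    for src in sorted(repo_dir.glob("**/*.md")):
        dst = target_dir / src.relative_to(repo_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        if not dst.exists() or _digest(dst) != _digest(src):
            shutil.copy2(src, dst)
            updated.append(str(src.relative_to(repo_dir)))
    return updated
```

Running this on a schedule from the IT management system gives the "identical, current tooling on every laptop" property without depending on the plugin sync mechanism the episode found unreliable.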
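The LLM-judge takeaway has two moving parts: a rubric the model scores against, and a trend check over months of historical scores. A minimal sketch with the model call injected so the judge can be rerun over archived PR descriptions (the prompt wording and scoring scale are assumptions, not Intercom's actual rubric):

```python
JUDGE_PROMPT = """You are reviewing a pull-request description.
Score it 1-5 on whether it explains the INTENT of the change
(why it was made, risks, rollout plan), not just a summary of the diff.
Reply with only the integer score.

PR description:
{description}
"""

def score_description(description: str, call_model) -> int:
    """call_model is any text-in, text-out LLM client."""
    reply = call_model(JUDGE_PROMPT.format(description=description))
    return int(reply.strip())

def is_regressing(monthly_avg_scores: list[float], window: int = 3) -> bool:
    """Flag a downward trend: the mean of the last `window` months
    falls below the mean of all earlier (baseline) months."""
    baseline = monthly_avg_scores[:-window]
    recent = monthly_avg_scores[-window:]
    if not baseline:
        return False  # not enough history to compare against
    return sum(recent) / window < sum(baseline) / len(baseline)
```

The same judge run after the mandatory "create PR" skill shipped is what let the team confirm, rather than assume, that quality returned to above-baseline levels.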
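Before session files can leave a laptop for S3, they need scrubbing that still preserves per-user aggregation. A sketch of the anonymization step (the session JSON field names here are assumptions about the format, not the real schema):

```python
import hashlib
import json
import re

def anonymize_session(raw: str, salt: str = "org-secret") -> str:
    """Scrub a Claude Code session JSON blob before upload: replace the
    user identifier with a stable salted hash (so user-level dashboards
    and percentiles still work) and redact home-directory paths from
    message text."""
    session = json.loads(raw)
    user = session.get("user", "unknown")
    session["user"] = hashlib.sha256((salt + user).encode()).hexdigest()[:12]
    for msg in session.get("messages", []):
        if isinstance(msg.get("text"), str):
            msg["text"] = re.sub(r"/(?:home|Users)/\S+", "<path>", msg["text"])
    return json.dumps(session)
```

The salted hash is the key design choice: it is stable per user, enabling the session-efficiency percentiles and dropout-rate views described above, without shipping identities to the analytics bucket.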
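The self-improving flaky-spec skill follows a loop: match the failure against documented patterns, apply a known fix if one exists, otherwise solve it once and write the new pattern back into the skill file. A toy illustration of that feedback loop (the failure-signature heuristic and file format are invented for illustration; the real skill is an agent prompt, not this function):

```python
from pathlib import Path

def handle_flaky_failure(failure_msg: str, skill_file: Path) -> bool:
    """Returns True if the failure matched an already-documented pattern
    (a known fix applies), False if it was novel and the skill file grew."""
    known = skill_file.read_text() if skill_file.exists() else ""
    pattern = failure_msg.split(":")[0]  # crude failure signature
    if pattern in known:
        return True  # known pattern: apply the documented fix
    # Novel pattern: document it back into the skill file. A real agent
    # would then fan out to find all similar failing tests.
    skill_file.write_text(known + f"\n## Pattern: {pattern}\n")
    return False
```

Each novel failure permanently enriches the skill, which is the mechanism behind the claimed compounding from roughly 1x at launch to 10x or higher with no ongoing human maintenance.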
What It Covers
Brian Scanlan, Senior Principal Engineer at Intercom, details how the company doubled engineering throughput (measured in merged PRs per R&D head) over nine months using Claude Code. He demonstrates the internal skills repository, telemetry infrastructure, session analysis tooling, and cultural frameworks that enabled a 150+ person R&D organization to ship at 2x velocity while maintaining or improving code quality.
Key Questions Answered
- •Tech Debt as AI Onboarding Strategy: When introducing AI coding tools to an engineering team, direct engineers to spend one month fixing everything they hate about the codebase. The combination of low-friction execution and high emotional payoff builds AI tool fluency while delivering measurable quality improvements. Intercom migrated an entire Go microservice to Ruby in a single Claude Code session—previously a multi-month roadmap item requiring organizational consensus.
Notable Moment
Scanlan described how Intercom's CI system became ten times more expensive almost overnight once Claude Code adoption accelerated PR volume. After fixing those infrastructure bottlenecks, code review became the new constraint. The implication: AI coding tools will sequentially expose every weak point in a delivery pipeline, requiring teams to fix bottlenecks they previously never stressed.