How Intercom 2x’d their engineering velocity in 9 months with Claude Code | Brian Scanlan
Episode: 78 min
Read time: 3 min
Topics: Productivity, Leadership, Artificial Intelligence
AI-Generated Summary
Key Takeaways
- ✓Velocity Measurement: Use merged pull requests per R&D head as a leading indicator of AI adoption effectiveness. Intercom tracked this metric from baseline through a 9-month Claude Code rollout, achieving 2x throughput. The raw PR count grew even higher since headcount also increased during this period. A crude metric beats no metric when building organizational accountability around AI tooling adoption.
- ✓Skills Distribution via IT Systems: Deploy Claude Code plugins through internal IT infrastructure rather than relying on Claude Code's native plugin sync mechanism, which proved unreliable across hundreds of laptops. Pushing skill files directly to disk via IT management tools eliminates version drift, reduces debugging overhead, and ensures every engineer runs identical, current tooling without manual intervention or update failures.
- ✓LLM Judges for Quality Regression Detection: After Claude Code began generating low-quality PR descriptions (summarizing code rather than intent), Intercom built an LLM judge to evaluate months of historical PR description data. The judge confirmed a downward trend, prompting a mandatory "create PR" skill enforced via hooks that block the GitHub CLI. Post-intervention, the LLM judge confirmed quality returned to above-baseline levels.
- ✓Session Telemetry for Org-Level Diagnostics: Collect Claude Code session JSON files, anonymize them, upload to S3, and build user-level dashboards showing session efficiency percentiles, skill invocation patterns, and dropout rates. This surfaces systemic problems—like an MCP never triggering correctly—that are invisible without aggregate data. Honeycomb works well for real-time skill invocation tracking across the engineering organization.
- ✓Self-Improving Skills via Feedback Loops: Build skills that update themselves when they encounter novel solutions. Intercom's flaky spec skill fixes a test, documents the new pattern back into the skill file, then fans out to find all similar failing tests. This compounds from roughly 1x performance at launch to 10x or higher as the skill accumulates domain-specific patterns, without requiring ongoing human maintenance.
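The hook-based enforcement of the "create PR" skill mentioned above could look roughly like the following. This is a minimal sketch, not Intercom's implementation. It assumes the hook is handed a JSON payload with `tool_name` and `tool_input.command` fields and that an exit status of 2 blocks the tool call, which is how Claude Code's hook documentation describes pre-tool-use hooks; check the current docs before relying on it. The function names and the message text are hypothetical.

```python
import json
import shlex
import sys


def is_direct_pr_create(command: str) -> bool:
    """Return True if a shell command invokes `gh pr create` directly."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: fall back to a whitespace split.
        tokens = command.split()
    # Look for the three consecutive tokens anywhere in the command line,
    # so chained commands such as `git push && gh pr create` are caught.
    for i in range(len(tokens) - 2):
        if tokens[i:i + 3] == ["gh", "pr", "create"]:
            return True
    return False


def should_block(payload: dict) -> bool:
    """Decide whether a tool call should be blocked (assumed payload shape)."""
    if payload.get("tool_name") != "Bash":
        return False
    command = payload.get("tool_input", {}).get("command", "")
    return is_direct_pr_create(command)


def run_hook(raw_json: str) -> int:
    """Return the exit status the hook script would use for this payload."""
    if should_block(json.loads(raw_json)):
        # The message goes to stderr so the agent can see why it was blocked.
        print("Direct `gh pr create` is blocked; use the create-PR skill.",
              file=sys.stderr)
        return 2  # assumed "block this tool call" status
    return 0


# In a real hook script the last line would be something like:
#   sys.exit(run_hook(sys.stdin.read()))
```

Matching on tokens rather than a raw substring avoids blocking harmless commands that only mention the phrase, such as `echo "gh pr create"`.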
What It Covers
Brian Scanlan, Senior Principal Engineer at Intercom, details how the company doubled engineering throughput (measured in merged PRs per R&D head) over nine months using Claude Code. He demonstrates the internal skills repository, telemetry infrastructure, session analysis tooling, and cultural frameworks that enabled a 150+ person R&D organization to ship at 2x velocity while maintaining or improving code quality.
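To make the metric concrete, here is a small sketch of merged PRs per R&D head and how it differs from raw PR count when headcount also grows. The numbers are invented for illustration and are not Intercom's figures; the `Period` class and `relative_change` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    merged_prs: int
    rd_headcount: int

    @property
    def prs_per_head(self) -> float:
        return self.merged_prs / self.rd_headcount


def relative_change(baseline: Period, current: Period) -> tuple[float, float]:
    """Return (per-head multiple, raw PR multiple) of current over baseline."""
    per_head = current.prs_per_head / baseline.prs_per_head
    raw = current.merged_prs / baseline.merged_prs
    return per_head, raw


# Made-up example values, NOT taken from the episode.
baseline = Period(merged_prs=400, rd_headcount=100)
current = Period(merged_prs=960, rd_headcount=120)

per_head, raw = relative_change(baseline, current)
print(f"per-head: {per_head:.1f}x, raw: {raw:.1f}x")
```

With these example values the per-head figure doubles while the raw PR count grows by more than that, which is the pattern the summary describes: normalizing by headcount keeps hiring from inflating the apparent gain.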
Key Questions Answered
- •Tech Debt as AI Onboarding Strategy: When introducing AI coding tools to an engineering team, direct engineers to spend one month fixing everything they hate about the codebase. The combination of low-friction execution and high emotional payoff builds AI tool fluency while delivering measurable quality improvements. Intercom migrated an entire Go microservice to Ruby in a single Claude Code session—previously a multi-month roadmap item requiring organizational consensus.
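The session-telemetry idea from the takeaways (anonymize session data, then aggregate skill invocations and dropout rates) can be sketched as below. Everything about the record shape is an assumption: the `user`, `skills`, and `completed` fields and the skill names are hypothetical, and the real session JSON files may look quite different. Salted hashing is also only pseudonymization, so treat it as a stand-in for whatever anonymization step is actually used. The S3 upload and Honeycomb integration are left out.

```python
import hashlib
from collections import Counter


def pseudonymize(user_id: str, salt: str) -> str:
    """Replace a user ID with a short salted hash (not full anonymization)."""
    digest = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
    return digest[:12]


def summarize(sessions: list[dict], salt: str) -> dict:
    """Aggregate hypothetical session records into org-level counts."""
    skill_counts: Counter = Counter()
    users = set()
    dropped = 0
    for session in sessions:
        users.add(pseudonymize(session["user"], salt))
        skill_counts.update(session["skills"])
        if not session["completed"]:
            dropped += 1
    return {
        "unique_users": len(users),
        "skill_invocations": dict(skill_counts),
        "dropout_rate": dropped / len(sessions) if sessions else 0.0,
    }


# Invented sample records for illustration only.
sample = [
    {"user": "alice", "skills": ["create-pr"], "completed": True},
    {"user": "alice", "skills": ["flaky-spec", "create-pr"], "completed": True},
    {"user": "bob", "skills": [], "completed": False},
    {"user": "bob", "skills": ["create-pr"], "completed": True},
]
summary = summarize(sample, salt="example-salt")
print(summary)
```

A skill that never appears in `skill_invocations` across many sessions is the kind of systemic signal the summary mentions, for example an integration that never triggers.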
Notable Moment
Scanlan described how Intercom's CI system became ten times more expensive almost overnight once Claude Code adoption accelerated PR volume. After fixing those infrastructure bottlenecks, code review became the new constraint. The implication: AI coding tools will sequentially expose every weak point in a delivery pipeline, requiring teams to fix bottlenecks they previously never stressed.