How I AI

How Intercom 2x’d their engineering velocity in 9 months with Claude Code | Brian Scanlan

78 min episode · 3 min read

Topics

Software Development

AI-Generated Summary

Key Takeaways

  • Velocity Measurement: Use merged pull requests per R&D head as a leading indicator of AI adoption effectiveness. Intercom tracked this metric from baseline through a 9-month Claude Code rollout, achieving 2x throughput. The raw PR count grew even higher since headcount also increased during this period. A crude metric beats no metric when building organizational accountability around AI tooling adoption.
  • Skills Distribution via IT Systems: Deploy Claude Code plugins through internal IT infrastructure rather than relying on Claude's native plugin sync mechanism, which proved unreliable across hundreds of laptops. Pushing skill files directly to disk via IT management tools eliminates version drift, reduces debugging overhead, and ensures every engineer runs identical, current tooling without manual intervention or update failures.
  • LLM Judges for Quality Regression Detection: After Claude Code began generating low-quality PR descriptions (summarizing code rather than intent), Intercom built an LLM judge to evaluate months of historical PR description data. The judge confirmed a downward trend, prompting a mandatory "create PR" skill enforced via hooks that block the GitHub CLI. Post-intervention, the LLM judge confirmed quality returned to above-baseline levels.
  • Session Telemetry for Org-Level Diagnostics: Collect Claude Code session JSON files, anonymize them, upload to S3, and build user-level dashboards showing session efficiency percentiles, skill invocation patterns, and dropout rates. This surfaces systemic problems—like an MCP never triggering correctly—that are invisible without aggregate data. Honeycomb works well for real-time skill invocation tracking across the engineering organization.
  • Self-Improving Skills via Feedback Loops: Build skills that update themselves when they encounter novel solutions. Intercom's flaky spec skill fixes a test, documents the new pattern back into the skill file, then fans out to find all similar failing tests. This compounds from roughly 1x performance at launch to 10x or higher as the skill accumulates domain-specific patterns, without requiring ongoing human maintenance.
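The LLM-judge workflow in the third takeaway can be sketched as a small harness: score each historical PR description with a pluggable judge, then compare a recent window against the pre-rollout baseline. The rubric wording, the 1–5 scale, and the drop threshold below are illustrative assumptions, not Intercom's actual setup; the judge itself is passed in as a callable so any LLM client can back it.

```python
"""Sketch of an LLM-judge regression check for PR description quality.
The prompt, scale, and threshold are assumptions for illustration."""
from statistics import mean
from typing import Callable

JUDGE_PROMPT = (
    "Rate this PR description 1-5. High scores explain intent and "
    "context; low scores merely restate the code changes."
)

def detect_regression(
    descriptions: list[str],
    judge: Callable[[str, str], int],  # (prompt, description) -> score 1-5
    baseline_n: int,
    drop_threshold: float = 0.5,
) -> tuple[float, float, bool]:
    """Score descriptions in chronological order; flag a regression when
    the mean of the recent window falls more than drop_threshold below
    the mean of the first baseline_n descriptions."""
    scores = [judge(JUDGE_PROMPT, d) for d in descriptions]
    baseline = mean(scores[:baseline_n])
    recent = mean(scores[baseline_n:])
    return baseline, recent, (baseline - recent) > drop_threshold
```

In practice the `judge` callable would wrap an LLM API call; the same harness re-run after the mandatory create-PR skill shipped is what confirmed quality returned above baseline.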
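The "hooks that block the GitHub CLI" enforcement could look something like the following, assuming Claude Code's documented hook contract (a PreToolUse hook receives the tool call as JSON on stdin, and exit code 2 blocks the call while surfacing stderr back to the model). The field names and message are taken from that general contract, not from Intercom's actual hook.

```python
"""PreToolUse hook sketch: block raw `gh pr create` so engineers go
through the mandated create-PR skill. Assumes Claude Code's hook
contract: tool-call JSON on stdin, exit 2 blocks and returns stderr."""
import json
import re
import sys

BLOCKED = re.compile(r"\bgh\s+pr\s+create\b")

def verdict(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for a PreToolUse payload."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    if BLOCKED.search(command):
        return 2, "Direct `gh pr create` is disabled; use the create-PR skill."
    return 0, ""

def main() -> int:
    # Wired up via a "Bash"-matcher PreToolUse entry in settings.json.
    code, msg = verdict(json.load(sys.stdin))
    if msg:
        print(msg, file=sys.stderr)
    return code
```

Because the hook only pattern-matches the shell command, `git push` and other GitHub CLI subcommands pass through untouched.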
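The session-telemetry takeaway (collect session JSON, anonymize, upload to S3) reduces to a short pipeline. The field names below (`user`, `cwd`, etc.) and the JSONL bucket layout are assumptions; the key design point is hashing identifiers with a salt so dashboards can still group sessions by pseudonymous user without exposing names.

```python
"""Sketch of a session-telemetry pipeline: hash identifying fields in a
Claude Code session record, then ship batches to S3 as JSONL. Field
names and bucket layout are assumptions, not Intercom's schema."""
import hashlib
import json

# Fields that identify the engineer or their machine.
SENSITIVE_FIELDS = {"user", "userEmail", "cwd", "gitBranch"}

def anonymize(record: dict, salt: str = "org-secret") -> dict:
    """Return a copy with identifying fields replaced by stable salted
    hashes, so per-user dashboards work without real identities."""
    out = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            out[key] = hashlib.sha256((salt + str(value)).encode()).hexdigest()[:12]
        else:
            out[key] = value
    return out

def upload(records: list[dict], bucket: str, key: str) -> None:
    """Upload anonymized session records as one JSONL object (boto3)."""
    import boto3  # third-party; imported lazily so anonymize() stays stdlib-only
    body = "\n".join(json.dumps(anonymize(r)) for r in records)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body.encode())
```

From there, the S3 objects feed the percentile and skill-invocation dashboards; real-time skill tracking went to Honeycomb instead.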

What It Covers

Brian Scanlan, Senior Principal Engineer at Intercom, details how the company doubled engineering throughput (measured in merged PRs per R&D head) over nine months using Claude Code. He demonstrates the internal skills repository, telemetry infrastructure, session analysis tooling, and cultural frameworks that enabled a 150+ person R&D organization to ship at 2x velocity while maintaining or improving code quality.

Key Questions Answered

  • Tech Debt as AI Onboarding Strategy: When introducing AI coding tools to an engineering team, direct engineers to spend one month fixing everything they hate about the codebase. The combination of low-friction execution and high emotional payoff builds AI tool fluency while delivering measurable quality improvements. Intercom migrated an entire Go microservice to Ruby in a single Claude Code session—previously a multi-month roadmap item requiring organizational consensus.

Notable Moment

Scanlan described how Intercom's CI system became ten times more expensive almost overnight once Claude Code adoption accelerated PR volume. After fixing those infrastructure bottlenecks, code review became the new constraint. The implication: AI coding tools will sequentially expose every weak point in a delivery pipeline, requiring teams to fix bottlenecks they previously never stressed.
