The TWIML AI Podcast

How Capital One Delivers Multi-Agent Systems with Rashmi Shetty - #765

54 min episode · 2 min read

AI-Generated Summary

Key Takeaways

  • Multi-agent trigger criteria: Deploy multi-agent architecture only when a problem contains multiple distinct user intents that cannot be resolved by a single deterministic model. Capital One's Chat Concierge required separate agents for intent disambiguation, planning, governance validation, response accuracy checking, and final response formatting — each with a narrowly scoped task.
  • Risk-first platform layering: Separate agent governance into two distinct layers — platform-level enterprise policies covering cyber, compliance, and guardrails that apply automatically at runtime, and domain-specific policies that individual teams layer on top. This split lets developers focus on agent design while the platform enforces mandatory regulatory boundaries without manual configuration per deployment.
  • Latency as a product feature: Treat end-to-end latency as a first-class product requirement, not a non-functional afterthought. In multi-agent systems, latency must be measured across every agent boundary, tool invocation, and model call simultaneously. Capital One uses smaller specialized fine-tuned models via teacher-student distillation to hit latency targets while maintaining personalization quality.
  • Closed-loop observability design: Instrument agentic systems to capture production failure signals and route them back into the experimentation environment for prompt tuning, model fine-tuning, retrieval adjustment, or context management updates. Design this feedback pipeline before deployment, not after, because production telemetry is where the largest performance gains originate in agentic systems.
  • Beachhead use case selection: Choose the first production agentic deployment from a high-surface-area, low-risk scenario to safely observe real failure modes at scale. Capital One selected an auto dealership customer experience rather than a core banking workflow, generating architectural patterns and observability baselines that informed the broader enterprise platform strategy.
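The governance-layering and latency patterns above can be sketched in miniature. This is a hypothetical illustration, not Capital One's actual implementation: the agent names, the `platform_guardrails` check, and the `Pipeline` class are all invented stand-ins showing how a mandatory platform-level policy layer can wrap every narrowly scoped agent while latency is recorded at each boundary.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Pipeline:
    """A chain of narrowly scoped agents. Every agent's output passes
    through a mandatory platform-level guardrail, and latency is
    captured at every agent boundary. All names are illustrative."""
    agents: list[tuple[str, Callable[[str], str]]]
    trace: list[tuple[str, float]] = field(default_factory=list)

    def run(self, user_input: str) -> str:
        data = user_input
        for name, agent in self.agents:
            start = time.perf_counter()
            # Enterprise policy layer applies automatically at runtime;
            # individual teams never configure it per deployment.
            data = platform_guardrails(agent(data))
            self.trace.append((name, (time.perf_counter() - start) * 1e3))
        return data

def platform_guardrails(text: str) -> str:
    """Toy stand-in for platform-enforced cyber/compliance checks."""
    for term in ("ssn", "password"):
        if term in text.lower():
            raise ValueError(f"guardrail violation: {term!r}")
    return text

# Narrowly scoped stand-ins for the roles described in the takeaways.
pipeline = Pipeline(agents=[
    ("intent", lambda s: f"intent:buy_car|{s}"),          # disambiguation
    ("planner", lambda s: f"plan:test_drive|{s}"),        # planning
    ("validator", lambda s: s),                           # governance check
    ("responder", lambda s: s.split("|")[-1].upper()),    # final formatting
])

result = pipeline.run("I want to buy a car")
# pipeline.trace now holds (agent_name, latency_ms) for every boundary,
# the raw material for the closed-loop observability feedback described above.
```

In a real system the trace entries would feed the experimentation environment for prompt tuning or fine-tuning; here they simply accumulate in `pipeline.trace`, one per agent boundary.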

What It Covers

Rashmi Shetty, Senior Director of Enterprise Generative AI Platform at Capital One, explains how the company built and deployed Chat Concierge, a multi-agent car-buying system, and outlines the platform strategy enabling developers to build governed agentic systems at scale across the enterprise.

Notable Moment

Shetty reframes Capital One's competitive AI advantage not as model sophistication but as data infrastructure built over a decade. The argument is that specialized fine-tuned models only outperform general ones when enterprise-grade data pipelines already exist — making prior data investment the actual prerequisite for agentic success.
