NVIDIA AI Podcast

Hippocratic AI's Munjal Shah on How AI Agents Are Expanding Healthcare Capacity - Ep. 262

21 min episode · 2 min read

Topics

Health & Wellness, Artificial Intelligence

AI-Generated Summary

Key Takeaways

  • Constellation Architecture: Hippocratic runs 22 models simultaneously per conversation. One 400B-parameter model handles dialogue, 19 supervising models check safety in real time, and two deep-thinking models perform 30-60 second verification checks. Just loading the constellation into memory takes 128 NVIDIA H100 GPUs, before scaling to multiple concurrent conversations.
  • Output Testing Protocol: Rather than validating training data, Hippocratic hired 6,000 licensed US clinicians to conduct 309,000 test calls, marking every error before deployment. This use-case-specific testing approach costs double-digit millions but ensures safety by verifying actual outputs, not architectural assumptions or training sources.
  • Inference Latency Requirements: Voice-based healthcare agents need 1.5-2 second end-to-end response times, requiring optimization for latency rather than cost-per-token or throughput. This differs fundamentally from text-based search applications where 20-30 second delays remain acceptable for deeper reasoning capabilities.
  • Agent App Store Model: Clinicians submit custom scripts based on specialized expertise, receive validation and safety testing from Hippocratic, then earn revenue share when their agents deploy. A concussion clinic nurse can scale 20 years of knowledge to millions of patients nationwide within four minutes of prompt creation.
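
The constellation and latency points above can be illustrated with a minimal sketch. It is not Hippocratic's implementation: the function names, the keyword-based safety check, the fallback message, and the simulated delays are all hypothetical stand-ins. Only the supervisor count (19) and the 2-second upper bound on response time come from the summary. The sketch drafts a reply, runs all supervisors concurrently rather than one after another, and falls back to a safe message if any supervisor objects or the latency budget is exceeded. The two slower deep-thinking verifiers are not modeled.

```python
import asyncio
from dataclasses import dataclass

LATENCY_BUDGET_S = 2.0  # upper end of the 1.5-2 second target in the summary
NUM_SUPERVISORS = 19    # supervising-model count in the summary
FALLBACK = "Let me connect you with a member of the care team."  # hypothetical


@dataclass
class Verdict:
    supervisor_id: int
    safe: bool


async def primary_reply(utterance: str) -> str:
    """Hypothetical stand-in for the main dialogue model."""
    await asyncio.sleep(0.01)  # simulated inference time
    return f"Thanks for telling me: {utterance}"


async def supervise(supervisor_id: int, draft: str) -> Verdict:
    """Hypothetical stand-in for one safety supervisor (toy keyword check)."""
    await asyncio.sleep(0.01)  # simulated inference time
    return Verdict(supervisor_id, "double your dose" not in draft.lower())


async def respond(utterance: str) -> str:
    """Draft a reply, check it with all supervisors in parallel, and
    enforce the end-to-end latency budget."""

    async def pipeline() -> str:
        draft = await primary_reply(utterance)
        # Supervisors run concurrently, so their latency does not add up.
        verdicts = await asyncio.gather(
            *(supervise(i, draft) for i in range(NUM_SUPERVISORS))
        )
        return draft if all(v.safe for v in verdicts) else FALLBACK

    try:
        return await asyncio.wait_for(pipeline(), timeout=LATENCY_BUDGET_S)
    except asyncio.TimeoutError:
        # Too slow for a voice conversation: answer with the safe fallback.
        return FALLBACK
```

Calling `asyncio.run(respond("I feel dizzy"))` returns the drafted reply when every supervisor approves. Running the checks concurrently is the design choice that matters here: with sequential checks, 19 supervisors would consume the latency budget on their own.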

What It Covers

Munjal Shah explains how Hippocratic AI deploys safety-focused healthcare agents that have completed 1.85 million patient calls with a satisfaction rating of 8.95/10. The company addresses clinical staffing shortages through an inference-optimized architecture and rigorous output testing.

Notable Moment

Shah reveals that 30% of patients initially resist AI healthcare, but after agents explain how long a callback from a human would take and demonstrate empathetic listening, only 15% ultimately refuse. Patients appreciate undivided attention, something increasingly rare in modern interactions, and this leads to sustained engagement within 30-60 seconds.
