Hippocratic AI's Munjal Shah on How AI Agents Are Expanding Healthcare Capacity - Ep. 262
Episode: 21 min
Read time: 2 min
Topics: Health & Wellness, Artificial Intelligence
AI-Generated Summary
Key Takeaways
- Constellation Architecture: Hippocratic runs 22 models simultaneously per conversation. One 400B-parameter model handles dialogue, 19 supervising models check safety in real time, and two deep-thinking models perform 30-60 second verification checks. The full set requires 128 NVIDIA H100 GPUs just to load into memory, before it supports multiple conversations.
- Output Testing Protocol: Rather than validating training data, Hippocratic hired 6,000 licensed US clinicians to conduct 309,000 test calls and mark every error before deployment. This use-case-specific testing costs double-digit millions, but it grounds safety in verified outputs rather than in assumptions about architecture or training sources.
- Inference Latency Requirements: Voice-based healthcare agents need 1.5-2 second end-to-end response times, so inference must be optimized for latency rather than cost per token or throughput. This differs fundamentally from text-based search applications, where 20-30 second delays remain acceptable in exchange for deeper reasoning.
- Agent App Store Model: Clinicians submit custom scripts based on their specialized expertise, receive validation and safety testing from Hippocratic, then earn a revenue share when their agents deploy. A concussion clinic nurse can scale 20 years of knowledge to millions of patients nationwide within four minutes of prompt creation.
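The constellation pattern and the latency target in the takeaways above can be pictured with a small sketch. This is only an illustration under stated assumptions, not Hippocratic's implementation: the function names (`primary_reply`, `supervise`, `respond`), the keyword-based safety rule, and the fallback message are all hypothetical, and the two slower deep-thinking verifiers are counted but not modeled. The sketch shows one dialogue model producing a draft, 19 supervisors checking it concurrently, and the whole turn bounded by a 2-second budget.

```python
import asyncio
from dataclasses import dataclass

# Figures taken from the summary; everything else here is a hypothetical stand-in.
NUM_PRIMARY = 1
NUM_SUPERVISORS = 19
NUM_DEEP_CHECKERS = 2      # counted for the total only, not modeled below
LATENCY_BUDGET_S = 2.0     # upper end of the 1.5-2 second voice target

FALLBACK = "Let me connect you with a member of the care team."


@dataclass
class Verdict:
    supervisor_id: int
    safe: bool


async def primary_reply(utterance: str) -> str:
    """Hypothetical stand-in for the dialogue model: echoes the caller."""
    await asyncio.sleep(0.01)
    return f"I hear you. You said: {utterance}"


async def supervise(supervisor_id: int, draft: str) -> Verdict:
    """Hypothetical stand-in for one safety supervisor (toy keyword rule)."""
    await asyncio.sleep(0.01)
    return Verdict(supervisor_id, "double your dose" not in draft.lower())


async def respond(utterance: str) -> str:
    """Return the draft only if every supervisor approves within the budget."""

    async def pipeline() -> str:
        draft = await primary_reply(utterance)
        # All supervisors inspect the same draft concurrently.
        verdicts = await asyncio.gather(
            *(supervise(i, draft) for i in range(NUM_SUPERVISORS))
        )
        return draft if all(v.safe for v in verdicts) else FALLBACK

    try:
        return await asyncio.wait_for(pipeline(), timeout=LATENCY_BUDGET_S)
    except asyncio.TimeoutError:
        # Missing the voice latency target is treated like a failed check.
        return FALLBACK
```

Running the supervisors with `asyncio.gather` keeps their combined cost close to that of the slowest single check rather than the sum of all 19, which is one plausible way to stay inside a voice-grade latency budget.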
What It Covers
Munjal Shah explains how Hippocratic AI deploys safety-focused healthcare agents that have completed 1.85 million patient calls, achieving 8.95/10 satisfaction ratings while addressing clinical staffing shortages through inference-optimized architecture and rigorous output testing protocols.
Notable Moment
Shah reveals that 30% of patients initially resist AI healthcare, but after agents explain how long a human callback would take and demonstrate empathetic listening, only 15% ultimately refuse. Patients appreciate undivided attention, something increasingly rare in modern interactions, which leads to sustained engagement within 30-60 seconds.