Hippocratic AI's Munjal Shah on How AI Agents Are Expanding Healthcare Capacity - Ep. 262
Episode: 21 min
Read time: 2 min
Topics: Productivity, Health & Wellness, Sales & Revenue
AI-Generated Summary
Key Takeaways
- ✓Constellation Architecture: Hippocratic runs 22 models simultaneously per conversation. One 400B parameter model handles the dialogue, 19 supervising models check safety in real time, and two deep-thinking models perform 30-60 second verification checks. Loading this set of models into memory takes 128 NVIDIA H100 GPUs before it can serve multiple conversations.
- ✓Output Testing Protocol: Rather than validating training data, Hippocratic hired 6,000 licensed US clinicians to conduct 309,000 test calls, marking every error before deployment. This use-case-specific testing approach costs double-digit millions but ensures safety by verifying actual outputs, not architectural assumptions or training sources.
- ✓Inference Latency Requirements: Voice-based healthcare agents need 1.5-2 second end-to-end response times, requiring optimization for latency rather than cost-per-token or throughput. This differs fundamentally from text-based search applications where 20-30 second delays remain acceptable for deeper reasoning capabilities.
- ✓Agent App Store Model: Clinicians submit custom scripts based on specialized expertise, receive validation and safety testing from Hippocratic, then earn revenue share when their agents deploy. A concussion clinic nurse can scale 20 years of knowledge to millions of patients nationwide within four minutes of prompt creation.
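The constellation and latency points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Hippocratic AI's implementation: the function names (`primary_reply`, `supervise`, `respond`), the banned-phrase check, the fallback messages, and the simulated delays are all hypothetical. Only the counts (one dialogue model, 19 real-time supervisors) and the roughly 2 second budget come from the summary. The two deep-thinking models that run 30-60 second checks are left out.

```python
import asyncio
from dataclasses import dataclass

# Illustrative constants; only the counts and the budget come from the summary.
NUM_SUPERVISORS = 19      # real-time safety checkers
LATENCY_BUDGET_S = 2.0    # upper end of the 1.5-2 second end-to-end target


@dataclass
class Verdict:
    supervisor_id: int
    safe: bool


async def primary_reply(utterance: str) -> str:
    """Hypothetical stand-in for the large dialogue model."""
    await asyncio.sleep(0.01)  # simulated inference time
    return f"I hear you said: {utterance}"


async def supervise(supervisor_id: int, draft: str) -> Verdict:
    """Hypothetical stand-in for one supervisor; flags a made-up banned phrase."""
    await asyncio.sleep(0.01)  # simulated inference time
    return Verdict(supervisor_id, "double your dose" not in draft.lower())


async def respond(utterance: str) -> str:
    """Draft a reply, run every supervisor concurrently, enforce the budget."""

    async def pipeline() -> str:
        draft = await primary_reply(utterance)
        # Supervisors run in parallel, so their latency does not add up.
        verdicts = await asyncio.gather(
            *(supervise(i, draft) for i in range(NUM_SUPERVISORS))
        )
        if all(v.safe for v in verdicts):
            return draft
        return "Let me connect you with a clinician for that question."

    try:
        return await asyncio.wait_for(pipeline(), timeout=LATENCY_BUDGET_S)
    except asyncio.TimeoutError:
        # Over budget: say something short rather than leave silence on a call.
        return "One moment, please."
```

Calling `asyncio.run(respond("my knee hurts"))` returns the draft unchanged, while an utterance containing the made-up banned phrase is routed to the fallback. Running the checkers concurrently is the design point: the end-to-end time is set by the slowest supervisor rather than the sum of all 19, which is what makes a voice-level latency budget plausible.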
What It Covers
Munjal Shah explains how Hippocratic AI deploys safety-focused healthcare agents that have completed 1.85 million patient calls, achieving 8.95/10 satisfaction ratings while addressing clinical staffing shortages through inference-optimized architecture and rigorous output testing protocols.
Notable Moment
Shah reveals that 30% of patients initially resist talking to an AI agent, but once the agent explains how long a callback from a human would take and demonstrates empathetic listening, only 15% ultimately refuse. Patients appreciate the undivided attention, something increasingly rare in modern interactions, and they settle into the conversation within 30-60 seconds.