NVIDIA AI Podcast

Hippocratic AI's Munjal Shah on How AI Agents Are Expanding Healthcare Capacity - Ep. 262

June 25, 2025

21 min episode · 2 min read

Munjal Shah

Topics

Productivity, Health & Wellness, Sales & Revenue

AI-Generated Summary

Published Dec 25, 2025

Key Takeaways

✓ Constellation Architecture: Hippocratic runs 22 models simultaneously per conversation: one 400B-parameter model handles dialogue, 19 supervising models check safety in real time, and two deep-thinking models perform 30-60 second verification checks. Simply loading these models into memory requires 128 NVIDIA H100 GPUs, before any scaling to multiple conversations.
✓ Output Testing Protocol: Rather than validating training data, Hippocratic hired 6,000 licensed US clinicians to conduct 309,000 test calls and mark every error before deployment. This use-case-specific testing costs double-digit millions, but it establishes safety by verifying actual outputs rather than relying on architectural assumptions or training sources.
✓ Inference Latency Requirements: Voice-based healthcare agents need end-to-end response times of 1.5-2 seconds, so the system must be optimized for latency rather than cost per token or throughput. This differs fundamentally from text-based search applications, where 20-30 second delays remain acceptable in exchange for deeper reasoning.
✓ Agent App Store Model: Clinicians submit custom scripts based on their specialized expertise, receive validation and safety testing from Hippocratic, and earn a revenue share when their agents deploy. A concussion clinic nurse can scale 20 years of knowledge to millions of patients nationwide within four minutes of prompt creation.
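
The constellation and latency takeaways above can be illustrated with a minimal sketch. It is not Hippocratic's implementation: the function names, the keyword-based safety check, the fallback messages, and the rule that every supervisor must approve a draft are all assumptions made for illustration. Only the supervisor count (19) and the 2-second upper bound come from the summary. The two deep-thinking models are left out because their 30-60 second checks fall outside the per-turn budget.

```python
import asyncio
from dataclasses import dataclass

LATENCY_BUDGET_S = 2.0   # upper end of the 1.5-2 second target in the summary
NUM_SUPERVISORS = 19     # supervising models per conversation, per the summary


@dataclass
class Verdict:
    supervisor_id: int
    safe: bool


async def primary_reply(utterance: str) -> str:
    """Hypothetical stand-in for the large dialogue model."""
    await asyncio.sleep(0.01)  # simulated inference time
    return f"Thanks for telling me: {utterance}"


async def supervise(supervisor_id: int, draft: str) -> Verdict:
    """Hypothetical safety check: flags any draft that mentions 'dosage'."""
    await asyncio.sleep(0.01)  # simulated inference time
    return Verdict(supervisor_id, "dosage" not in draft.lower())


async def respond(utterance: str) -> str:
    """Draft a reply, run all supervisors concurrently, enforce the budget."""

    async def pipeline() -> str:
        draft = await primary_reply(utterance)
        # All supervisors review the same draft at the same time.
        verdicts = await asyncio.gather(
            *(supervise(i, draft) for i in range(NUM_SUPERVISORS))
        )
        if all(v.safe for v in verdicts):
            return draft
        # Assumed policy: any single objection blocks the draft.
        return "Let me have a clinician follow up with you on that."

    try:
        return await asyncio.wait_for(pipeline(), timeout=LATENCY_BUDGET_S)
    except asyncio.TimeoutError:
        # Assumed fallback when the turn would exceed the latency budget.
        return "One moment, please."


if __name__ == "__main__":
    print(asyncio.run(respond("I slept well last night")))
```

Running the supervisors concurrently means the safety pass costs roughly the time of the slowest check rather than the sum of all 19, which is one way a latency-first design could keep a voice turn inside its budget.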

What It Covers

Munjal Shah explains how Hippocratic AI deploys safety-focused healthcare agents that have completed 1.85 million patient calls, achieving 8.95/10 satisfaction ratings while addressing clinical staffing shortages through inference-optimized architecture and rigorous output testing protocols.

Notable Moment

Shah reveals that 30% of patients initially resist AI healthcare, but after agents explain human callback delays and demonstrate empathetic listening, only 15% ultimately refuse. Patients appreciate undivided attention—something increasingly rare in modern interactions—leading to sustained engagement within 30-60 seconds.
