Powering the AI Inference Wave with EPRI's Ben Sooter - Ep. 292
Episode · 32 min · 2 min read
Topics: Artificial Intelligence
AI-Generated Summary
Key Takeaways
- ✓Inference vs. Training Power Split: Over the lifetime of an AI model, roughly 80–90% of total compute consumption occurs during inference, not training. Organizations planning energy infrastructure should weight capacity planning heavily toward inference workloads, not just the headline-grabbing training buildouts that currently dominate industry conversation and capital allocation.
- ✓Substation Co-location Strategy: Thousands of existing electrical substations across the US carry underutilized capacity, typically 3–10 megawatts of headroom. Siting micro data centers directly adjacent to these substations bypasses transmission interconnection queues, reduces permitting timelines, and avoids new steel-in-ground costs — accelerating time-to-power for inference deployments by a meaningful margin.
- ✓Distributed Aggregation Model: A single substation's 5-megawatt surplus may be insufficient for viable data center economics. EPRI's approach aggregates five nearby sites within a metro region into a 25-megawatt distributed cluster, treating it as one logical project — satisfying both utility grid constraints and the minimum scale thresholds that data center operators require to justify investment.
- ✓Demand Flexibility as Grid Asset: Substations often hold significantly more capacity than their rated surplus, except during annual peak demand days. Pairing micro data centers with battery storage and backup generation, then engineering load-shedding protocols that reroute compute tasks to other nodes during grid peaks, unlocks that larger envelope without requiring additional grid upgrades.
- ✓Agentic AI Reshapes Load Forecasting: Early inference load models assumed human-driven usage patterns — daytime peaks, overnight lows. The rapid emergence of autonomous AI agents that run continuous background tasks around the clock invalidates that assumption. Grid planners and data center operators should model inference loads as potentially flat or inverted curves, not standard residential demand profiles.
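The aggregation arithmetic in the takeaways above can be sketched in a few lines. This is an illustrative model only: the substation names and headroom figures are invented, and the 25-megawatt viability threshold is taken from the episode's example, not from any published EPRI criterion.

```python
# Hedged sketch of the "Distributed Aggregation Model" takeaway:
# several substations with modest individual headroom are pooled
# into one logical cluster and checked against an assumed minimum
# scale a data center operator would require.

MIN_VIABLE_MW = 25  # assumed viability threshold from the episode's example

# Hypothetical metro-area substations and their surplus capacity (MW)
headroom_mw = {"sub_a": 5, "sub_b": 4, "sub_c": 6, "sub_d": 5, "sub_e": 5}

cluster_mw = sum(headroom_mw.values())
viable = cluster_mw >= MIN_VIABLE_MW

print(f"Aggregated cluster: {cluster_mw} MW, viable: {viable}")
```

No single site clears the threshold on its own, but the five-site cluster does, which is the core of the "treat it as one logical project" argument.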
What It Covers
EPRI's Ben Sooter explains how micro data centers — small, distributed inference facilities of 3–20 megawatts — can be co-located at underutilized electrical substations across the US to meet the coming wave of AI inference demand without overloading transmission grids or requiring new infrastructure investment.
Notable Moment
Sooter reveals that his original assumption — inference loads would mirror human daily activity patterns and smooth out naturally — collapsed almost immediately when he recognized that agentic AI systems operate continuously overnight, forcing a complete revision of EPRI's load modeling approach before field measurements even began.
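The modeling shift described above can be made concrete with a toy comparison of the two load shapes. Both curves are illustrative inventions, not measured data: a sinusoidal daytime bump stands in for human-driven usage, and a constant line stands in for always-on agentic workloads.

```python
import math

# Hedged sketch of the load-forecasting point: a human-driven inference
# curve (daytime peak, overnight low) versus an agentic, always-on curve.
# Curve shapes and magnitudes are made up for illustration.

hours = range(24)

# Human-driven profile: baseline load plus a daytime sinusoidal bump
human = [1.0 + 0.8 * math.sin(math.pi * (h - 6) / 12) if 6 <= h <= 18 else 1.0
         for h in hours]

# Agentic profile: continuous background tasks produce a roughly flat curve
agentic = [1.6] * 24

def peak_to_avg(curve):
    """Peak-to-average ratio; 1.0 means a perfectly flat load."""
    return max(curve) / (sum(curve) / len(curve))
```

A flat agentic curve has a peak-to-average ratio of exactly 1.0, while the diurnal human curve sits well above it, which is why planning around residential-style demand profiles understates off-peak inference load.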