How AI Data Platforms Are Shaping the Future of Enterprise Storage - Ep. 281
Episode: 35 min
Read time: 2 min
Topics: Fundraising & VC, Design & UX, Artificial Intelligence
AI-Generated Summary
Key Takeaways
- ✓ AI-Ready Data Pipeline: Making unstructured enterprise data usable for AI means finding and gathering it, extracting the text, chunking it into uniform sizes, enriching the chunks with metadata, embedding them as numeric vectors, and indexing those vectors in a vector database for retrieval-augmented generation (RAG) systems.
- ✓ Data Velocity Challenge: Enterprises face pressure on two fronts: new data keeps arriving, and existing documents keep changing. Without tracking which files have changed, organizations must reindex entire datasets again and again, wasting compute much like rewashing every dish when only one is dirty.
- ✓ Security Through In-Place Processing: Traditional AI pipelines create seven to thirteen copies of a dataset across different systems, disconnecting those copies from the source permissions. When access rights change at the source, the copied data remains accessible, a major security vulnerability that a GPU-in-storage architecture eliminates.
- ✓ Agent Deployment in Storage: Storage vendors deploy AI agents directly on GPUs inside their storage systems. These agents can flag unclassified documents that should be classified, monitor system telemetry to recommend optimizations, and operate on data without unnecessary movement or copying.
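The first two takeaways can be sketched in a few lines of Python (3.9+). This is a minimal illustration, not the pipeline discussed in the episode: `chunk_text`, `toy_embed`, and `IncrementalIndex` are hypothetical names, the hashed bag-of-words embedding is a stand-in for a real embedding model, and the in-memory dictionary is a stand-in for a vector database. The point it shows is that a per-file content hash lets a sync pass skip files that have not changed.

```python
import hashlib
import math
from dataclasses import dataclass, field

CHUNK_WORDS = 50   # hypothetical chunk size, in words
EMBED_DIM = 64     # hypothetical embedding width


def chunk_text(text: str, size: int = CHUNK_WORDS) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(chunk: str, dim: int = EMBED_DIM) -> list[float]:
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for word in chunk.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class IncrementalIndex:
    """In-memory stand-in for a vector database with per-file change tracking."""
    file_hashes: dict = field(default_factory=dict)   # path -> content hash
    vectors: dict = field(default_factory=dict)       # path -> [(chunk, metadata, vector)]

    def sync(self, files: dict) -> list:
        """Reindex only new or changed files; return the paths processed."""
        processed = []
        for path, text in files.items():
            digest = hashlib.sha256(text.encode()).hexdigest()
            if self.file_hashes.get(path) == digest:
                continue  # unchanged since the last sync: skip it
            self.vectors[path] = [
                (chunk, {"source": path, "chunk_no": n}, toy_embed(chunk))
                for n, chunk in enumerate(chunk_text(text))
            ]
            self.file_hashes[path] = digest
            processed.append(path)
        # Drop entries for files that no longer exist at the source.
        for path in set(self.file_hashes) - set(files):
            del self.file_hashes[path]
            del self.vectors[path]
        return processed

    def search(self, query: str, top_k: int = 3) -> list:
        """Rank stored chunks by cosine similarity to the query."""
        q = toy_embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), meta, chunk)
            for entries in self.vectors.values()
            for chunk, meta, vec in entries
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


index = IncrementalIndex()
docs = {"a.txt": "quarterly storage report", "b.txt": "gpu cluster telemetry notes"}
first = index.sync(docs)        # both files are new, so both are indexed
docs["b.txt"] = "gpu cluster telemetry notes, revised"
second = index.sync(docs)       # only b.txt changed, so only b.txt is reindexed
third = index.sync(docs)        # nothing changed, so nothing is reindexed
hits = index.search("storage report")
```

Hashing whole files is the simplest form of change tracking. A production system would more likely rely on change notifications from the storage layer, but the effect is the same: only the dirty dish gets washed.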
What It Covers
Jacob Lieberman explains how NVIDIA's AI data platform reference design enables GPU-accelerated storage systems that continuously prepare enterprise data for AI agents in place, eliminating the security risks that come from copying and moving data.
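The security argument for in-place processing can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not NVIDIA's reference design: `SourceStore`, `export_copy`, `search_copy`, and `search_in_place` are hypothetical names, and a plain substring match stands in for real retrieval. It contrasts a detached copy, whose permissions are frozen at export time, with a query that checks live permissions at the source.

```python
from dataclasses import dataclass, field


@dataclass
class SourceStore:
    """Hypothetical source of truth: document text plus who may read it."""
    docs: dict = field(default_factory=dict)      # path -> text
    readers: dict = field(default_factory=dict)   # path -> set of user names

    def can_read(self, user: str, path: str) -> bool:
        return user in self.readers.get(path, set())


def export_copy(store: SourceStore) -> dict:
    """Copy-based pipeline: text and permissions are frozen at export time."""
    return {
        path: {"text": text, "readers": set(store.readers.get(path, set()))}
        for path, text in store.docs.items()
    }


def search_copy(copy: dict, user: str, term: str) -> list:
    """Search the detached copy using its frozen permissions."""
    return sorted(
        path for path, doc in copy.items()
        if term in doc["text"] and user in doc["readers"]
    )


def search_in_place(store: SourceStore, user: str, term: str) -> list:
    """Search next to the source, checking live permissions on every query."""
    return sorted(
        path for path, text in store.docs.items()
        if term in text and store.can_read(user, path)
    )


store = SourceStore(
    docs={"plan.txt": "merger plan draft"},
    readers={"plan.txt": {"alice", "bob"}},
)
snapshot = export_copy(store)
store.readers["plan.txt"].discard("bob")   # access is revoked at the source

stale_hits = search_copy(snapshot, "bob", "merger")     # the copy still serves bob
live_hits = search_in_place(store, "bob", "merger")     # the live check blocks bob
```

After the revocation, `stale_hits` still contains `plan.txt` while `live_hits` is empty. Each additional copy of a dataset is another snapshot that has to be found and corrected when access rights change.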
Notable Moment
Lieberman compares AI agents working inside storage systems to remote workers who are more productive at home. Keeping compute close to the data avoids the "commute" of moving massive datasets to distant processing centers for transformation and analysis.