NVIDIA AI Podcast

How AI Data Platforms Are Shaping the Future of Enterprise Storage - Ep. 281

November 18, 2025

35 min episode · 2 min read

Jacob Lieberman

Topics

Fundraising & VC, Design & UX, Artificial Intelligence

AI-Generated Summary

Published Dec 26, 2025

Key Takeaways

✓AI-Ready Data Pipeline: Making unstructured enterprise data usable for AI requires finding and gathering it, extracting the text, chunking that text into uniform sizes, enriching the chunks with metadata, embedding them as numeric representations, and indexing them in vector databases for retrieval-augmented generation (RAG) systems.
✓Data Velocity Challenge: Enterprises face dual pressure: new data is constantly created, and existing documents constantly change. Without tracking which files changed, organizations must repeatedly reindex entire datasets, wasting compute much like rewashing every dish when only one is dirty.
✓Security Through In-Place Processing: Traditional AI pipelines create seven to thirteen copies of a dataset across different systems, disconnecting those copies from the source permissions. When access rights change at the source, the copied data remains accessible, a major security vulnerability that the GPU-in-storage architecture eliminates.
✓Agent Deployment in Storage: Storage vendors deploy AI agents directly on GPUs inside storage systems to perform tasks such as identifying unclassified documents that should be classified and monitoring system telemetry to recommend optimizations, all while operating on data without unnecessary movement or copying.
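The pipeline stages and the change-tracking problem described in these takeaways can be illustrated with a minimal Python sketch. It is not NVIDIA's reference design: the names `chunk`, `embed`, and `IncrementalIndex`, the 50-word chunk size, and the hashed bag-of-words "embedding" are hypothetical stand-ins for a real embedding model and vector database. Change detection uses a SHA-256 fingerprint per document, one assumed way to skip unchanged files instead of reindexing the whole dataset.

```python
import hashlib
import math
import re

CHUNK_WORDS = 50  # hypothetical uniform chunk size, in words
DIM = 64          # hypothetical embedding dimension


def chunk(text, size=CHUNK_WORDS):
    """Split extracted text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text, dim=DIM):
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class IncrementalIndex:
    """Hypothetical in-memory vector index that re-processes only changed documents."""

    def __init__(self):
        self.fingerprints = {}  # doc_id -> SHA-256 of the last indexed text
        self.entries = {}       # doc_id -> list of (vector, chunk_text, metadata)

    def sync(self, docs):
        """docs maps doc_id -> extracted text. Returns the ids that were (re)indexed."""
        updated = []
        for doc_id, text in docs.items():
            digest = hashlib.sha256(text.encode()).hexdigest()
            if self.fingerprints.get(doc_id) == digest:
                continue  # unchanged since last sync: skip chunking and embedding
            self.entries[doc_id] = [
                (embed(c), c, {"source": doc_id, "chunk": i})  # enrich with metadata
                for i, c in enumerate(chunk(text))
            ]
            self.fingerprints[doc_id] = digest
            updated.append(doc_id)
        for doc_id in set(self.entries) - set(docs):  # deleted at the source
            del self.entries[doc_id]
            del self.fingerprints[doc_id]
        return updated

    def search(self, query, k=3):
        """Return the top-k (score, chunk_text, metadata) by cosine similarity."""
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text, meta)
            for chunks in self.entries.values()
            for vec, text, meta in chunks
        ]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    index = IncrementalIndex()
    docs = {"a.txt": "quarterly revenue grew", "b.txt": "gpu storage design notes"}
    print(index.sync(docs))  # ['a.txt', 'b.txt']
    docs["b.txt"] = "gpu storage design notes revised"
    print(index.sync(docs))  # ['b.txt']: only the edited file is re-embedded
```

Because vectors are normalized, the dot product in `search` equals cosine similarity; a production system would delegate that step to its vector database.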

What It Covers

Jacob Lieberman explains how NVIDIA's AI data platform reference design enables GPU-accelerated storage systems to prepare enterprise data for AI agents continuously and in place, eliminating the security risks of copying and moving data.
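The permission drift that copying causes can be shown with a small Python sketch. It models no GPU or storage hardware; `SourceStore`, `CopiedIndex`, and `InPlaceIndex` are hypothetical classes assumed for illustration. The copied index snapshots access rights when the data is copied, while the in-place index keeps a reference to the source and checks its live access list on every query, so a revocation at the source takes effect immediately.

```python
from dataclasses import dataclass, field


@dataclass
class SourceStore:
    """Hypothetical system of record holding each document's live access list."""
    acl: dict = field(default_factory=dict)  # doc_id -> set of user names

    def can_read(self, user, doc_id):
        return user in self.acl.get(doc_id, set())


@dataclass
class CopiedIndex:
    """A detached copy: permissions are snapshotted when the data is copied."""
    entries: list = field(default_factory=list)  # (allowed_users, chunk_text)

    def add(self, source, doc_id, chunk_text):
        allowed = set(source.acl.get(doc_id, set()))  # frozen at copy time
        self.entries.append((allowed, chunk_text))

    def retrieve(self, user, term):
        return [t for allowed, t in self.entries if term in t and user in allowed]


@dataclass
class InPlaceIndex:
    """Entries point back to the source; permissions are checked there on every query."""
    source: SourceStore
    entries: list = field(default_factory=list)  # (doc_id, chunk_text)

    def add(self, doc_id, chunk_text):
        self.entries.append((doc_id, chunk_text))

    def retrieve(self, user, term):
        return [t for doc_id, t in self.entries
                if term in t and self.source.can_read(user, doc_id)]


if __name__ == "__main__":
    store = SourceStore(acl={"plan.docx": {"alice", "bob"}})
    copied, live = CopiedIndex(), InPlaceIndex(store)
    copied.add(store, "plan.docx", "merger plan draft")
    live.add("plan.docx", "merger plan draft")
    store.acl["plan.docx"].discard("bob")  # access revoked at the source
    print(copied.retrieve("bob", "merger"))  # ['merger plan draft']: stale copy leaks
    print(live.retrieve("bob", "merger"))    # []
```

With seven to thirteen detached copies, each one would hold its own stale snapshot; the in-place design has only one permission record to keep current.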

Notable Moment

Lieberman compares AI agents working inside storage systems to remote workers who are more productive at home: just as working from home removes the commute, keeping compute close to the data avoids moving massive datasets to distant processing centers for transformation and analysis.

More from NVIDIA AI Podcast

How Mistral Is Building Frontier AI for the Enterprise | NVIDIA AI Podcast Ep. 301

Everyone Can Build a Robot: Open Source Embodied AI With Seeed Studio | NVIDIA AI Podcast Ep. 300

Inside AI Tokenomics: How to Profitably Turn Tokens Into Business Value | NVIDIA AI Podcast Ep. 299

Snap’s Secret to Processing 10 Petabytes a Day: GPU-Accelerated Spark | NVIDIA AI Podcast Ep. 298

Harrison Chase of LangChain on Deep Agents, LangSmith, and Earning Trust | NVIDIA AI Podcast Ep. 297
