Latent Space

Every Agent Needs a Box — Aaron Levie, Box

76 min episode · 3 min read

AI-Generated Summary

Key Takeaways

  • Agent Identity Architecture: Treating agents as standard user accounts creates critical security gaps. Unlike human employees, agents carry no legal liability, deserve no privacy protections, and require full auditability by their creator. Enterprises need a distinct identity layer, separate from Okta-style human IAM, that grants agents scoped file-system access, maintains creator oversight, and prevents unauthorized data exposure across organizational boundaries (a minimal scoping sketch follows this list).
  • Coding Agent Advantage vs. Enterprise Gap: AI coding agents succeeded because of roughly eight compounding advantages, among them: day-one access to the full codebase (the same access a new engineer gets), a text-in/text-out medium, models heavily trained on code, developers dogfooding their own tools, a technical user base, and open knowledge sharing. Every other enterprise knowledge workflow (legal, finance, banking) faces structural headwinds on six or seven of those properties, creating a multi-year deployment gap.
  • Context Engineering at Scale: A knowledge worker may have 10 million documents across teams and projects, roughly 50 million pages, yet reliable model performance degrades significantly beyond approximately 60,000 tokens of context. Bridging the gap between 50 million pages of source material and a trustworthy 60,000-token window requires purpose-built agentic search systems, multi-pass retrieval with self-ranking, and models that recognize when continued searching will not yield better results instead of returning incomplete answers (see the retrieval sketch after this list).
  • Workflow Adaptation Runs One Direction: Enterprises should not expect agents to conform to existing workflows. The coding world demonstrated that humans restructure their work to make agents effective — not the reverse. Organizations that proactively re-engineer documentation practices, digitize tacit knowledge, and restructure data access for agent readability will gain compounding velocity advantages over competitors still waiting for a frictionless drop-in solution.
  • Agent Evals as Core Infrastructure: Every enterprise deploying agents needs a private, held-out evaluation benchmark tied to its specific workflows, equivalent to Box's internal eval suite covering industries such as financial services, legal, healthcare, and the public sector. Running models against these benchmarks at each update cycle catches regressions, guides model selection, and validates harness changes; Box observed roughly 15-point score jumps between consecutive Anthropic Sonnet model generations on its internal suite (a regression-gating sketch follows this list).
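
To make the identity takeaway concrete, here is a minimal sketch of an agent-scoped identity record kept deliberately separate from human IAM. Every name in it (AgentIdentity, authorize, the path patterns) is hypothetical, illustrating the pattern rather than any Box or Okta API.

```python
# Minimal sketch of an agent-specific identity layer, separate from human IAM.
# All names here are hypothetical; this illustrates the pattern, not a real API.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from fnmatch import fnmatch

@dataclass
class AgentIdentity:
    agent_id: str
    creator: str                 # the human accountable for this agent
    allowed_paths: list[str]     # scoped file-system access, e.g. "legal/contracts/*"
    audit_log: list[dict] = field(default_factory=list)

    def authorize(self, path: str, action: str) -> bool:
        """Allow an action only if the path matches the agent's scope.
        Every attempt, allowed or denied, is recorded for the creator."""
        allowed = any(fnmatch(path, pattern) for pattern in self.allowed_paths)
        self.audit_log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "path": path, "action": action, "allowed": allowed,
        })
        return allowed

agent = AgentIdentity("contract-reviewer-01", creator="some-employee",
                      allowed_paths=["legal/contracts/*"])
assert agent.authorize("legal/contracts/msa.pdf", "read")       # in scope
assert not agent.authorize("hr/salaries.xlsx", "read")          # denied, still audited
```

Unlike a human account, the agent gets no privacy: its full audit_log stays readable by its creator, which is exactly the asymmetry the takeaway describes.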
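
For the context-engineering takeaway, the sketch below shows multi-pass retrieval with self-ranking and an explicit stop condition. search() and judge_sufficiency() are toy stand-ins (a keyword matcher over a three-document corpus and a naive scorer) for a real retrieval system and an LLM self-ranking call; the point is the control flow, which stops once another pass no longer improves the sufficiency score rather than silently returning an incomplete answer.

```python
# Sketch of multi-pass agentic search with self-ranking and an explicit stop
# condition. search() and judge_sufficiency() are toy stand-ins for a real
# retrieval system and an LLM self-ranking call.

CORPUS = {
    "doc1": "Q3 revenue grew 11% driven by Suites upsell.",
    "doc2": "The MSA renewal clause was amended in March.",
    "doc3": "Office seating chart, floor 4.",
}

def search(query: str) -> list[str]:
    """Naive keyword retrieval (placeholder for a purpose-built search system)."""
    terms = query.lower().split()
    return [text for text in CORPUS.values()
            if any(t in text.lower() for t in terms)]

def judge_sufficiency(question: str, evidence: list[str]) -> float:
    """Placeholder for a model self-ranking pass: 0-1 score for whether the
    gathered evidence can answer the question."""
    return min(1.0, 0.4 * len(evidence))

def agentic_search(question: str, max_passes: int = 3, threshold: float = 0.8):
    evidence: list[str] = []
    score = 0.0
    query = question
    for _ in range(max_passes):
        for text in search(query):
            if text not in evidence:
                evidence.append(text)
        new_score = judge_sufficiency(question, evidence)
        if new_score >= threshold:
            return evidence, "sufficient"
        if new_score <= score:   # no improvement: more searching will not help
            return evidence, "exhausted"
        score = new_score
        query = question + " amendment date"  # stand-in for a model-written refinement
    return evidence, "budget_spent"

print(agentic_search("When was the MSA renewal clause amended?"))
```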
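
And for the evals takeaway, a minimal regression gate of the kind described. run_model() is a placeholder for a real provider call and the two cases are illustrative; an actual suite stays private, held out, and tied to your own workflows.

```python
# Minimal sketch of a private, held-out eval gate run at each model update.
# run_model() is a placeholder for a real provider call, and the two cases
# are illustrative only.

EVAL_SUITE = [
    {"prompt": "Extract the renewal date from this MSA ...", "expected": "2026-03-01"},
    {"prompt": "Classify this filing: 10-K or 10-Q? ...",    "expected": "10-K"},
]

def run_model(model: str, prompt: str) -> str:
    """Placeholder: swap in an actual model call here."""
    return "2026-03-01" if "renewal" in prompt else "10-K"

def score(model: str) -> float:
    correct = sum(run_model(model, case["prompt"]) == case["expected"]
                  for case in EVAL_SUITE)
    return 100.0 * correct / len(EVAL_SUITE)

def gate(candidate: str, incumbent: str, max_regression: float = 2.0) -> bool:
    """Block a model or harness update that regresses beyond tolerance."""
    new, old = score(candidate), score(incumbent)
    print(f"{incumbent}: {old:.0f} -> {candidate}: {new:.0f}")
    return new >= old - max_regression

assert gate("model-vNext", "model-vCurrent")
```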

What It Covers

Box CEO Aaron Levie joins Latent Space with Chroma CEO Jeff Huber to examine why enterprise AI agent deployment lags behind coding agents, covering data governance, agent identity management, access control architecture, context engineering challenges, and why Fortune 500 companies face a multi-year transformation timeline before realizing compounding productivity returns from autonomous agents.

Key Questions Answered

  • Context Pruning Over Retention: Frontier models performing agentic search repeat failed strategies when unsuccessful attempts remain in the context window, even when the model's own reasoning trace flagged those attempts as flawed. The practical fix is active context pruning: remove failed search branches from the window entirely, but inject a brief summary noting the failure so the model avoids repeating it, rather than leaving the full error trace to re-anchor behavior (sketched below).
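
A minimal sketch of that pruning step, assuming a generic message-list view of the context window rather than any specific provider's API: failed tool traces are swapped for one-line notes before the next model call.

```python
# Sketch of active context pruning: failed search branches are removed from
# the window and replaced by a one-line note so the model does not retry them.
# The message shapes are generic, not any particular provider's API.

def prune_context(messages: list[dict]) -> list[dict]:
    pruned = []
    for msg in messages:
        if msg.get("role") == "tool" and msg.get("failed"):
            # Keep a short summary of the failure instead of the full trace,
            # which would otherwise re-anchor the model on the bad strategy.
            pruned.append({
                "role": "system",
                "content": f"Note: the search '{msg['query']}' found nothing; "
                           "do not repeat it.",
            })
        else:
            pruned.append(msg)
    return pruned

history = [
    {"role": "user", "content": "List the addresses of all Box offices."},
    {"role": "tool", "query": "box offices list", "failed": True,
     "content": "<thousands of tokens of irrelevant results>"},
    {"role": "tool", "query": "box locations page", "failed": False,
     "content": "Found 6 addresses ..."},
]
for msg in prune_context(history):
    print(msg["role"], "->", msg["content"][:70])
```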

Notable Moment

Levie describes asking an agent to retrieve addresses for all 10 Box office locations — a task with no single authoritative document. Lower-tier models consistently returned six of ten addresses and stopped, unaware of the gap. This illustrates a core unsolved problem: agents cannot reliably determine when exhaustive searching is warranted versus when the data simply does not exist.
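
When the expected cardinality is known, as it was here, one partial mitigation is an explicit coverage check that surfaces the shortfall instead of stopping silently. A minimal sketch, with a stubbed run_search() standing in for real retrieval:

```python
# Sketch of an explicit coverage check for the office-address task: when the
# expected count is known (10 offices), keep searching within a budget and
# report a partial result instead of stopping silently at 6. The names and
# the run_search() stub are illustrative.

def run_search(query: str) -> set[str]:
    """Stub retrieval: each query surfaces a few addresses."""
    fake = {"offices page": {"Redwood City", "Austin"},
            "press releases": {"London", "Tokyo"}}
    return fake.get(query, set())

def find_addresses(queries: list[str], expected: int, budget: int = 5):
    found: set[str] = set()
    for query in queries[:budget]:
        found |= run_search(query)
        if len(found) >= expected:
            return sorted(found), "complete"
    # Budget spent with known gaps: surface the shortfall to the caller.
    return sorted(found), f"partial: {len(found)}/{expected} found"

print(find_addresses(["offices page", "press releases", "blog"], expected=10))
```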
