Latent Space

The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray

May 28, 2026

68 min episode · 3 min read

Walden Yan, Cole Murray

Topics

Remote Work, Artificial Intelligence, Software Development

AI-Generated Summary

Key Takeaways

✓Agent Architecture — Out-of-Box vs In-Box: Running the agent harness outside the sandbox is more complex but architecturally superior for security. When the agent runs inside the sandbox, secrets must live there too, creating exfiltration risk. The out-of-box approach separates the "brain" in a control plane from the "hands" in the sandbox, allowing scoped credentials per machine and cleaner permission boundaries across multi-user environments.
✓VM Infrastructure Over Docker: Full virtual machines suit coding agents better than Docker containers for two reasons: Docker is not a true security boundary, and real applications often use Docker internally, creating nested Docker-in-Docker conflicts. Cognition built a custom block-diff file storage format so VMs write only data proportional to the file-system diff, dramatically reducing boot and restore times for agent sessions.
✓Repo Setup as the Persistent Bottleneck: Getting agents to run, test, and interact with a codebase autonomously requires a working local developer environment, including Docker Compose, local databases, and scoped credentials. Most companies lack this infrastructure, especially older ones built before containerization. Teams should prioritize local dev environment setup before deploying background agents, because an agent cannot ask a colleague ("Bob") for a missing secret.
✓Memory Generation and Retrieval Remain Unsolved: Cognition's production memory system auto-generates memories when users correct Devin; about 95% of stored memories are created automatically rather than written by hand. The challenge is twofold: generation must avoid over-generalizing one-off preferences into permanent rules, and retrieval must surface relevant memories without flooding the context. An emerging alternative is to let agents edit memory files directly, treating memory as a navigable file system.
✓AI Code Slop Patterns Require Lint Guards: Specific anti-patterns recur in AI-generated code: `getattr` used defensively even when attributes are known, untyped `dict[str, Any]` returns, backwards-compatibility shims that add unnecessary import/export layers, and excessive inline documentation. Teams should encode these as Semgrep or lint rules that fail pull requests automatically, so the patterns do not become entrenched in the codebase and get copied as reference examples by future agent runs.
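The out-of-box takeaway can be illustrated with a small sketch. It assumes a hypothetical `ControlPlane` (the "brain") that holds long-lived secrets and mints short-lived tokens scoped to one sandbox (the "hands"). The class names, scope strings, and TTL are illustrative assumptions, not Cognition's or OpenInspect's implementation. What it shows: the sandbox only ever holds a scoped token, and any action that needs the real secret runs in the control plane.

```python
import secrets
import time
from dataclasses import dataclass, field


@dataclass
class ScopedToken:
    # Hypothetical short-lived credential bound to one sandbox and a set of scopes.
    value: str
    sandbox_id: str
    scopes: frozenset[str]
    expires_at: float


@dataclass
class ControlPlane:
    # The "brain": holds long-lived secrets and decides what each sandbox may do.
    master_secrets: dict[str, str]
    issued: dict[str, ScopedToken] = field(default_factory=dict)

    def mint_token(self, sandbox_id: str, scopes: set[str], ttl_s: float = 900.0) -> ScopedToken:
        # Issue a random token for exactly one sandbox; this is all the sandbox receives.
        token = ScopedToken(
            value=secrets.token_urlsafe(16),
            sandbox_id=sandbox_id,
            scopes=frozenset(scopes),
            expires_at=time.time() + ttl_s,
        )
        self.issued[token.value] = token
        return token

    def authorize(self, token_value: str, sandbox_id: str, scope: str) -> bool:
        # Allowed only if the token exists, belongs to the caller, carries the
        # requested scope, and has not expired.
        token = self.issued.get(token_value)
        return (
            token is not None
            and token.sandbox_id == sandbox_id
            and scope in token.scopes
            and time.time() < token.expires_at
        )

    def call_with_secret(self, token_value: str, sandbox_id: str, scope: str) -> str:
        # Secret-bearing actions run here, never inside the sandbox.
        if not self.authorize(token_value, sandbox_id, scope):
            raise PermissionError(f"{sandbox_id} lacks scope {scope!r}")
        service = scope.split(":", 1)[0]
        secret = self.master_secrets[service]  # stays in the control plane
        # A real system would call the external API with `secret` here; this
        # sketch only reports that the action ran, without returning the secret.
        return f"{scope} executed by control plane ({len(secret)}-char secret withheld)"
```

A token minted for `vm-1` with `{"repo:read"}` authorizes reads from `vm-1` only; the same token presented by `vm-2`, or used for `repo:write`, is rejected. Per-machine scoping like this is what the episode contrasts with putting secrets inside the sandbox.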
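The episode does not describe Cognition's block-diff format in detail, so the following is only a generic sketch of the block-diff idea behind the VM takeaway. It assumes a disk image split into fixed-size blocks (a hypothetical 4 KiB here): a snapshot stores only blocks that differ from a base image, and restore overlays them on the base. Storage and restore cost then scale with the changed blocks rather than the full image.

```python
BLOCK_SIZE = 4096  # assumed block size for illustration


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    # Cut an image into fixed-size blocks; the last block may be shorter.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def make_diff(base: bytes, current: bytes, block_size: int = BLOCK_SIZE) -> dict:
    # Record only blocks that changed, or are new, relative to the base image.
    base_blocks = split_blocks(base, block_size)
    cur_blocks = split_blocks(current, block_size)
    changed = {
        idx: blk
        for idx, blk in enumerate(cur_blocks)
        if idx >= len(base_blocks) or base_blocks[idx] != blk
    }
    return {"length": len(current), "blocks": changed}


def restore(base: bytes, diff: dict, block_size: int = BLOCK_SIZE) -> bytes:
    # Start from the base, overlay changed blocks in index order, then trim to
    # the snapshot's recorded length (covers images that grew or shrank).
    blocks = split_blocks(base, block_size)
    for idx, blk in sorted(diff["blocks"].items()):
        if idx < len(blocks):
            blocks[idx] = blk
        else:
            blocks.append(blk)
    return b"".join(blocks)[: diff["length"]]
```

With a 16 KiB base image and a single modified byte, `make_diff` stores one 4 KiB block instead of the whole image. Production systems add content hashing, copy-on-write at the block-device layer, and chains of diffs, none of which this sketch attempts.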

What It Covers

Walden Yan from Cognition and Cole Murray from OpenInspect examine the architecture of background coding agents, covering the technical decisions behind building cloud-based development systems. Cognition's internal data shows Devin-authored commits grew from 16% to 80% of all commits between January and March 2025, while engineering headcount grew only 10%.

Key Questions Answered

•SRE Auto-Triage as the Highest-ROI Entry Point: The most common and immediately valuable background agent use case is first-responder triage on alerts from Datadog, Sentry, or Slack. The agent does not need to resolve the incident: collecting full context, referencing playbooks, and drafting a pull request for a human to review already delivers substantial value. OpenInspect supports generic webhooks for this trigger; teams report spending between $1,000 and $5,000 per engineer per month on agent compute for this workflow.
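The episode does not specify OpenInspect's webhook schema, so this sketch of the first-responder flow assumes a hypothetical generic JSON payload with `source`, `service`, and `title` fields. It normalizes the alert, attaches a matching playbook, and produces a task whose deliverable is gathered context plus a draft PR, with human review required before anything merges. `PLAYBOOKS`, the file paths, and the `TriageTask` fields are illustrative assumptions.

```python
import json
from dataclasses import dataclass

# Hypothetical playbook index keyed by service name.
PLAYBOOKS = {
    "payments": "runbooks/payments.md",
    "search": "runbooks/search.md",
}
DEFAULT_PLAYBOOK = "runbooks/general-triage.md"


@dataclass
class TriageTask:
    source: str
    service: str
    summary: str
    playbook: str
    steps: tuple[str, ...]
    requires_human_review: bool = True  # the agent drafts; a human decides


def task_from_webhook(body: str) -> TriageTask:
    # Parse an assumed generic alert payload and turn it into an agent task.
    alert = json.loads(body)
    service = alert.get("service", "unknown")
    return TriageTask(
        source=alert.get("source", "webhook"),
        service=service,
        summary=alert.get("title", "untitled alert"),
        playbook=PLAYBOOKS.get(service, DEFAULT_PLAYBOOK),
        steps=(
            "collect logs, traces, and recent deploys for the service",
            "follow the referenced playbook",
            "draft a pull request with a proposed fix or mitigation",
        ),
    )
```

The design choice mirrors the episode's framing: the task's success criterion is a well-documented draft, not an autonomous resolution, which keeps the human reviewer in the loop.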

Notable Moment

Cognition ran an internal experiment building a full product using autonomous agents with auto-merge and zero code review. By the two-week mark, changing a single button color required touching ten different implementations. The conclusion: scheduled human-led or agent-led cleanup of duplication is necessary, or codebases regress toward their worst contributor's patterns.

Books, tools, and gear mentioned in this episode

Tools

Docker Compose
Datadog
Sentry
Semgrep

Products

Devin, by Cognition
