Latent Space

The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray

68 min episode · 3 min read

AI-Generated Summary

Key Takeaways

  • Agent Architecture — Out-of-Box vs In-Box: Running the agent harness outside the sandbox is more complex but architecturally superior for security. When the agent runs inside the sandbox, secrets must live there too, creating exfiltration risk. The out-of-box approach separates the "brain" in a control plane from the "hands" in the sandbox, allowing scoped credentials per machine and cleaner permission boundaries across multi-user environments.
  • VM Infrastructure Over Docker: Full virtual machines outperform Docker containers for coding agents for two reasons: Docker is not a true security boundary, and real applications often use Docker internally, creating nested Docker-in-Docker conflicts. Cognition built a custom block-diff file storage format so VMs only write changes proportional to the file system diff, dramatically reducing boot and restore times for agent sessions.
  • Repo Setup as the Persistent Bottleneck: Getting agents to run, test, and interact with a codebase autonomously requires a working local developer environment — including Docker Compose, local databases, and scoped credentials. Most companies lack this infrastructure, especially older ones built before containerization. Teams should prioritize local dev environment setup before deploying background agents, as agents cannot ask "Bob" for secrets.
  • Memory Generation and Retrieval Remain Unsolved: Cognition's production memory system auto-generates memories when users correct Devin, with ~95% of stored memories created automatically rather than manually written. The core challenge is dual: generation must avoid over-generalizing one-off preferences into permanent rules, and retrieval must surface relevant memories without flooding context. Agents editing memory files directly, treating memory like a navigable file system, is an emerging alternative approach.
  • AI Code Slop Patterns Require Lint Guards: Specific anti-patterns emerge consistently from AI-generated code: `getattr` used defensively even when attributes are known, untyped `dict[str, Any]` returns, backwards-compatibility shims that add unnecessary import-export layers, and excessive inline documentation. Teams should encode these as Semgrep or lint rules that fail pull requests automatically, so the patterns do not become entrenched in the codebase and get copied as reference examples in later AI-generated code.
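
As an illustration of the lint-guard takeaway, here is a minimal sketch of a check for two of the patterns named above: `getattr` called with a literal attribute name, and functions annotated as returning `dict[str, Any]`. It is not a Semgrep rule and not something described in the episode; it uses Python's standard `ast` module instead, and the function name `find_slop` and the message strings are assumptions made for this example.

```python
import ast


def find_slop(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, message) pairs for two AI-slop patterns.

    Hypothetical helper for illustration; the rule set is deliberately tiny.
    """
    findings: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        # Pattern 1: getattr(obj, "name", ...) with a literal attribute name,
        # where a plain obj.name access would usually be clearer.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "getattr"
            and len(node.args) >= 2
            and isinstance(node.args[1], ast.Constant)
            and isinstance(node.args[1].value, str)
        ):
            findings.append((node.lineno, "getattr with a literal attribute name"))

        # Pattern 2: a function whose return annotation is dict[str, Any].
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.returns:
            annotation = ast.unparse(node.returns).replace(" ", "")
            if annotation in {"dict[str,Any]", "Dict[str,Any]"}:
                findings.append((node.lineno, "untyped dict[str, Any] return"))
    return sorted(findings)


# The sample is only parsed, never executed, so it needs no imports of its own.
SAMPLE = '''
def load(cfg) -> dict[str, Any]:
    name = getattr(cfg, "name", None)
    return {"name": name}
'''

if __name__ == "__main__":
    for line, message in find_slop(SAMPLE):
        print(f"line {line}: {message}")
```

In a CI job, a wrapper script could run a check like this over changed files and exit non-zero when the list is non-empty, which is one way to make the pull request fail automatically.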

What It Covers

Walden Yan from Cognition and Cole Murray from OpenInspect examine the architecture of background coding agents, covering the technical decisions behind building cloud-based development systems. Cognition's internal data shows Devin-authored commits grew from 16% to 80% of all commits between January and March 2025, while engineering headcount grew only 10%.

Key Questions Answered

  • SRE Auto-Triage as the Highest-ROI Entry Point: The most common and immediately valuable background agent use case is first-responder triage on alerts arriving from Datadog, Sentry, or Slack. The agent does not need to resolve the incident: collecting full context, referencing playbooks, and drafting a pull request before a human reviews it already delivers substantial value. OpenInspect supports generic webhooks for this trigger; teams report spending between $1,000 and $5,000 per engineer monthly on agent compute for this workflow.
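
The auto-triage flow described above can be sketched roughly as follows. This is an assumption-laden illustration, not OpenInspect's webhook API or any vendor's alert schema: the payload keys (`title`, `source`, `service`), the `PLAYBOOKS` index, and the `TriageReport` type are all hypothetical. It shows only the shape of the idea: receive an alert, gather context, look up a playbook, and hand a draft to a human rather than resolving the incident.

```python
from dataclasses import dataclass, field
from typing import Mapping, Optional

# Hypothetical keyword -> playbook index; a real setup would load this from the repo.
PLAYBOOKS = {
    "timeout": "playbooks/upstream-timeouts.md",
    "oom": "playbooks/out-of-memory.md",
}


@dataclass
class TriageReport:
    alert_title: str
    source: str
    playbook: Optional[str]
    context: list[str] = field(default_factory=list)
    # The agent only drafts; a human always reviews before anything ships.
    needs_human_review: bool = True


def triage(alert: Mapping[str, object]) -> TriageReport:
    """Turn a generic webhook payload (assumed shape) into a triage report."""
    title = str(alert.get("title", "untitled alert"))
    source = str(alert.get("source", "unknown"))

    # Pick the first playbook whose keyword appears in the alert title.
    lowered = title.lower()
    playbook = next(
        (path for keyword, path in PLAYBOOKS.items() if keyword in lowered),
        None,
    )

    # Collect whatever context the payload carries for the reviewer.
    context = [f"service={alert.get('service', 'unknown')}"]
    if "stack_trace" in alert:
        context.append(f"stack_trace={alert['stack_trace']}")

    return TriageReport(title, source, playbook, context)


if __name__ == "__main__":
    report = triage(
        {"title": "Checkout API timeout spike", "source": "sentry", "service": "checkout"}
    )
    print(report)
```

Keeping `needs_human_review` fixed to `True` reflects the point made in the summary: the value comes from context gathering and a drafted fix, with the final decision left to a person.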

Notable Moment

Cognition ran an internal experiment building a full product using autonomous agents with auto-merge and zero code review. By the two-week mark, changing a single button color required touching ten different implementations. The conclusion: scheduled human-led or agent-led cleanup of duplication is necessary, or codebases regress toward their worst contributor's patterns.
