Dealing with increasingly complicated agents

October 16, 2025

54 min episode · 2 min read

Increasingly Complicated Agents

Episode

54 min

Read time

2 min

AI-Generated Summary

Published Dec 19, 2025

Key Takeaways

✓Agent Security Model: Any tool exposed to an LLM becomes accessible to anyone controlling LLM input through prompt injection, requiring deterministic authorization controls outside the model itself.
✓Password Attack Analogy: Jailbreaking resembles password cracking - focus on limiting attempt frequency rather than perfect blocking, using guardrails as detection signals to suspend suspicious users after multiple triggers.
✓Code-Then-Execute Pattern: Generate execution plans before untrusted data enters context, using data flow analysis to enforce tool policies based on input source trustworthiness - most promising security design pattern.
✓Complexity Explosion: Modern agent workflows mix multiple untrusted data sources in single LLM contexts, where any malicious component can compromise the entire system through cross-contamination attacks.

What It Covers

Donato Capitella from Reversec explains how AI agents accessing external tools create massive security vulnerabilities, requiring new design patterns beyond traditional LLM red teaming approaches.

Key Questions Answered

•Agent Security Model: Any tool exposed to an LLM becomes accessible to anyone controlling LLM input through prompt injection, requiring deterministic authorization controls outside the model itself.
•Password Attack Analogy: Jailbreaking resembles password cracking - focus on limiting attempt frequency rather than perfect blocking, using guardrails as detection signals to suspend suspicious users after multiple triggers.
•Code-Then-Execute Pattern: Generate execution plans before untrusted data enters context, using data flow analysis to enforce tool policies based on input source trustworthiness - most promising security design pattern.
•Complexity Explosion: Modern agent workflows mix multiple untrusted data sources in single LLM contexts, where any malicious component can compromise the entire system through cross-contamination attacks.

Notable Moment

Capitella demonstrates how attackers can inject malicious emails into support ticket databases, later triggering phishing responses when legitimate customers submit related queries through the system.

Know someone who'd find this useful?