Bankless

AI Finds 70% of Smart Contract Exploits | Alpin Yukseloglu

61 min episode · 3 min read

Topics: Artificial Intelligence, Crypto & Web3

AI-Generated Summary

Key Takeaways

  • AI exploit capability trajectory: Frontier models went from finding roughly 12–13% of critical smart contract bugs to over 70% within six months — a jump that occurred partly between drafting and publishing the EVM Bench paper. GPT-5.3 Codex now matches or approaches the collective output of human auditors on fund-draining vulnerabilities sourced from Code Arena audit contests. Expect a superhuman AI auditor within six to eight months.
  • False positive elimination via verifiability: Previous AI auditing tools produced high false positive rates, making them impractical. EVM Bench solves this by running exploits against a production-grade EVM environment loaded with real chain state. If the agent claims a bug exists, it must produce a working proof-of-concept that drains funds from the contract — reducing false positives to near zero and making AI audit results actionable. A minimal sketch of this drain check appears after this list.
  • Long-tail contract risk: Low-TVL protocols on EVM-compatible chains like Binance Smart Chain face the highest near-term exploit risk. These contracts were historically sheltered because the maximum extractable value was too small to attract skilled attackers. As inference costs drop below the value of exploiting even small contracts, AI agents will systematically collect this long tail — making security investment non-optional regardless of protocol size. A toy breakeven model also follows the list.
  • Crypto's verifiability accelerates AI training: Crypto code is among the most verifiable software in existence — agents can deploy contracts, assert state changes, and confirm exploits without human labelers. This creates a tight training signal that accelerates model improvement faster than most software domains. Paradigm expects models to develop strong crypto capabilities with less direct training data than initially anticipated, compressing the timeline to superhuman performance. A schematic of the resulting reward signal follows the list as well.
  • Offense-defense arms race framing: The near-term security outcome depends on whether white-hat or black-hat actors access frontier AI capabilities first. Paradigm's strategic response is embedding crypto benchmarks directly inside model labs — EVM Bench is now running inside OpenAI — to ensure defensive tooling develops alongside offensive capability. Protocols housing significant TVL should begin proactive AI-assisted auditing now rather than waiting for the first AI-attributed exploit.
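
The verification step in the second takeaway is straightforward to picture as code: load forked chain state, execute the candidate proof-of-concept, and accept the finding only if value actually leaves the target. Below is a minimal sketch of that idea (not EVM Bench's actual harness) in Python with web3.py v6, assuming a locally forked node such as Anvil listening on localhost:8545; the address and the signed PoC transaction are hypothetical placeholders.

```python
# Minimal sketch: accept an AI-reported vulnerability only if its
# proof-of-concept measurably drains funds on a forked chain.
# Assumes a local fork of real chain state (e.g. `anvil --fork-url ...`)
# on localhost:8545. The address below is a hypothetical placeholder.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))

TARGET = Web3.to_checksum_address("0x" + "11" * 20)  # hypothetical victim contract

def verify_exploit(signed_poc_tx: bytes) -> bool:
    """Return True only if the PoC transaction moves value out of TARGET."""
    before = w3.eth.get_balance(TARGET)
    tx_hash = w3.eth.send_raw_transaction(signed_poc_tx)
    receipt = w3.eth.wait_for_transaction_receipt(tx_hash)
    if receipt["status"] != 1:   # a reverted PoC is an unverified claim
        return False
    after = w3.eth.get_balance(TARGET)
    # Native-token balance only; a fuller harness would also assert
    # ERC-20 balances and other storage-state changes.
    return after < before        # funds left the contract: exploit confirmed
```

Because the check is a concrete state assertion rather than a classifier score, a claimed bug either drains the fork or it does not, which is what drives false positives toward zero.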
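
The long-tail argument in the third takeaway reduces to a breakeven inequality: automated exploitation pays once expected extraction exceeds the cost of the search. A toy model with made-up numbers makes the shift in economics concrete:

```python
# Toy breakeven model for long-tail exploitation (all numbers hypothetical).
def worth_attacking(tvl_usd: float, success_rate: float, search_cost_usd: float) -> bool:
    """An automated attacker profits when expected extraction exceeds search cost."""
    return tvl_usd * success_rate > search_cost_usd

# A $50k contract was never worth a skilled auditor's week (say $10k of labor),
# but at a few dollars of inference per contract the long tail becomes profitable:
print(worth_attacking(tvl_usd=50_000, success_rate=0.05, search_cost_usd=10_000.0))  # False: human labor
print(worth_attacking(tvl_usd=50_000, success_rate=0.05, search_cost_usd=5.0))       # True: cheap inference
```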
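
The training-signal point in the fourth takeaway falls out of the same machinery: a pass/fail execution check doubles as a reward function, with no human labeler anywhere in the loop. A schematic sketch, reusing the hypothetical verify_exploit harness above:

```python
# Schematic: verifiable execution turns exploit-finding into a training
# signal. `verify_exploit` is the hypothetical harness sketched above.
def exploit_reward(signed_poc_tx: bytes) -> float:
    """Binary reward computed by running code, not by human judgment."""
    return 1.0 if verify_exploit(signed_poc_tx) else 0.0
```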

What It Covers

Alpin Yukseloglu, investment and research partner at Paradigm, presents findings from EVM Bench — a benchmark co-authored with OpenAI measuring AI agents' ability to detect, patch, and exploit smart contract vulnerabilities. Top models jumped from under 20% to over 70% exploit detection in six months, reshaping crypto security assumptions.

Key Questions Answered

  • Agency over singularity anxiety: When facing uncertainty about AI's trajectory, Yukseloglu recommends replacing speculative theorizing with direct experimentation at the frontier. Both full acceptance and full denial of AI risk produce the same passive outcome. The practical alternative is running experiments, engaging model labs directly, and shipping within 24 hours of inception — speed over cohesion is the correct operating mode when the frontier remains experimentally unknowable.

Notable Moment

Yukseloglu describes a counterintuitive dynamic: Solana's prevalence of closed-source contracts, typically seen as a disadvantage, may actually accelerate AI model development on that stack. Contracts absent from public training data provide cleaner, uncontaminated evaluation signals — potentially giving closed-source ecosystems an unexpected edge in AI capability benchmarking.
