Bankless

AI Finds 70% of Smart Contract Exploits | Alpin Yukseloglu

61 min episode · 3 min read

Topics: Artificial Intelligence, Crypto & Web3

AI-Generated Summary

Key Takeaways

  • AI exploit capability trajectory: Frontier models went from finding roughly 12–13% of critical smart contract bugs to over 70% within six months — a jump that occurred partly between drafting and publishing the EVM Bench paper. GPT-5.3 Codex now matches or approaches the collective output of human auditors on fund-draining vulnerabilities sourced from Code Arena audit contests. Expect a superhuman AI auditor within six to eight months.
  • False positive elimination via verifiability: Previous AI auditing tools produced high false positive rates, making them impractical. EVM Bench solves this by running exploits against a production-grade EVM environment loaded with real chain state. If the agent claims a bug exists, it must produce a working proof-of-concept that drains funds from the contract — reducing false positives to near zero and making AI audit results actionable. A minimal sketch of this drain check appears after this list.
  • Long-tail contract risk: Low-TVL protocols on EVM-compatible chains like Binance Smart Chain face the highest near-term exploit risk. These contracts were historically sheltered because the maximum extractable value was too small to attract skilled attackers. As inference costs drop below the value of exploiting even small contracts, AI agents will systematically collect this long tail — making security investment non-optional regardless of protocol size. A toy breakeven model also follows the list.
  • Crypto's verifiability accelerates AI training: Crypto code is among the most verifiable software in existence — agents can deploy contracts, assert state changes, and confirm exploits without human labelers. This creates a tight training signal that accelerates model improvement faster than most software domains. Paradigm expects models to develop strong crypto capabilities with less direct training data than initially anticipated, compressing the timeline to superhuman performance. A schematic of the resulting reward signal follows the list as well.
  • Offense-defense arms race framing: The near-term security outcome depends on whether white-hat or black-hat actors access frontier AI capabilities first. Paradigm's strategic response is embedding crypto benchmarks directly inside model labs — EVM Bench is now running inside OpenAI — to ensure defensive tooling develops alongside offensive capability. Protocols housing significant TVL should begin proactive AI-assisted auditing now rather than waiting for the first AI-attributed exploit.
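
The verification step in the second takeaway is straightforward to picture as code: load forked chain state, execute the candidate proof-of-concept, and accept the finding only if value actually leaves the target. Below is a minimal sketch of that idea (not EVM Bench's actual harness) in Python with web3.py v6, assuming a locally forked node such as Anvil listening on localhost:8545; the address and the signed PoC transaction are hypothetical placeholders.

```python
# Minimal sketch: accept an AI-reported vulnerability only if its
# proof-of-concept measurably drains funds on a forked chain.
# Assumes a local fork of real chain state (e.g. `anvil --fork-url ...`)
# on localhost:8545. The address below is a hypothetical placeholder.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))

TARGET = Web3.to_checksum_address("0x" + "11" * 20)  # hypothetical victim contract

def verify_exploit(signed_poc_tx: bytes) -> bool:
    """Return True only if the PoC transaction moves value out of TARGET."""
    before = w3.eth.get_balance(TARGET)
    tx_hash = w3.eth.send_raw_transaction(signed_poc_tx)
    receipt = w3.eth.wait_for_transaction_receipt(tx_hash)
    if receipt["status"] != 1:   # a reverted PoC is an unverified claim
        return False
    after = w3.eth.get_balance(TARGET)
    # Native-token balance only; a fuller harness would also assert
    # ERC-20 balances and other storage-state changes.
    return after < before        # funds left the contract: exploit confirmed
```

Because the check is a concrete state assertion rather than a classifier score, a claimed bug either drains the fork or it does not, which is what drives false positives toward zero.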
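
The long-tail argument in the third takeaway reduces to a breakeven inequality: automated exploitation pays once expected extraction exceeds the cost of the search. A toy model with made-up numbers makes the shift in economics concrete:

```python
# Toy breakeven model for long-tail exploitation (all numbers hypothetical).
def worth_attacking(tvl_usd: float, success_rate: float, search_cost_usd: float) -> bool:
    """An automated attacker profits when expected extraction exceeds search cost."""
    return tvl_usd * success_rate > search_cost_usd

# A $50k contract was never worth a skilled auditor's week (say $10k of labor),
# but at a few dollars of inference per contract the long tail becomes profitable:
print(worth_attacking(tvl_usd=50_000, success_rate=0.05, search_cost_usd=10_000.0))  # False: human labor
print(worth_attacking(tvl_usd=50_000, success_rate=0.05, search_cost_usd=5.0))       # True: cheap inference
```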
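
The training-signal point in the fourth takeaway falls out of the same machinery: a pass/fail execution check doubles as a reward function, with no human labeler anywhere in the loop. A schematic sketch, reusing the hypothetical verify_exploit harness above:

```python
# Schematic: verifiable execution turns exploit-finding into a training
# signal. `verify_exploit` is the hypothetical harness sketched above.
def exploit_reward(signed_poc_tx: bytes) -> float:
    """Binary reward computed by running code, not by human judgment."""
    return 1.0 if verify_exploit(signed_poc_tx) else 0.0
```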

What It Covers

Alpin Yukseloglu, investment and research partner at Paradigm, presents findings from EVM Bench — a benchmark co-authored with OpenAI measuring AI agents' ability to detect, patch, and exploit smart contract vulnerabilities. Top models jumped from under 20% to over 70% exploit detection in six months, reshaping crypto security assumptions.

Key Questions Answered

  • Agency over singularity anxiety: When facing uncertainty about AI's trajectory, Yukseloglu recommends replacing speculative theorizing with direct experimentation at the frontier. Both full acceptance and full denial of AI risk produce the same passive outcome. The practical alternative is running experiments, engaging model labs directly, and shipping within 24 hours of inception — speed over cohesion is the correct operating mode when the frontier remains experimentally unknowable.

Notable Moment

Yukseloglu describes a counterintuitive dynamic: Solana's prevalence of closed-source contracts, typically seen as a disadvantage, may actually accelerate AI model development on that stack. Contracts absent from public training data provide cleaner, uncontaminated evaluation signals — potentially giving closed-source ecosystems an unexpected edge in AI capability benchmarking.
