AI Finds 70% of Smart Contract Exploits | Alpin Yukseloglu
Episode: 61 min
Read time: 3 min
Topics: Artificial Intelligence, Crypto & Web3
AI-Generated Summary
Key Takeaways
- ✓ AI exploit capability trajectory: Frontier models went from finding roughly 12–13% of critical smart contract bugs to over 70% within six months, with part of that jump occurring between the drafting and publication of the EVM Bench paper. GPT-5.3 Codex now matches or approaches the collective output of human auditors on fund-draining vulnerabilities sourced from Code4rena audit contests. Yukseloglu expects a superhuman AI auditor within six to eight months.
- ✓ False positive elimination via verifiability: Previous AI auditing tools produced high false positive rates, making them impractical. EVM Bench addresses this by running exploits against a production-grade EVM environment loaded with real chain state. An agent that claims a bug exists must produce a working proof-of-concept that drains funds from the contract, which reduces false positives to near zero and makes AI audit results actionable.
- ✓ Long-tail contract risk: Low-TVL protocols on EVM-compatible chains like Binance Smart Chain face the highest near-term exploit risk. These contracts were historically sheltered because the maximum extractable value was too small to attract skilled attackers. As inference costs drop below the value of exploiting even small contracts, AI agents are expected to sweep this long tail systematically, which makes security investment non-optional regardless of protocol size.
- ✓ Crypto's verifiability accelerates AI training: Crypto code is among the most verifiable software in existence: agents can deploy contracts, assert state changes, and confirm exploits without human labelers. This creates a tight training signal that lets models improve faster than in most software domains. Paradigm expects models to develop strong crypto capabilities with less direct training data than initially anticipated, compressing the timeline to superhuman performance.
- ✓ Offense-defense arms race framing: The near-term security outcome depends on whether white-hat or black-hat actors access frontier AI capabilities first. Paradigm's strategic response is to embed crypto benchmarks directly inside model labs (EVM Bench is now running inside OpenAI) so that defensive tooling develops alongside offensive capability. Protocols housing significant TVL should begin proactive AI-assisted auditing now rather than waiting for the first AI-attributed exploit.
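The verifiability point above can be illustrated with a toy acceptance gate: a reported finding only counts if its proof-of-concept actually drains funds when run against a copy of the state. This is a minimal sketch under stated assumptions, not the EVM Bench harness. `Vault`, `verify_finding`, `drain_all`, and `bogus_claim` are hypothetical names, and the in-memory `Vault` stands in for a real contract on forked chain state.

```python
import copy
from typing import Callable, Dict


class Vault:
    """Hypothetical stand-in for a deployed contract holding pooled funds."""

    def __init__(self, funds: int, credits: Dict[str, int], checks_balance: bool):
        self.funds = funds                    # total value held by the vault
        self.credits = dict(credits)          # per-user withdrawable credit
        self.checks_balance = checks_balance  # False models the access bug

    def withdraw(self, user: str, amount: int) -> int:
        if amount <= 0 or amount > self.funds:
            raise ValueError("invalid amount")
        if self.checks_balance and self.credits.get(user, 0) < amount:
            raise PermissionError("insufficient credit")
        self.credits[user] = self.credits.get(user, 0) - amount
        self.funds -= amount
        return amount


# A proof-of-concept is any callable that acts on the vault as the attacker.
Exploit = Callable[[Vault, str], None]


def verify_finding(vault: Vault, exploit: Exploit, attacker: str) -> bool:
    """Accept a finding only if the exploit drains funds on a forked copy."""
    fork = copy.deepcopy(vault)  # never touch the original state
    before = fork.funds
    try:
        exploit(fork, attacker)
    except Exception:
        return False             # a reverted exploit is not a finding
    return fork.funds < before   # funds left the vault, so the claim is verified


def drain_all(vault: Vault, attacker: str) -> None:
    """Hypothetical PoC: withdraw everything without holding any credit."""
    vault.withdraw(attacker, vault.funds)


def bogus_claim(vault: Vault, attacker: str) -> None:
    """A 'finding' with no working exploit behind it."""
    return None
```

In this sketch a claim with no working exploit, or one that reverts against a correctly guarded vault, is rejected automatically, which is the mechanism the summary credits for driving false positives toward zero.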
What It Covers
Alpin Yukseloglu, investment and research partner at Paradigm, presents findings from EVM Bench, a benchmark co-authored with OpenAI that measures AI agents' ability to detect, patch, and exploit smart contract vulnerabilities. Top models jumped from under 20% to over 70% exploit detection in six months, reshaping crypto security assumptions.
Key Questions Answered
- • Agency over singularity anxiety: When facing uncertainty about AI's trajectory, Yukseloglu recommends replacing speculative theorizing with direct experimentation at the frontier. Both full acceptance and full denial of AI risk produce the same passive outcome. The practical alternative is running experiments, engaging model labs directly, and shipping within 24 hours of inception. In his view, speed over cohesion is the correct operating mode while the frontier remains experimentally unknowable.
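The long-tail risk argument in this summary reduces to a break-even comparison: an attack is worth attempting once the expected payout exceeds the cost of the attempt. The sketch below makes that arithmetic explicit. The function names and every number in the usage note are hypothetical illustrations, not figures from the episode.

```python
from typing import List


def attack_is_profitable(extractable_value: float,
                         success_probability: float,
                         cost_per_attempt: float) -> bool:
    """Expected payout exceeds cost: p * V > C."""
    if not 0.0 <= success_probability <= 1.0:
        raise ValueError("success_probability must be in [0, 1]")
    if cost_per_attempt < 0:
        raise ValueError("cost_per_attempt must be non-negative")
    return success_probability * extractable_value > cost_per_attempt


def newly_exposed(contract_values: List[float],
                  success_probability: float,
                  old_cost: float,
                  new_cost: float) -> List[float]:
    """Contracts that were uneconomic to attack at old_cost but not at new_cost."""
    return [
        value for value in contract_values
        if attack_is_profitable(value, success_probability, new_cost)
        and not attack_is_profitable(value, success_probability, old_cost)
    ]
```

For example, with hypothetical contract values of 500, 5,000, 50,000, and 2,000,000, a 50% success rate, and a per-attempt cost falling from 10,000 to 100, `newly_exposed` returns the two smallest contracts: exactly the segment that was previously sheltered by its low value.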
Notable Moment
Yukseloglu describes a counterintuitive dynamic: Solana's prevalence of closed-source contracts, typically seen as a disadvantage, may actually accelerate AI model development on that stack. Contracts absent from public training data provide cleaner, uncontaminated evaluation signals, which could give closed-source ecosystems an unexpected edge in AI capability benchmarking.