Sean Macgregor

Topics: AI Incident Database methodology, third-party audit necessity, benchmark limitations for practical deployment, guard model vulnerability patterns
1 episode
1 podcast

We have 1 summarized appearance for Sean Macgregor so far.

Featured On 1 Podcast


All Appearances

1 episode
Practical AI

AI incidents, audits, and the limits of benchmarks

Practical AI
43 min
Cofounder and Lead Research Engineer at the AI Verification and Evaluation Research Institute, Founder of the AI Incident Database

AI Summary

→ WHAT IT COVERS

Sean MacGregor, founder of the AI Incident Database and cofounder of the AI Verification and Evaluation Research Institute, explains how AI safety incidents are documented, why third-party audits matter for AI systems, and how benchmarks often fail to predict real-world model behavior. The database contains over 5,000 human-annotated reports across 1,000+ discrete incidents.

→ KEY INSIGHTS

- **AI Incident Database methodology:** The database collects incidents primarily through journalistic reporting because journalists validate the base facts, although relying on news coverage makes it hard to estimate how often incidents actually occur. The database focuses on harms that inform the production of safer AI rather than indexing every minor occurrence that happens millions of times daily.
- **Third-party audit necessity:** Organizations deploying general-purpose AI systems face a fundamental problem: traditional safety processes assume a specific context, but frontier models operate across unpredictable circumstances. Third-party audits provide independent verification similar to financial audits. Representations about model capabilities are checked against actual evidence rather than accepted as first-party claims, which likely have not been tested in the specific deployment environment.
- **Benchmark limitations for practical deployment:** Most AI benchmarks are produced for research and knowledge generation, not for practical deployment decisions. Benchmarks such as BBQ for bias testing operate within specific prompt distributions that may not generalize to actual deployment environments. The BenchRisk meta-evaluation project found that many benchmarks lack sufficient documentation and evidence, essentially providing "trust me, bro" level receipts rather than rigorous validation for real-world safety claims.
- **Guard model vulnerability patterns:** At the DEF CON Generative Red Team competition, which targeted a 7-billion-parameter model, the most exploited vulnerability was the handoff between the guard model and the underlying foundation model. When a guard model uses a soft rejection strategy that reprompts rather than hard rejects, attackers can systematically exploit this interface. Systems composed of multiple models often have undertested interfaces, especially when benchmarks evaluate the components separately rather than the integrated system.
- **Statistical rigor in security testing:** Security researchers attempting to break AI systems must demonstrate systematic vulnerabilities rather than anecdotal exploits, which requires statistical evidence that an attack works reliably across multiple attempts. A single successful jailbreak in 100 attempts against a system with 99 percent filtering effectiveness provides no useful information for system designers, because that is the failure rate the system already documents. Effective flaw reports must show attack strategies under which the system consistently falls short of its documented safety thresholds.

→ NOTABLE MOMENT

A traffic camera system sent someone a citation after misreading the word "knitter" on a woman's shirt as a license plate, with her purse strap altering the letters so that they resembled a plate number. This incident demonstrates how AI systems fail in unexpected ways when real-world conditions create edge cases that developers never anticipated during testing.

💼 SPONSORS Prediction Guard (https://predictionguard.com)

🏷️ AI Safety, Model Auditing, AI Benchmarks, Red Teaming, AI Incidents
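To make the statistical-rigor insight above concrete, here is a minimal sketch of the arithmetic, not anything presented in the episode. It assumes each attack attempt is independent and that the system's documented bypass rate is 1 percent (99 percent filtering effectiveness). The function name `binomial_tail` and the comparison case of 10 successes are hypothetical, chosen only for illustration.

```python
from math import comb


def binomial_tail(successes: int, attempts: int, base_rate: float) -> float:
    """Probability of at least `successes` bypasses in `attempts` tries
    if every try independently succeeds with probability `base_rate`."""
    return sum(
        comb(attempts, k) * base_rate**k * (1 - base_rate) ** (attempts - k)
        for k in range(successes, attempts + 1)
    )


# Assumption: 99 percent filtering effectiveness means a 1 percent bypass rate.
documented_bypass_rate = 0.01

# One jailbreak in 100 attempts: how surprising is that under the documented rate?
p_one = binomial_tail(1, 100, documented_bypass_rate)

# Hypothetical comparison: ten jailbreaks in 100 attempts with one attack strategy.
p_ten = binomial_tail(10, 100, documented_bypass_rate)

print(f"P(at least 1 bypass in 100)  = {p_one:.3f}")
print(f"P(at least 10 bypasses in 100) = {p_ten:.2e}")
```

Under these assumptions, at least one bypass in 100 attempts happens about 63 percent of the time by chance alone, so a single success says nothing new. Ten successes would be vanishingly unlikely at the documented rate, which is the kind of evidence that points to a systematic flaw.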
