Nora Petrova

1 episode · 1 podcast

All Appearances

AI Summary

→ WHAT IT COVERS
Andrew Gordon and Nora Petrova from Prolific explain why current AI benchmarks miss critical user experience factors and introduce their human-centered evaluation methodology, Humane.

→ KEY INSIGHTS
- **TrueSkill Methodology:** Prolific uses Microsoft's TrueSkill rating framework, developed for Xbox Live, to run AI model tournaments. Model pairs are selected by expected information gain, which reduces uncertainty with fewer comparisons.
- **Representative Sampling:** Humane stratifies participants by age, ethnicity, and political alignment using census data for the US and UK populations. Unlike Chatbot Arena, whose users are anonymous, this yields demographically representative preference data.
- **Actionable Metrics:** Breaking preference into six specific dimensions (helpfulness, communication, adaptiveness, personality, trust, and cultural understanding) gives AI labs concrete feedback on where models need improvement, which a single preference vote cannot.

→ NOTABLE MOMENT
Initial testing with 500 participants revealed that models scored significantly lower on personality and cultural understanding than on helpfulness, suggesting that training data may not produce the personalities users actually want.

💼 SPONSORS Prolific
🏷️ AI Benchmarking, Human Evaluation, AI Safety
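
The pair-selection idea in the TrueSkill insight can be illustrated with a minimal sketch. The summary does not say how Prolific measures information gain, so the code below makes several assumptions: it uses the standard two-player TrueSkill update without draws or skill drift, and it uses expected reduction in total rating variance as a stand-in for information gain. The model names, the ratings, and the functions `update`, `expected_variance_drop`, and `pick_pair` are all hypothetical.

```python
import math
from itertools import combinations

BETA = 25.0 / 6.0  # performance noise; the conventional TrueSkill default


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update(winner, loser, beta=BETA):
    """Two-player TrueSkill update (no draws, no drift).

    Each rating is a (mu, sigma) tuple; returns the new (winner, loser) ratings.
    """
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    c = math.sqrt(2 * beta**2 + s_w**2 + s_l**2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / max(_cdf(t), 1e-12)  # guard against underflow for extreme upsets
    w = v * (v + t)
    new_w = (mu_w + s_w**2 / c * v, s_w * math.sqrt(1 - s_w**2 / c**2 * w))
    new_l = (mu_l - s_l**2 / c * v, s_l * math.sqrt(1 - s_l**2 / c**2 * w))
    return new_w, new_l


def expected_variance_drop(a, b, beta=BETA):
    """Expected reduction in sigma_a^2 + sigma_b^2 from one a-vs-b comparison.

    Assumed proxy for information gain, averaged over both possible outcomes.
    """
    c = math.sqrt(2 * beta**2 + a[1] ** 2 + b[1] ** 2)
    p_a = _cdf((a[0] - b[0]) / c)  # predicted probability that a is preferred

    def total_var_after(win, lose):
        new_win, new_lose = update(win, lose, beta)
        return new_win[1] ** 2 + new_lose[1] ** 2

    before = a[1] ** 2 + b[1] ** 2
    after = p_a * total_var_after(a, b) + (1 - p_a) * total_var_after(b, a)
    return before - after


def pick_pair(ratings):
    """Return the pair of model names whose comparison is expected to help most."""
    return max(
        combinations(sorted(ratings), 2),
        key=lambda pair: expected_variance_drop(ratings[pair[0]], ratings[pair[1]]),
    )


# Hypothetical ratings: two new models with wide uncertainty, two well-measured ones.
ratings = {
    "model_a": (25.0, 25.0 / 3.0),
    "model_b": (25.0, 25.0 / 3.0),
    "model_c": (30.0, 1.0),
    "model_d": (20.0, 1.0),
}
next_pair = pick_pair(ratings)  # ('model_a', 'model_b')
```

Under these assumptions the selector favors the two poorly measured models with similar estimated skill, since that comparison is both uncertain in outcome and informative about both ratings. A production system would more likely use a maintained implementation such as the `trueskill` Python package rather than a hand-written update.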
