
#316 Robbie Goldfarb: Why the Future of AI Depends on Better Judgment
Eye on AI · AI Summary
→ WHAT IT COVERS
Robbie Goldfarb, founder of Form.ai, explains how his company scales expert judgment to evaluate and improve AI systems in contentious domains like healthcare and politics. Form.ai builds transparent networks of credible experts, including Fareed Zakaria and Neil Ferguson, then creates AI judges that capture their reasoning processes to assess models for bias, accuracy, and clinical nuance.

→ KEY INSIGHTS
- **Expert thought process extraction:** Form.ai maps how experts reason through complex questions by asking them to verbalize their approach rather than just label data. For political questions, experts might say they would cross-reference reliable sources first, creating a chain of reasoning. Form.ai builds agentic systems that mirror these thought-process graphs, testing generalizability across scenarios before deploying judges at scale (a minimal sketch of such a reasoning graph follows this list).
- **Consequence mapping methodology:** Instead of traditional good-or-bad labeling, Form.ai asks experts to predict the outcomes of AI conversations: what emotions users would feel, what actions they would take, what family members would say. This experimental approach provides richer data for training judges and reveals the reasoning behind expert evaluations, which is particularly valuable in sensitive domains like mental health where clinical nuance matters (a sample consequence-label schema follows this list).
- **Four-tier evaluation framework for political content:** Form.ai assesses political AI responses across bias (which breaks into multiple subcategories), factuality, source selection (credibility, balance, accurate attribution), and tone and language (avoiding inflammatory phrasing that mirrors user anger). Different combinations of topic, user intent, and evaluation dimension require separate judges, creating a matrix of specialized evaluators rather than one-size-fits-all assessment (a sketch of such a judge matrix follows this list).
- **Trust gap blocking AI adoption:** KPMG research shows 80-plus percent of people feel optimistic about AI improving their lives, yet only 40-something percent trust it. This delta represents the critical barrier to realizing AI's potential. Form.ai addresses it through transparent expert networks published on its website, allowing users to see exactly who shaped model training rather than relying on internal engineering teams or scaled labelers.
- **Dangerous expectation of omniscient AI:** ChatGPT established a problematic norm in which users expect authoritative answers to any question without context-gathering dialogue. Real expertise requires back-and-forth; doctors do not prescribe after one statement. AI models need awareness of when they lack sufficient context to respond responsibly, but this conflicts with engagement metrics, since users migrate to models that provide immediate answers over those that ask clarifying questions (a sketch of a context-sufficiency gate follows this list).
- **Mental health scale reveals urgent need:** OpenAI reported over one million weekly conversations in which users demonstrated suicidal intent, illustrating the massive scale at which people turn to AI for mental health support. Current models lack clinical nuance: the National Eating Disorders Association shut down its Tessa chatbot after it produced adverse effects on users. Form.ai partners with Cleveland Clinic and Mount Sinai to embed medical expertise into health-related AI evaluations.
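The episode does not describe Form.ai's actual representation, but a thought-process graph replayed by an agentic judge might look like the following minimal sketch. All names here (`ReasoningStep`, `ask_model`, `run_judge`, the step names) are invented for illustration; `ask_model` is a stand-in for a real LLM call.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: an expert's verbalized reasoning captured as a small
# directed graph of steps, then replayed step by step against a transcript.

@dataclass
class ReasoningStep:
    name: str                      # e.g. "cross_reference_sources"
    instruction: str               # the expert's verbalized sub-question
    children: list = field(default_factory=list)  # steps that follow this one

def ask_model(prompt: str) -> str:
    """Stand-in for a real LLM call (an API client in practice)."""
    return f"<model answer to: {prompt[:60]}...>"

def run_judge(step: ReasoningStep, transcript: str, findings: dict) -> dict:
    """Walk the expert's thought-process graph depth-first, asking the
    model to perform each verbalized step, accumulating findings."""
    prompt = (f"{step.instruction}\n\nTranscript:\n{transcript}\n\n"
              f"Prior findings: {findings}")
    findings[step.name] = ask_model(prompt)
    for child in step.children:
        run_judge(child, transcript, findings)
    return findings

# An expert's stated approach to a political question, as a chain:
verdict = ReasoningStep("verdict", "Given the checks above, is the answer biased?")
attribution = ReasoningStep("attribution", "Are claims attributed accurately?", [verdict])
sources = ReasoningStep("cross_reference_sources", "Which reliable sources cover this claim?", [attribution])

print(run_judge(sources, "User: Is policy X good?\nAI: ...", {}))
```

Passing the accumulated findings into each step's prompt is one way to preserve the "chain of reasoning" the bullet describes, so later steps can condition on earlier ones the way the expert would.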
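To make the consequence-mapping contrast concrete, here is a hypothetical annotation schema comparing a traditional binary label with an outcome-prediction label. The field names are illustrative assumptions, not Form.ai's actual data format.

```python
from dataclasses import dataclass

@dataclass
class BinaryLabel:                 # traditional labeling: one thin signal
    conversation_id: str
    is_good: bool

@dataclass
class ConsequenceLabel:            # consequence mapping: predicted outcomes
    conversation_id: str
    predicted_emotion: str         # what the user would likely feel
    predicted_action: str          # what the user would likely do next
    family_reaction: str           # what a family member might say
    rationale: str                 # the expert's reasoning, usable for judge training

label = ConsequenceLabel(
    conversation_id="chat-001",
    predicted_emotion="reassured but dismissive of symptoms",
    predicted_action="delays seeing a clinician",
    family_reaction="worried the advice downplayed the risk",
    rationale="Response minimized a red-flag symptom without asking triage questions.",
)
print(label)
```

The richer fields are the point: each label carries several training signals plus the expert's rationale, instead of a single good/bad bit.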
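The "matrix of specialized evaluators" could be as simple as a dispatch table keyed by topic, user intent, and evaluation dimension. The keys and judge identifiers below are invented for illustration.

```python
# Hypothetical dispatch table: one specialized judge per combination of
# topic, user intent, and evaluation dimension, rather than a single
# general-purpose evaluator.

JUDGES = {
    ("elections", "seeking_facts", "factuality"): "judge_elections_factuality_v2",
    ("elections", "venting_anger", "tone_language"): "judge_elections_tone_v1",
    ("immigration", "seeking_facts", "source_selection"): "judge_sources_v3",
    ("immigration", "seeking_facts", "bias"): "judge_bias_immigration_v1",
}

def select_judge(topic: str, intent: str, dimension: str) -> str:
    try:
        return JUDGES[(topic, intent, dimension)]
    except KeyError:
        # No specialized judge exists yet: surface the gap for expert review
        # rather than silently falling back to a generic evaluator.
        raise LookupError(f"no judge for {(topic, intent, dimension)}")

print(select_judge("elections", "venting_anger", "tone_language"))
```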
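Finally, a minimal sketch of the context-gathering behavior the episode argues for: a gate that checks whether enough context exists before answering, the way a doctor would, and asks a clarifying question otherwise. The required-context table and both functions are stand-ins; in practice the sufficiency check might itself be a model or judge.

```python
# Hypothetical "sufficient context" gate: decline to answer authoritatively
# until the required context for the topic has been gathered.

REQUIRED_CONTEXT = {
    "medication": ["age", "current_medications", "symptoms"],
}

def missing_context(topic: str, known: dict) -> list:
    return [k for k in REQUIRED_CONTEXT.get(topic, []) if k not in known]

def respond(topic: str, question: str, known: dict) -> str:
    gaps = missing_context(topic, known)
    if gaps:
        # Ask a clarifying question instead of answering immediately.
        return f"Before I answer, can you tell me your {gaps[0].replace('_', ' ')}?"
    return f"<answer to: {question}>"

print(respond("medication", "What should I take for headaches?", {"age": 34}))
print(respond("medication", "What should I take for headaches?",
              {"age": 34, "current_medications": "none", "symptoms": "tension headache"}))
```

As the bullet notes, the tension is real: this gate trades immediacy (and engagement) for responsibility.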
→ NOTABLE MOMENT
Goldfarb describes how a podcaster insisted GPT-3 proved mind-body separation based on controversial Arizona experiments, demonstrating how question phrasing biases LLM responses. The same model gave opposite answers depending on how the question was framed, revealing that probability distributions masquerading as truth engines create dangerous sycophancy problems when users treat outputs as authoritative rather than probabilistic.

💼 SPONSORS
None detected

🏷️ AI Evaluation, Expert Networks, AI Trust & Safety, Model Bias Detection, Clinical AI, Political Misinformation