Proactive Agents for the Web with Devi Parikh - #756
The TWIML AI Podcast · Episode: 56 min · Read time: 2 min
Topics: Startups, Leadership, Artificial Intelligence
AI-Generated Summary
Key Takeaways
- ✓Visual-based browser navigation: Training models on website screenshots rather than DOM information proves more reliable and generalizes better across different sites. It also handles widgets such as date pickers, which plagued DOM-based approaches with a constant stream of edge cases that each needed a site-specific fix.
- ✓Scouts architecture combines APIs and browser automation: The system uses 80-90 MCP servers for structured data access, and for information behind forms it spins up remote browsers driven by custom-trained navigator models. The reports shown to users are optimized for coverage first, then for precision.
- ✓Post-training progression maximizes model capability: Yutori post-trains QwQ models in three stages (supervised fine-tuning, then rejection sampling, then reinforcement learning) to achieve reliable browser automation, while keeping costs for its production workloads lower than third-party API providers would.
- ✓Background agents require hierarchical tool management: When an agent orchestrates 80-90 tools, reliability breaks down if all of them are available at once. Sub-agents that each access only a specific subset of tools enable scalable multi-agent workflows that adapt to real-time web information.
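The post-training takeaway names rejection sampling as the stage between supervised fine-tuning and reinforcement learning. The summary gives no implementation details, so the following is only a minimal sketch of the common recipe, assuming it applies here: sample several trajectories per task from the current model, keep only those a verifier accepts, and reuse the survivors as fine-tuning data. The `rejection_sample` function, the toy policy, the action names, and the verifier are all hypothetical.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical types: a trajectory is the sequence of browser actions a policy
# produced for a task; a verifier decides whether that trajectory succeeded.
Trajectory = List[str]
Policy = Callable[[str, random.Random], Trajectory]
Verifier = Callable[[str, Trajectory], bool]


def rejection_sample(tasks: List[str], policy: Policy, verifier: Verifier,
                     samples_per_task: int = 8, seed: int = 0) -> List[Tuple[str, Trajectory]]:
    """Sample several trajectories per task and keep only the verified successes.

    The kept (task, trajectory) pairs become additional fine-tuning data,
    ahead of a reinforcement-learning stage.
    """
    rng = random.Random(seed)
    kept: List[Tuple[str, Trajectory]] = []
    for task in tasks:
        for _ in range(samples_per_task):
            trajectory = policy(task, rng)
            if verifier(task, trajectory):  # reject anything the verifier fails
                kept.append((task, trajectory))
    return kept


# Toy policy and verifier: random actions; success means the last action is "submit".
ACTIONS = ["open_date_picker", "select_dates", "scroll", "submit"]


def toy_policy(task: str, rng: random.Random) -> Trajectory:
    return [rng.choice(ACTIONS) for _ in range(3)]


def toy_verifier(task: str, trajectory: Trajectory) -> bool:
    return trajectory[-1] == "submit"


data = rejection_sample(["book a hotel", "check fares"], toy_policy, toy_verifier)
print(f"kept {len(data)} of 16 sampled trajectories")
```

In a real pipeline the policy would presumably be the SFT-stage navigator model and the verifier a task-specific success check; both are assumptions in this sketch.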
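The hierarchical tool-management takeaway can be illustrated as a two-level router: the orchestrator chooses among a handful of sub-agents, and each sub-agent sees only its own tool subset, so no single decision has to consider all 80-90 tools. This is a minimal sketch, not the company's implementation; the `SubAgent` and `Orchestrator` classes, the keyword-overlap routing (a stand-in for a model call), and the toy tools are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical tool type: each tool takes a query string and returns a text result.
Tool = Callable[[str], str]


@dataclass
class SubAgent:
    """A sub-agent that can only see a small, related subset of tools."""
    name: str
    description: str
    tools: Dict[str, Tool] = field(default_factory=dict)

    def run(self, query: str) -> List[str]:
        # A real sub-agent would let a model choose among its few tools;
        # this sketch simply calls every tool in its subset.
        return [f"{tool_name}: {tool(query)}" for tool_name, tool in self.tools.items()]


@dataclass
class Orchestrator:
    """Top-level agent that chooses a sub-agent instead of choosing among all tools."""
    sub_agents: List[SubAgent]

    def route(self, query: str) -> SubAgent:
        # Stand-in for a model call: pick the sub-agent whose description shares
        # the most words with the query (ties go to the first sub-agent listed).
        words = set(query.lower().split())
        return max(
            self.sub_agents,
            key=lambda agent: len(words & set(agent.description.lower().split())),
        )

    def run(self, query: str) -> List[str]:
        return self.route(query).run(query)


# Toy tools standing in for MCP servers and a remote browser session.
weather = SubAgent("weather", "weather forecast temperature rain",
                   {"forecast_api": lambda q: "sunny"})
travel = SubAgent("travel", "flight flights airfare travel prices",
                  {"fares_api": lambda q: "$420", "browser": lambda q: "fare page checked"})
orchestrator = Orchestrator([weather, travel])

result = orchestrator.run("cheapest flight prices to Tokyo")
print(result)  # ['fares_api: $420', 'browser: fare page checked']
```

Adding a new capability then means adding a sub-agent with its own small tool set, rather than growing the list the top-level agent must reason over.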
What It Covers
Devi Parikh, co-founder of Yutori, explains how AI browser agents will replace manual web interactions through proactive monitoring and automation, starting with Scouts, Yutori's product that monitors websites for changes in user-specified information.
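At its simplest, monitoring for user-specified changes means extracting the relevant items on each run and reporting only the ones not seen before. The sketch below assumes extraction has already produced a list of strings (whether from an API or a browser session); the fingerprinting scheme, the `new_items` function, and the fare data are hypothetical, not how Scouts is actually built.

```python
import hashlib
from typing import Dict, List


def fingerprint(item: str) -> str:
    """Stable fingerprint of one extracted item, ignoring case and extra whitespace."""
    normalized = " ".join(item.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def new_items(seen: Dict[str, str], extracted: List[str]) -> List[str]:
    """Return items not reported on earlier runs, and record them in `seen`."""
    fresh = []
    for item in extracted:
        key = fingerprint(item)
        if key not in seen:
            seen[key] = item
            fresh.append(item)
    return fresh


# Two hypothetical runs of a monitor tracking fares on one route.
seen: Dict[str, str] = {}
first_run = new_items(seen, ["Flight SFO-NRT $640", "Flight SFO-NRT $720"])
second_run = new_items(seen, ["flight sfo-nrt  $640", "Flight SFO-NRT $599"])
print(first_run)   # ['Flight SFO-NRT $640', 'Flight SFO-NRT $720']
print(second_run)  # ['Flight SFO-NRT $599']
```

Normalizing before hashing keeps cosmetic differences (case, spacing) from triggering a false "change"; a production monitor would likely need a richer notion of equivalence.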
Notable Moment
Parikh reveals that, contrary to initial assumptions, consuming web pages visually, as humans do, rather than parsing the underlying code proved essential for building reliable browser agents, because pages that look identical often have completely different underlying structures.