Is RAG Dead? Lessons from Building AI for Tax Law with Alex Bowcut - #769
Episode
51 min
Read time
2 min
Topics
Sales & Revenue, Artificial Intelligence, Software Development
AI-Generated Summary
Key Takeaways
- ✓Hybrid Retrieval Architecture: Combining dense semantic embeddings (OpenAI models via Pinecone) with sparse TF-IDF-style full-text search measurably improves citation accuracy over dense-only retrieval. When sparse search was reintroduced after starting with dense alone, Sphere saw a clear accuracy increase on retrieval evals, particularly for jurisdiction-specific legal terminology that semantic embeddings alone failed to surface reliably.
- ✓Semantic Chunking Over Naive Splitting: Splitting legal documents by character count discards critical hierarchical context. Build document-type-specific parsers — separate ones for statutes, case law, and department bulletins — that cut at natural legal section boundaries while preserving parent-child hierarchy metadata. This hierarchy enables downstream passage expansion and accurate citation reconstruction, directly reducing determination errors.
- ✓Iterative Context Expansion Loop: After initial retrieval, run a multi-step loop: re-rank passages with an LLM judge, expand each chunk using stored hierarchy to include adjacent sections, then have an LLM evaluate whether the added context remains on-scope. Repeat until a confidence threshold is met. This process added measurable accuracy gains beyond standard single-pass RAG retrieval.
- ✓Reinforcement Fine-Tuning with Human Feedback Signal: When tax experts correct a model determination, they leave structured explanations as feedback — essentially coaching a junior colleague. Pairing these corrections (ground truth answers plus expert reasoning) with known hard failures creates a high-signal RFT dataset. Sphere used this with OpenAI's alpha RFT program and deployed the resulting model to production with documented accuracy improvements.
- ✓RAG Remains Necessary for Citation-Sensitive Domains: Despite expanding context windows, agentic file-system search still misses relevant documents at a rate unacceptable for legal or compliance use cases. For domains requiring verifiable, source-linked citations — where errors carry legal or financial consequences — a purpose-built retrieval pipeline with controlled chunking and re-ranking outperforms general-purpose agent search as of current model capabilities.
What It Covers
Alex Bowcut, head of engineering at Sphere, explains how the company built TRAM, an AI system for sales tax compliance across global jurisdictions. The system combines semantic chunking, hybrid dense-sparse retrieval, and reinforcement fine-tuning to help tax experts work nearly two orders of magnitude faster than traditional manual methods.
Key Questions Answered
- •Hybrid Retrieval Architecture: Combining dense semantic embeddings (OpenAI models via Pinecone) with sparse TF-IDF-style full-text search measurably improves citation accuracy over dense-only retrieval. When sparse search was reintroduced after starting with dense alone, Sphere saw a clear accuracy increase on retrieval evals, particularly for jurisdiction-specific legal terminology that semantic embeddings alone failed to surface reliably.
- •Semantic Chunking Over Naive Splitting: Splitting legal documents by character count discards critical hierarchical context. Build document-type-specific parsers — separate ones for statutes, case law, and department bulletins — that cut at natural legal section boundaries while preserving parent-child hierarchy metadata. This hierarchy enables downstream passage expansion and accurate citation reconstruction, directly reducing determination errors.
- •Iterative Context Expansion Loop: After initial retrieval, run a multi-step loop: re-rank passages with an LLM judge, expand each chunk using stored hierarchy to include adjacent sections, then have an LLM evaluate whether the added context remains on-scope. Repeat until a confidence threshold is met. This process added measurable accuracy gains beyond standard single-pass RAG retrieval.
- •Reinforcement Fine-Tuning with Human Feedback Signal: When tax experts correct a model determination, they leave structured explanations as feedback — essentially coaching a junior colleague. Pairing these corrections (ground truth answers plus expert reasoning) with known hard failures creates a high-signal RFT dataset. Sphere used this with OpenAI's alpha RFT program and deployed the resulting model to production with documented accuracy improvements.
- •RAG Remains Necessary for Citation-Sensitive Domains: Despite expanding context windows, agentic file-system search still misses relevant documents at a rate unacceptable for legal or compliance use cases. For domains requiring verifiable, source-linked citations — where errors carry legal or financial consequences — a purpose-built retrieval pipeline with controlled chunking and re-ranking outperforms general-purpose agent search as of current model capabilities.
Notable Moment
Bowcut noted that LLM cost sensitivity is essentially irrelevant at Sphere because even the most expensive frontier models are dramatically cheaper than human tax lawyers — and since the AI output feeds a deterministic engine rather than serving live inference, latency pressure is also absent, enabling accuracy-first engineering decisions.
You just read a 3-minute summary of a 48-minute episode.
Get The TWIML AI Podcast summarized like this every Monday — plus up to 2 more podcasts, free.
Pick Your Podcasts — FreeKeep Reading
More from The TWIML AI Podcast
Relational Foundation Models for Enterprise Data with Jure Leskovec - #768
May 21 · 66 min
Odd Lots
Why Cerebras CEO Andrew Feldman Built The World's Largest Computer Chip
May 21
More from The TWIML AI Podcast
How to Find the Agent Failures Your Evals Miss with Scott Clark - #767
May 7 · 53 min
Alt Goes Mainstream
AGM Unscripted: Goldman Sachs' Michael Bruun - Driving Value in Private Equity Through Network and Innovation
Feb 13
Books, tools, and gear mentioned in this episode
SignalCast may earn commission on purchases via these links. As an Amazon Associate, SignalCast earns from qualifying purchases.
Tools
by Pinecone
“Combining dense semantic embeddings (OpenAI models via Pinecone) with sparse TF-IDF-style full-text search measurably improves citation accuracy over dense-only retrieval.”
by OpenAI
“Combining dense semantic embeddings (OpenAI models via Pinecone) with sparse TF-IDF-style full-text search measurably improves citation accuracy over dense-only retrieval.”
company
by Sphere
“Alex Bowcut, head of engineering at Sphere, explains how the company built TRAM, an AI system for sales tax compliance across global jurisdictions.”
More from The TWIML AI Podcast
We summarize every new episode. Want them in your inbox?
Relational Foundation Models for Enterprise Data with Jure Leskovec - #768
How to Find the Agent Failures Your Evals Miss with Scott Clark - #767
How to Engineer AI Inference Systems with Philip Kiely - #766
How Capital One Delivers Multi-Agent Systems with Rashmi Shetty - #765
The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764
Similar Episodes
Related episodes from other podcasts
Odd Lots
May 21
Why Cerebras CEO Andrew Feldman Built The World's Largest Computer Chip
Alt Goes Mainstream
Feb 13
AGM Unscripted: Goldman Sachs' Michael Bruun - Driving Value in Private Equity Through Network and Innovation
Invest Like the Best with Patrick O'Shaughnessy
Jun 9
Alex Sacerdote - How to Invest Through Technology Cycles - [Invest Like the Best, EP.477]
Invest Like the Best with Patrick O'Shaughnessy
Jun 3
Dara Khosrowshahi - Uber's Bet on AVs, AI, and Building a Super-App - [Invest Like the Best, EP.476]
a16z Podcast
May 28
Stablecoins, AI Agents, and The Future of Global Banking
Explore Related Topics
This podcast is featured in Best AI Podcasts (2026) — ranked and reviewed with AI summaries.
Read this week's AI & Machine Learning Podcast Insights — cross-podcast analysis updated weekly.
You're clearly into The TWIML AI Podcast.
Every Monday, we deliver AI summaries of the latest episodes from The TWIML AI Podcast and 192+ other podcasts. Free for up to 3 shows.
Start My Monday DigestNo credit card · Unsubscribe anytime