Is RAG Dead? Lessons from Building AI for Tax Law with Alex Bowcut - #769
The TWIML AI PodcastAI Summary
→ WHAT IT COVERS Alex Bowcut, head of engineering at Sphere, explains how the company built TRAM, an AI system for sales tax compliance across global jurisdictions. The system combines semantic chunking, hybrid dense-sparse retrieval, and reinforcement fine-tuning to help tax experts work nearly two orders of magnitude faster than traditional manual methods. → KEY INSIGHTS - **Hybrid Retrieval Architecture:** Combining dense semantic embeddings (OpenAI models via Pinecone) with sparse TF-IDF-style full-text search measurably improves citation accuracy over dense-only retrieval. When sparse search was reintroduced after starting with dense alone, Sphere saw a clear accuracy increase on retrieval evals, particularly for jurisdiction-specific legal terminology that semantic embeddings alone failed to surface reliably. - **Semantic Chunking Over Naive Splitting:** Splitting legal documents by character count discards critical hierarchical context. Build document-type-specific parsers — separate ones for statutes, case law, and department bulletins — that cut at natural legal section boundaries while preserving parent-child hierarchy metadata. This hierarchy enables downstream passage expansion and accurate citation reconstruction, directly reducing determination errors. - **Iterative Context Expansion Loop:** After initial retrieval, run a multi-step loop: re-rank passages with an LLM judge, expand each chunk using stored hierarchy to include adjacent sections, then have an LLM evaluate whether the added context remains on-scope. Repeat until a confidence threshold is met. This process added measurable accuracy gains beyond standard single-pass RAG retrieval. - **Reinforcement Fine-Tuning with Human Feedback Signal:** When tax experts correct a model determination, they leave structured explanations as feedback — essentially coaching a junior colleague. Pairing these corrections (ground truth answers plus expert reasoning) with known hard failures creates a high-signal RFT dataset. Sphere used this with OpenAI's alpha RFT program and deployed the resulting model to production with documented accuracy improvements. - **RAG Remains Necessary for Citation-Sensitive Domains:** Despite expanding context windows, agentic file-system search still misses relevant documents at a rate unacceptable for legal or compliance use cases. For domains requiring verifiable, source-linked citations — where errors carry legal or financial consequences — a purpose-built retrieval pipeline with controlled chunking and re-ranking outperforms general-purpose agent search as of current model capabilities. → NOTABLE MOMENT Bowcut noted that LLM cost sensitivity is essentially irrelevant at Sphere because even the most expensive frontier models are dramatically cheaper than human tax lawyers — and since the AI output feeds a deterministic engine rather than serving live inference, latency pressure is also absent, enabling accuracy-first engineering decisions. 💼 SPONSORS None detected 🏷️ Retrieval Augmented Generation, Tax Compliance AI, Reinforcement Fine-Tuning, Legal Document Processing, Enterprise AI Systems