The TWIML AI Podcast

Is RAG Dead? Lessons from Building AI for Tax Law with Alex Bowcut - #769

51 min episode · 2 min read
Alex Bowcut

Topics

Sales & Revenue, Artificial Intelligence, Software Development

AI-Generated Summary

Key Takeaways

  • Hybrid Retrieval Architecture: Combining dense semantic embeddings (OpenAI models via Pinecone) with sparse TF-IDF-style full-text search measurably improves citation accuracy over dense-only retrieval. When sparse search was reintroduced after starting with dense alone, Sphere saw a clear accuracy increase on retrieval evals, particularly for jurisdiction-specific legal terminology that semantic embeddings alone failed to surface reliably.
  • Semantic Chunking Over Naive Splitting: Splitting legal documents by character count discards critical hierarchical context. Build document-type-specific parsers — separate ones for statutes, case law, and department bulletins — that cut at natural legal section boundaries while preserving parent-child hierarchy metadata. This hierarchy enables downstream passage expansion and accurate citation reconstruction, directly reducing determination errors.
  • Iterative Context Expansion Loop: After initial retrieval, run a multi-step loop: re-rank passages with an LLM judge, expand each chunk using the stored hierarchy to include adjacent sections, then have an LLM evaluate whether the added context remains on-scope. Repeat until a confidence threshold is met. This process added measurable accuracy gains beyond standard single-pass RAG.
  • Reinforcement Fine-Tuning with Human Feedback Signal: When tax experts correct a model determination, they leave structured explanations as feedback — essentially coaching a junior colleague. Pairing these corrections (ground truth answers plus expert reasoning) with known hard failures creates a high-signal RFT dataset. Sphere used this with OpenAI's alpha RFT program and deployed the resulting model to production with documented accuracy improvements.
  • RAG Remains Necessary for Citation-Sensitive Domains: Despite expanding context windows, agentic file-system search still misses relevant documents at a rate unacceptable for legal or compliance use cases. For domains requiring verifiable, source-linked citations, where errors carry legal or financial consequences, a purpose-built retrieval pipeline with controlled chunking and re-ranking outperforms general-purpose agent search given current model capabilities.
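
As a rough illustration of the hybrid retrieval takeaway, the sketch below merges a dense result list and a sparse result list into a single ranking. The summary does not say how Sphere combines the two; reciprocal rank fusion (RRF) is used here only as an assumed, commonly used merging strategy. The document IDs, the function name, and the constant k=60 are all hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. This is an assumed fusion method, not
    necessarily the one described in the episode.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical top hits from a dense (embedding) search and a sparse
# (full-text) search over the same corpus.
dense_hits = ["bulletin-12", "statute-4.1", "case-88"]
sparse_hits = ["statute-4.1", "statute-4.2", "bulletin-12"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents found by both searches rise to the top, while a document that only the sparse search surfaces (for example, one matching an exact jurisdiction-specific term) still makes it into the candidate set for re-ranking.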

What It Covers

Alex Bowcut, head of engineering at Sphere, explains how the company built TRAM, an AI system for sales tax compliance across global jurisdictions. The system combines semantic chunking, hybrid dense-sparse retrieval, and reinforcement fine-tuning to help tax experts work nearly two orders of magnitude faster than traditional manual methods.
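
To make the semantic chunking and hierarchy-based expansion ideas more concrete, here is a minimal sketch. It assumes a toy document whose sections start with dotted numbers such as "1.2"; the `Chunk` class, the heading pattern, and the sample text are all hypothetical, and real statutes, case law, and bulletins would each need their own parser. It requires Python 3.9 or later.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed heading format: a dotted section number, whitespace, then a title,
# e.g. "1.2 Exemptions". Body lines that begin with a number would be
# misread as headings, which a real parser would have to handle.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")

@dataclass
class Chunk:
    section: str            # e.g. "1.2"
    title: str
    parent: Optional[str]   # e.g. "1" for "1.2"; None for top-level sections
    text: str = ""

def chunk_by_section(document: str) -> list[Chunk]:
    """Split at section headings, recording each chunk's parent section."""
    chunks: list[Chunk] = []
    for line in document.splitlines():
        stripped = line.strip()
        match = HEADING.match(stripped)
        if match:
            section, title = match.groups()
            parent = section.rpartition(".")[0] or None
            chunks.append(Chunk(section, title, parent))
        elif chunks and stripped:
            # Body text belongs to the most recent heading.
            chunks[-1].text = (chunks[-1].text + " " + stripped).strip()
    return chunks

def expand(chunks: list[Chunk], section: str) -> list[Chunk]:
    """Return the chunk, its parent, and its siblings, in document order."""
    target = next(c for c in chunks if c.section == section)
    return [
        c for c in chunks
        if c.section == target.section
        or c.section == target.parent
        or (target.parent is not None and c.parent == target.parent)
    ]

# Made-up sample text, for illustration only.
sample = """\
1 Sales Tax
General provisions.
1.1 Taxable Goods
Tangible personal property is taxable.
1.2 Exemptions
Groceries are exempt.
2 Use Tax
Applies to out-of-state purchases.
"""

chunks = chunk_by_section(sample)
context = expand(chunks, "1.2")
print([c.section for c in context])
```

Because every chunk carries its parent section, a retrieved passage can be widened to its neighbors and traced back to a full citation path, which character-count splitting would not allow. In the loop described in the takeaways, an LLM judge would then decide whether the expanded context is still on-scope; that step is omitted here.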

Notable Moment

Bowcut noted that LLM cost sensitivity is essentially irrelevant at Sphere because even the most expensive frontier models are dramatically cheaper than human tax lawyers — and since the AI output feeds a deterministic engine rather than serving live inference, latency pressure is also absent, enabling accuracy-first engineering decisions.
