Small AI Models with Yoeven Khemlani
Episode: 40 min
Read time: 2 min
Topics: Artificial Intelligence
AI-Generated Summary
Key Takeaways
- ✓ Small model strategy: Train 70B-parameter models instead of 400B-parameter ones by specializing each model for a single use case. The smaller models can be deployed on A100 GPUs rather than requiring H100s, which reduces infrastructure costs while maintaining 97-98% accuracy on specific tasks such as structured web scraping.
- ✓ Prompt engine architecture: Routes each prompt to five models simultaneously, uses a mixture-of-agents technique in which smaller models judge the outputs, and converges on the best answer. By the tenth execution, the system locks to a single optimal model, so consensus guarantees quality early on and the locked model ensures consistency afterward.
- ✓ GPU-poor methodology: Build every model to run on accessible hardware (A100, A10G) rather than premium GPUs, prioritizing deployability for enterprise self-hosting over raw performance. This distribution strategy lets customers deploy on AWS, Azure, GCP, or on-premise infrastructure without restrictions.
- ✓ Developer experience principle: Design the SDK with complete TypeScript typing so that developers never need documentation for basic usage. After an npm install, the API structure, modeled after Stripe's, is intuitive: method names and parameters are self-explanatory, and the docs are reserved for advanced configurations.
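The prompt engine flow described in the takeaways can be sketched roughly as follows. This is a minimal illustration, not JigsawStack's implementation: `PromptRouter`, `ModelFn`, `JudgeFn`, and `RouterOptions` are hypothetical names, the judges are assumed to return a numeric score, and the locking rule (pick the model with the most wins once the run count reaches a threshold) is one plausible reading of "locks to a single optimal model".

```typescript
// Hypothetical types: none of these names come from JigsawStack's SDK.
type ModelFn = (prompt: string) => Promise<string>;
// A judge returns a score for an answer; higher is better (assumption).
type JudgeFn = (prompt: string, answer: string) => Promise<number>;

interface RouterOptions {
  candidates: Record<string, ModelFn>; // e.g. five candidate models
  judges: JudgeFn[];                   // smaller models acting as judges
  lockAfter: number;                   // runs before locking (10 in the summary)
}

class PromptRouter {
  private wins = new Map<string, number>();
  private runs = 0;
  private locked: string | null = null;

  constructor(private readonly opts: RouterOptions) {}

  get lockedModel(): string | null {
    return this.locked;
  }

  async run(prompt: string): Promise<{ model: string; answer: string }> {
    // Once locked, skip the fan-out and call only the chosen model.
    if (this.locked !== null) {
      const answer = await this.opts.candidates[this.locked](prompt);
      return { model: this.locked, answer };
    }

    // Fan out: send the prompt to every candidate in parallel.
    const names = Object.keys(this.opts.candidates);
    const answers = await Promise.all(
      names.map((name) => this.opts.candidates[name](prompt)),
    );

    // Every judge scores every answer; sum the scores per candidate.
    const totals = await Promise.all(
      answers.map(async (answer) => {
        const scores = await Promise.all(
          this.opts.judges.map((judge) => judge(prompt, answer)),
        );
        return scores.reduce((sum, s) => sum + s, 0);
      }),
    );

    // The candidate with the highest total wins this run.
    let best = 0;
    for (let i = 1; i < totals.length; i++) {
      if (totals[i] > totals[best]) best = i;
    }
    const winner = names[best];
    this.wins.set(winner, (this.wins.get(winner) ?? 0) + 1);
    this.runs += 1;

    // After enough runs, lock to the model with the most wins.
    if (this.runs >= this.opts.lockAfter) {
      this.locked = Array.from(this.wins.entries())
        .sort((a, b) => b[1] - a[1])[0][0];
    }
    return { model: winner, answer: answers[best] };
  }
}
```

With `lockAfter: 10`, the first ten calls to `run` pay for every candidate and every judge, and later calls hit only the locked model. That trade (higher cost up front, consistent and cheaper output afterward) is the point of the design as summarized.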
What It Covers
Yoeven Khemlani explains how JigsawStack builds specialized small AI models (70B parameters) for backend automation tasks such as web scraping, OCR, and translation. The models reach 97-98% accuracy on their target tasks while remaining deployable and cost-efficient at $1.40 per million tokens.
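The model sizes and the per-token price can be sanity-checked with back-of-the-envelope arithmetic. The sketch below rests on assumptions that are not from the episode: 16-bit weights (2 bytes per parameter), 80 GB of memory per GPU, and weights only, ignoring KV cache and activations. The function names are illustrative.

```typescript
// Assumptions (not from the episode): 16-bit weights, 80 GB per GPU,
// and weights only (no KV cache, activations, or framework overhead).
const BYTES_PER_PARAM = 2;
const GPU_MEMORY_GB = 80;

// Memory needed just to hold the weights, in GB.
// (billions of params) * (bytes per param) = GB, since 1 GB = 1e9 bytes here.
function weightMemoryGB(paramsBillions: number): number {
  return paramsBillions * BYTES_PER_PARAM;
}

// Minimum number of GPUs whose combined memory fits the weights.
function minGpusForWeights(paramsBillions: number): number {
  return Math.ceil(weightMemoryGB(paramsBillions) / GPU_MEMORY_GB);
}

// Cost in USD for a token count at a per-million-token price
// (default is the $1.40 figure quoted in the summary).
function costUSD(tokens: number, pricePerMillionUSD = 1.4): number {
  return (tokens / 1_000_000) * pricePerMillionUSD;
}

console.log(weightMemoryGB(70), minGpusForWeights(70));   // 140 2
console.log(weightMemoryGB(400), minGpusForWeights(400)); // 800 10
console.log(costUSD(5_000_000).toFixed(2));               // 7.00
```

Under these assumptions a 70B model needs at least two 80 GB cards for its weights, while a 400B model needs at least ten, which is one way to see why the smaller size is easier to self-host.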
Notable Moment
JigsawStack benchmarked its OCR model against Mistral's OCR model, which Mistral had billed as the world's best, and found significant performance gaps. The result suggests that specialized small models from focused startups can outperform rushed releases from well-resourced companies entering adjacent markets.