Small AI Models with Yoeven Khemlani
Episode: 40 min
Read time: 2 min
Topics: Design & UX, Artificial Intelligence, Software Development
AI-Generated Summary
Key Takeaways
- ✓ Small model strategy: Train 70B-parameter models instead of 400B-parameter ones by specializing each model for a single use case. The smaller size allows deployment on A100 GPUs rather than H100s, which lowers infrastructure costs while maintaining 97-98% accuracy on specific tasks such as structured web scraping.
- ✓ Prompt engine architecture: Routes each prompt to five models simultaneously, uses a mixture-of-agents technique in which smaller models judge the outputs, and converges on the best answer. By the tenth execution, the system locks onto the single best-performing model, so early runs get quality through consensus and later runs get consistency.
- ✓ GPU-poor methodology: Build every model to run on accessible hardware (A100, A10G) rather than premium GPUs, prioritizing deployability for enterprise self-hosting over raw performance. This lets customers deploy on AWS, Azure, GCP, or on-premises infrastructure without restrictions.
- ✓ Developer experience principle: Design the SDK with complete TypeScript typing so developers do not need documentation for basic usage. After an npm install, the API structure, modeled after Stripe's, has self-explanatory method names and parameters; docs are reserved for advanced configuration.
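The prompt engine takeaway can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not Jigsawstack's implementation: `Model`, `Judge`, `ModelSelector`, and `runPrompt` are hypothetical names, judges are assumed to vote by returning the index of the output they prefer, and "locks by the tenth execution" is read here as locking after ten consensus rounds onto the model with the most round wins.

```typescript
// Hypothetical sketch: none of these names come from a real SDK.
type Model = (prompt: string) => Promise<string>;
// A judge returns the index of the candidate output it prefers.
type Judge = (prompt: string, candidates: string[]) => Promise<number>;

// Index of the largest value; ties go to the earliest index.
function argmax(xs: number[]): number {
  let best = 0;
  let bestValue = -Infinity;
  xs.forEach((x, i) => {
    if (x > bestValue) {
      best = i;
      bestValue = x;
    }
  });
  return best;
}

// Tracks which model wins each consensus round and locks after `lockAfter` rounds.
class ModelSelector {
  private wins: number[];
  private rounds = 0;
  private readonly lockAfter: number;

  constructor(modelCount: number, lockAfter = 10) {
    this.wins = new Array<number>(modelCount).fill(0);
    this.lockAfter = lockAfter;
  }

  // Record one round of judge votes and return that round's winning model index.
  record(votes: number[]): number {
    const tally = this.wins.map(() => 0);
    for (const v of votes) tally[v] = (tally[v] ?? 0) + 1;
    const winner = argmax(tally);
    this.wins[winner] = (this.wins[winner] ?? 0) + 1;
    this.rounds += 1;
    return winner;
  }

  // The locked model index once enough rounds have run, otherwise null.
  locked(): number | null {
    return this.rounds >= this.lockAfter ? argmax(this.wins) : null;
  }
}

async function runPrompt(
  prompt: string,
  models: Model[],
  judges: Judge[],
  selector: ModelSelector,
): Promise<string> {
  // After locking, only the single chosen model is called.
  const lockedIndex = selector.locked();
  if (lockedIndex !== null) {
    const model = models[lockedIndex];
    if (!model) throw new Error("locked index out of range");
    return model(prompt);
  }
  // Before locking: fan out to every model, then let the judges vote.
  const outputs = await Promise.all(models.map((m) => m(prompt)));
  const votes = await Promise.all(judges.map((j) => j(prompt, outputs)));
  const answer = outputs[selector.record(votes)];
  if (answer === undefined) throw new Error("no model output to return");
  return answer;
}
```

Keeping the vote bookkeeping in a small synchronous class separates the fan-out (network calls) from the locking rule, so the rule can be checked without calling any model.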
What It Covers
Yoeven Khemlani explains how Jigsawstack builds specialized small AI models (70B parameters) for backend automation tasks such as web scraping, OCR, and translation. The models reach up to 98% accuracy on their target tasks while remaining self-hostable and cost-efficient at $1.40 per million tokens.
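A back-of-the-envelope sketch helps put the parameter counts and the per-token price in context. The 2-bytes-per-parameter figure (16-bit weights) and the 250-million-token workload are assumptions chosen for illustration; the summary states neither. The estimate covers model weights only and ignores activations and cache memory.

```typescript
// Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).
function weightMemoryGB(params: number, bytesPerParam: number): number {
  return (params * bytesPerParam) / 1e9;
}

// Cost in USD for a token count at a given price per million tokens.
function costUSD(tokens: number, pricePerMillionTokens: number): number {
  return (tokens / 1_000_000) * pricePerMillionTokens;
}

// Assumed precision: 16-bit weights (2 bytes per parameter).
const FP16_BYTES = 2;

const smallModelGB = weightMemoryGB(70e9, FP16_BYTES); // 140
const largeModelGB = weightMemoryGB(400e9, FP16_BYTES); // 800

// Assumed workload: 250 million tokens at the quoted $1.40 per million.
const monthlyCost = costUSD(250_000_000, 1.4).toFixed(2); // "350.00"
```

Under these assumptions the 400B model needs roughly 5.7 times the weight memory of the 70B model, which is the gap the small-model strategy is trading on.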
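The developer experience principle (full typing, Stripe-style `client.resource.method(params)` calls) can be illustrated with a small sketch. Everything here is hypothetical: `createClient`, the `web.scrape` and `text.translate` methods, their parameter shapes, and the request paths are invented for illustration and are not Jigsawstack's actual SDK. The transport is injected so the example runs without a network.

```typescript
// Hypothetical SDK shape for illustration only; not a real API.
interface ScrapeParams {
  /** Page to scrape. */
  url: string;
  /** Names of the fields to extract, e.g. ["title", "price"]. */
  fields: string[];
}

interface ScrapeResult {
  url: string;
  /** Extracted values keyed by field name; null when a field was not found. */
  data: Record<string, string | null>;
}

interface TranslateParams {
  text: string;
  /** Target language code, e.g. "fr". */
  targetLanguage: string;
}

interface TranslateResult {
  translatedText: string;
}

// Sends a request body to a path and resolves with the parsed response.
type Transport = (path: string, body: unknown) => Promise<unknown>;

// Resources are grouped by domain so editor autocomplete doubles as documentation.
function createClient(transport: Transport) {
  return {
    web: {
      scrape: (params: ScrapeParams) =>
        transport("/web/scrape", params) as Promise<ScrapeResult>,
    },
    text: {
      translate: (params: TranslateParams) =>
        transport("/text/translate", params) as Promise<TranslateResult>,
    },
  };
}

// Usage with a stub transport that returns a canned response.
const stub: Transport = async (_path, _body) => ({
  url: "https://example.com",
  data: { title: "Example Domain" },
});
const client = createClient(stub);
const pending = client.web.scrape({
  url: "https://example.com",
  fields: ["title"],
});
```

With this shape, a misspelled parameter or a missing `fields` array is a compile-time error, and the JSDoc comments surface in the editor, which is the sense in which basic usage needs no separate documentation.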
Notable Moment
Jigsawstack benchmarked its OCR model against Mistral's OCR model, which Mistral had billed as the world's best, and found significant performance gaps. Khemlani presents this as evidence that specialized small models from focused startups can outperform rushed releases from well-resourced companies entering adjacent markets.