Software Engineering Daily

Small AI Models with Yoeven Khemlani

July 24, 2025

40 min episode · 2 min read

Yoeven Khemlani

Topics

Design & UX, Artificial Intelligence, Software Development

AI-Generated Summary

Published Dec 25, 2025

Key Takeaways

✓ Small model strategy: Train 70B-parameter models instead of 400B-parameter ones by specializing each model for a single use case. The smaller size allows deployment on A100 GPUs rather than H100s, which reduces infrastructure costs while maintaining 97-98% accuracy on specific tasks such as structured web scraping.
✓ Prompt engine architecture: The engine routes each prompt to five models simultaneously, applies a mixture-of-agents technique in which smaller models judge the outputs, and converges on the best answer. By the tenth execution, the system locks onto a single optimal model, so early runs get quality through consensus and later runs get consistency.
✓ GPU-poor methodology: Build every model to run on accessible hardware (A100, A10G) rather than premium GPUs, prioritizing deployability for enterprise self-hosting over raw performance. This lets customers deploy on AWS, Azure, GCP, or on-premises infrastructure without restrictions.
✓ Developer experience principle: Design the SDK with complete TypeScript typing so developers never need documentation for basic usage. After an npm install, the API structure, modeled after Stripe's, exposes self-explanatory method names and parameters, leaving the docs for advanced configurations only.
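
The prompt engine described in the takeaways can be sketched roughly as follows. This is a minimal illustration, not Jigsawstack's implementation: the names PromptRouter, Model, Judge, and lockAfter are hypothetical, the judge is reduced to a single function that returns the index of the best candidate, and "by the tenth execution" is interpreted here as locking after ten fan-out runs onto the model with the most wins.

```typescript
// Hypothetical sketch: every name here is illustrative, not Jigsawstack's API.

type Model = {
  name: string;
  generate: (prompt: string) => Promise<string>;
};

// A judge sees all candidate outputs and returns the index of the best one.
// In a mixture-of-agents setup this role would be played by smaller models.
type Judge = (prompt: string, candidates: string[]) => Promise<number>;

type RunResult = { model: string; output: string; locked: boolean };

class PromptRouter {
  private readonly models: Model[];
  private readonly judge: Judge;
  private readonly lockAfter: number; // assumed threshold, default 10
  private readonly wins = new Map<string, number>();
  private fanOutRuns = 0;
  private lockedModel: Model | undefined;

  constructor(models: Model[], judge: Judge, lockAfter = 10) {
    if (models.length === 0) throw new Error("at least one model is required");
    this.models = models;
    this.judge = judge;
    this.lockAfter = lockAfter;
  }

  async run(prompt: string): Promise<RunResult> {
    // Once locked, skip the fan-out and call only the chosen model.
    if (this.lockedModel) {
      const output = await this.lockedModel.generate(prompt);
      return { model: this.lockedModel.name, output, locked: true };
    }

    // Fan out: query every model concurrently.
    const outputs = await Promise.all(this.models.map((m) => m.generate(prompt)));

    // Judge the candidates and record which model won this run.
    const bestIndex = await this.judge(prompt, outputs);
    const winner = this.models[bestIndex];
    const output = outputs[bestIndex];
    if (winner === undefined || output === undefined) {
      throw new Error(`judge returned an invalid index: ${bestIndex}`);
    }
    this.wins.set(winner.name, (this.wins.get(winner.name) ?? 0) + 1);
    this.fanOutRuns += 1;

    // After enough fan-out runs, lock onto the model with the most wins.
    if (this.fanOutRuns >= this.lockAfter) {
      this.lockedModel = this.mostWins();
    }
    return { model: winner.name, output, locked: false };
  }

  // Ties are resolved in favour of the model listed first.
  private mostWins(): Model {
    return this.models.reduce((best, m) =>
      (this.wins.get(m.name) ?? 0) > (this.wins.get(best.name) ?? 0) ? m : best,
    );
  }
}
```

In this sketch the first ten calls to run pay for every model plus the judge, and each later call pays for one model only. That is one way to read the trade-off the summary describes: consensus up front, consistency afterwards.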

What It Covers

Yoeven Khemlani explains how Jigsawstack builds specialized small AI models (70B parameters) for backend automation tasks such as web scraping, OCR, and translation. The models reach about 98% accuracy on those tasks while remaining deployable and cost-efficient at $1.40 per million tokens.
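
A quick back-of-envelope check helps put the 70B versus 400B comparison and the $1.40 per million tokens figure in context. The sketch below is illustrative only: the summary does not state numeric precision, GPU memory variants, or workload sizes, so the bytes-per-parameter table, the 80 GB A100 variant, and the 250-million-token workload are all assumptions, and the memory estimate covers model weights only (no KV cache or activations).

```typescript
// Back-of-envelope helpers. Precision, GPU memory size, and workload size
// are assumptions, not figures quoted in the episode summary.

const BYTES_PER_PARAM = { fp16: 2, int8: 1, int4: 0.5 } as const;
type Precision = keyof typeof BYTES_PER_PARAM;

/** Memory needed for the weights alone, in GB (1 GB = 1e9 bytes). */
function weightMemoryGB(params: number, precision: Precision): number {
  return (params * BYTES_PER_PARAM[precision]) / 1e9;
}

/** Minimum GPU count to hold the weights, ignoring KV cache and activations. */
function minGpus(params: number, precision: Precision, gpuMemoryGB: number): number {
  return Math.ceil(weightMemoryGB(params, precision) / gpuMemoryGB);
}

/** Cost in USD at a flat per-million-token price (default from the summary). */
function tokenCostUSD(tokens: number, pricePerMillionUSD = 1.4): number {
  return (tokens / 1_000_000) * pricePerMillionUSD;
}

const A100_80GB = 80; // assumed variant; the A100 also ships with 40 GB

console.log(weightMemoryGB(70e9, "fp16"));          // 140
console.log(weightMemoryGB(400e9, "fp16"));         // 800
console.log(minGpus(70e9, "fp16", A100_80GB));      // 2
console.log(minGpus(400e9, "fp16", A100_80GB));     // 10
console.log(tokenCostUSD(250_000_000).toFixed(2));  // "350.00"
```

Under these assumptions, the weights of a 70B model fit on two 80 GB cards at 16-bit precision, while a 400B model would need at least ten, which is consistent with the deployability argument made in the episode.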

Notable Moment

Jigsawstack benchmarked its OCR model against Mistral's OCR model, which Mistral had billed as the world's best, and found significant performance gaps. The result suggests that specialized small models from focused startups can outperform rushed releases from well-resourced companies entering adjacent markets.
