Sovereign AI in Poland: Language Adaptation, Local Control & Cost Advantages with Marek Kozlowski
Episode: 89 min
Read time: 3 min
Topics: Artificial Intelligence
AI-Generated Summary
Key Takeaways
- ✓ Language Adaptation vs. Full Pretraining: Training from scratch requires at least 1 trillion tokens for stable results, so PLUM instead continues pretraining Llama and Mistral base models on ~200 billion curated Polish tokens. This "language adaptation" injects local linguistic and cultural knowledge while preserving existing multilingual capabilities, achieving competitive Polish-language performance without the compute cost of a full pretraining run.
- ✓ Frontier Model Quality Degrades for Niche Languages Over Generations: Results on Poland's PLCC (Polish Linguistic and Cultural Competency) benchmark show that successive Claude and GPT model releases score progressively lower on Polish language and cultural knowledge. As frontier labs prioritize coding and reasoning benchmarks, niche-language quality is traded away, so organizations relying on cloud APIs risk worsening performance over time with no warning or recourse.
- ✓ Small Fine-Tuned Models Match Large Cloud Models for Specific Tasks: When a business has 10–20 defined use cases and prepares at least 1,000 supervised fine-tuning instructions per task, a smaller on-premise model matches the zero-shot or few-shot performance of large cloud LLMs on those tasks. This approach reduces energy costs, eliminates cloud dependency, and enables deployment in regulated sectors where data cannot leave the organization's infrastructure.
- ✓ Domain Adaptation Requires ~10 Billion Clean Tokens to Be Worthwhile: PLUM's work with Central and Eastern Europe's largest bank demonstrates that domain-specific continued pretraining delivers measurable quality gains, but only when the organization can supply roughly 10 billion tokens after deduplication and filtering. Since raw data shrinks by a factor of 3–4 through curation, organizations need 30–40 billion raw tokens, a threshold fewer than 100 European companies realistically meet.
- ✓ EU Regulation Eliminates ~80% of Usable Training Data: The EU AI Act combined with local authorship rights legislation creates constraints far stricter than any voluntary model constitution. These regulations prevent large-scale web scraping and require detailed model cards disclosing training data, compute, and security measures. PLUM compensates by securing bilateral agreements with publishers and libraries, and by building internal human annotation pipelines producing organic instruction and preference datasets.
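The data thresholds quoted in the takeaways above reduce to simple arithmetic. The following is a minimal sketch in Python, assuming the figures as stated in the summary (10 billion clean tokens, a 3–4x shrink through curation, 1,000 instructions per task across 10–20 use cases). The function names are hypothetical and chosen only for illustration; they do not come from the episode or from any PLUM tooling.

```python
# Back-of-the-envelope data budget, using the figures quoted in the summary.
# Function names are illustrative assumptions, not part of any real toolkit.

def raw_tokens_needed(clean_target: float, shrink_factor: float) -> float:
    """Raw tokens to collect so that `clean_target` tokens survive curation.

    `shrink_factor` is how many times smaller the corpus gets after
    deduplication and filtering (the summary quotes 3 to 4).
    """
    if shrink_factor < 1:
        raise ValueError("shrink_factor must be >= 1")
    return clean_target * shrink_factor


def sft_instructions_needed(use_cases: int, per_task: int = 1_000) -> int:
    """Minimum supervised fine-tuning instructions across all use cases."""
    return use_cases * per_task


CLEAN_TARGET = 10e9  # ~10 billion clean tokens for domain adaptation

low = raw_tokens_needed(CLEAN_TARGET, 3)
high = raw_tokens_needed(CLEAN_TARGET, 4)
print(f"raw tokens: {low / 1e9:.0f}-{high / 1e9:.0f} billion")

for n in (10, 20):
    print(f"{n} use cases -> at least {sft_instructions_needed(n):,} instructions")
```

Under these assumptions, an organization with 10–20 use cases would need on the order of 10,000–20,000 reviewed instructions for task-level fine-tuning, a far smaller requirement than the 30–40 billion raw tokens needed for domain-level continued pretraining.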
What It Covers
Marek Kozlowski, head of the AI Lab at Poland's National Information Processing Institute, explains how Project PLUM (Polish Large Language Models) builds locally controlled AI. The project performs language adaptation on Llama and Mistral base models using ~200 billion curated Polish tokens, targeting parity with models 10x larger on Polish language and cultural tasks.
Key Questions Answered
- Organic Human-Annotated Data Drives Quality at the SFT Stage: Instruction data generated synthetically by other LLMs degrades model output quality when the synthetic examples contain poor linguistic structure. PLUM instead employs dozens to hundreds of human annotators to create and review instructions and preference pairs manually. This organic data pipeline, combined with published dataset samples and a ~100-page technical cookbook on Hugging Face, differentiates PLUM from open-weight-only releases that disclose nothing about their training data.
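As an illustration of the distinction between organic and synthetic data described above, the sketch below models instruction and preference records and keeps only human-written, human-reviewed items. The record fields, the `"human"` / `"synthetic"` labels, and the function name are all assumptions made for this example; the episode summary does not describe PLUM's actual data schema.

```python
from dataclasses import dataclass

# Hypothetical record layouts; field names and labels are assumptions.

@dataclass(frozen=True)
class Instruction:
    prompt: str
    response: str
    source: str     # "human" or "synthetic" (illustrative labels)
    reviewed: bool  # True once a human annotator has checked the item


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str     # response the annotator preferred
    rejected: str   # response the annotator ranked lower
    reviewed: bool


def organic_only(items: list[Instruction]) -> list[Instruction]:
    """Keep instructions that were written and reviewed by humans."""
    return [i for i in items if i.source == "human" and i.reviewed]


# Placeholder examples, not real dataset entries.
samples = [
    Instruction("Explain the term in Polish.", "...", "human", True),
    Instruction("Summarize the passage.", "...", "synthetic", True),
    Instruction("Translate the sentence.", "...", "human", False),
]

kept = organic_only(samples)
print(f"kept {len(kept)} of {len(samples)} instructions")
```

A filter like this only separates records by provenance; judging the linguistic quality of each item would still rely on the human review step the summary describes.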
Notable Moment
Kozlowski reveals that when his team evaluated successive Claude model releases on the PLCC benchmark, Polish cultural and linguistic performance measurably declined across versions. Organizations that deeply integrate a cloud LLM into Polish-language workflows could therefore find that their vendor's next release quietly performs worse on their core use case, with no option to roll back.