Cognitive Revolution

Sovereign AI in Poland: Language Adaptation, Local Control & Cost Advantages with Marek Kozlowski

89 min episode · 3 min read

Topics

Artificial Intelligence

AI-Generated Summary

Key Takeaways

  • Language Adaptation vs. Full Pretraining: Rather than training from scratch — which requires at least 1 trillion tokens for stable results — PLUM continues pretraining Llama and Mistral base models on ~200 billion curated Polish tokens. This "language adaptation" injects local linguistic and cultural knowledge while preserving existing multilingual capabilities, achieving competitive Polish-language performance without the compute cost of full pretraining runs.
  • Frontier Model Quality Degrades for Niche Languages Over Generations: Benchmarking on the PLCC (Polish Linguistic and Cultural Competency) benchmark reveals that successive Claude and GPT model releases show declining Polish language and cultural performance. As frontier labs prioritize coding and reasoning benchmarks, niche-language quality becomes a casualty of that trade-off, so organizations relying on cloud APIs risk worsening performance over time without warning or recourse.
  • Small Fine-Tuned Models Match Large Cloud Models for Specific Tasks: When a business has 10–20 defined use cases and prepares at least 1,000 supervised fine-tuning instructions per task, a smaller on-premises model can match the zero-shot or few-shot performance of large cloud LLMs. This approach reduces energy costs, eliminates cloud dependency, and enables deployment in regulated sectors where data cannot leave the organization's infrastructure.
  • Domain Adaptation Requires ~10 Billion Clean Tokens to Be Worthwhile: PLUM's work with the largest bank in Central and Eastern Europe demonstrates that domain-specific continued pretraining delivers measurable quality gains, but only when the organization can supply roughly 10 billion tokens after deduplication and filtering. Since curation shrinks raw data by 3–4x, organizations need 30–40 billion raw tokens, a threshold fewer than 100 European companies realistically meet.
  • EU Regulation Eliminates ~80% of Usable Training Data: The EU AI Act, combined with national copyright law, creates constraints far stricter than any voluntary model constitution. These rules prevent large-scale web scraping and require detailed model cards disclosing training data, compute, and security measures. PLUM compensates by securing bilateral agreements with publishers and libraries, and by building internal human annotation pipelines that produce organic instruction and preference datasets.
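The data thresholds quoted above reduce to simple arithmetic, which can be sketched as follows. This is a minimal illustration, not PLUM's tooling: the function names `raw_tokens_needed` and `sft_examples_needed` are hypothetical, and the only inputs are the figures stated in the takeaways (10 billion clean tokens, 3–4x shrinkage during curation, 1,000 instructions per task, 10–20 tasks).

```python
def raw_tokens_needed(clean_tokens: float, shrink_factor: float) -> float:
    """Raw tokens to collect so that `clean_tokens` survive curation.

    `shrink_factor` is how many times smaller the corpus becomes after
    deduplication and filtering (3-4x in the summary above).
    """
    if shrink_factor < 1:
        raise ValueError("shrink_factor must be >= 1")
    return clean_tokens * shrink_factor


def sft_examples_needed(num_tasks: int, per_task: int = 1_000) -> int:
    """Total supervised fine-tuning instructions across all use cases."""
    return num_tasks * per_task


CLEAN_TARGET = 10e9  # ~10 billion clean tokens for domain adaptation

low = raw_tokens_needed(CLEAN_TARGET, 3)
high = raw_tokens_needed(CLEAN_TARGET, 4)
print(f"raw tokens needed: {low / 1e9:.0f} to {high / 1e9:.0f} billion")

# 10-20 use cases at a minimum of 1,000 instructions each
print(sft_examples_needed(10), "to", sft_examples_needed(20), "instructions")
```

Under these assumptions, a business with 10–20 use cases needs at least 10,000 to 20,000 instructions in total, a far smaller data commitment than the 30–40 billion raw tokens required for domain-level continued pretraining.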

What It Covers

Marek Kozlowski, head of Poland's National Information Processing Institute AI Lab, explains how Project PLUM (Polish Large Language Models) builds locally controlled AI by performing language adaptation on Llama and Mistral base models using ~200 billion curated Polish tokens, targeting performance parity with models 10x larger for language and cultural tasks.

Key Questions Answered

  • Organic Human-Annotated Data Drives Quality at the SFT Stage: Instruction data generated synthetically by other LLMs degrades output quality when the synthetic examples contain poor linguistic structure. PLUM employs dozens to hundreds of human annotators to create and review instructions and preference pairs manually. This organic data pipeline, combined with published dataset samples and a ~100-page technical cookbook on Hugging Face, differentiates PLUM from open-weight-only releases that disclose nothing about their training data.
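One way to picture an "organic" data pipeline is as a provenance filter over instruction records. The sketch below is an assumption about how such a filter could look, not a description of PLUM's actual pipeline: the `Instruction` record, its `source` and `reviewed` fields, and the `select_organic` helper are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    prompt: str
    response: str
    source: str     # hypothetical labels: "human" or "synthetic"
    reviewed: bool  # True once a second annotator has approved the record


def select_organic(records: list[Instruction]) -> list[Instruction]:
    """Keep only human-written records that have passed manual review."""
    return [r for r in records if r.source == "human" and r.reviewed]


# Placeholder records for illustration only
records = [
    Instruction("prompt A", "response A", source="human", reviewed=True),
    Instruction("prompt B", "response B", source="human", reviewed=False),
    Instruction("prompt C", "response C", source="synthetic", reviewed=True),
]

organic = select_organic(records)
print(len(organic), "of", len(records), "records kept for SFT")
```

Recording provenance on every record also makes it straightforward to publish dataset samples alongside a model, since each example carries its own origin and review status.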

Notable Moment

Kozlowski reveals that when his team analyzed successive Claude model releases against the PLCC benchmark, Polish cultural and linguistic performance measurably declined across versions. This means organizations that deeply integrate a cloud LLM into Polish-language workflows could find their vendor's next release quietly performs worse on their core use case with no rollback option available.
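A team that depends on a cloud model could watch for this kind of decline by re-running a fixed benchmark on every release and flagging drops. The following is a minimal sketch under stated assumptions: `find_regressions` is a hypothetical helper, and the version names and scores are placeholders, not PLCC results.

```python
def find_regressions(scores, tolerance=0):
    """Return (previous, current, drop) for each release that scored lower.

    `scores` is a list of (version, score) pairs in release order.
    A drop is reported only if it exceeds `tolerance`.
    """
    regressions = []
    for (prev_v, prev_s), (cur_v, cur_s) in zip(scores, scores[1:]):
        drop = prev_s - cur_s
        if drop > tolerance:
            regressions.append((prev_v, cur_v, drop))
    return regressions


# Placeholder benchmark history, invented for illustration
history = [("model-v1", 70), ("model-v2", 74), ("model-v3", 68)]

for prev_v, cur_v, drop in find_regressions(history):
    print(f"{cur_v} scored {drop} points below {prev_v}")
```

Running the same held-out benchmark before adopting each new release gives an organization the warning that the vendor does not provide.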
