Cognitive Revolution

Sovereign AI in Poland: Language Adaptation, Local Control & Cost Advantages with Marek Kozlowski

December 6, 2025

89 min episode · 3 min read

Marek Kozlowski

Topics

Fundraising & VC, Artificial Intelligence, Software Development

AI-Generated Summary

Published Jun 2, 2026

Key Takeaways

✓Language Adaptation vs. Full Pretraining: Rather than training from scratch — which requires at least 1 trillion tokens for stable results — PLUM continues pretraining Llama and Mistral base models on ~200 billion curated Polish tokens. This "language adaptation" injects local linguistic and cultural knowledge while preserving existing multilingual capabilities, achieving competitive Polish-language performance without the compute cost of full pretraining runs.
✓Frontier Model Quality Degrades for Niche Languages Over Generations: Benchmarking on Poland's PLCC (Polish Holistic Cultural Competency) benchmark reveals that successive Claude and GPT model releases show declining Polish language and cultural performance. As frontier labs prioritize coding and reasoning benchmarks, niche language quality becomes a trade-off casualty — meaning organizations relying on cloud APIs risk worsening performance over time without any warning or recourse.
✓Small Fine-Tuned Models Match Large Cloud Models for Specific Tasks: When a business has 10–20 defined use cases and prepares at least 1,000 supervised fine-tuning instructions per task, a smaller on-premises model can match the zero-shot or few-shot performance of large cloud LLMs. This approach reduces energy costs, eliminates cloud dependency, and enables deployment in regulated sectors where data cannot leave the organization's infrastructure.
✓Domain Adaptation Requires ~10 Billion Clean Tokens to Be Worthwhile: PLUM's work with Central and Eastern Europe's largest bank demonstrates that domain-specific continued pretraining delivers measurable quality gains, but only when the organization can supply roughly 10 billion tokens after deduplication and filtering. Since raw data shrinks by 3–4x through curation, organizations need 30–40 billion raw tokens, a threshold fewer than 100 European companies realistically meet.
✓EU Regulation Eliminates ~80% of Usable Training Data: The EU AI Act combined with local authorship rights legislation creates constraints far stricter than any voluntary model constitution. These regulations prevent large-scale web scraping and require detailed model cards disclosing training data, compute, and security measures. PLUM compensates by securing bilateral agreements with publishers and libraries, and by building internal human annotation pipelines producing organic instruction and preference datasets.
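
The data thresholds in the takeaways above reduce to simple arithmetic. The sketch below is only an illustration of that arithmetic, not anything from PLUM's tooling: the function names `raw_tokens_needed` and `sft_examples_needed` are hypothetical, and the inputs are the round figures quoted in the summary (10 billion clean tokens, 3–4x shrinkage during curation, 10–20 use cases, 1,000 instructions per task).

```python
def raw_tokens_needed(clean_target: float, shrink_factor: float) -> float:
    """Raw tokens required so that curation leaves `clean_target` tokens.

    `shrink_factor` is how many times smaller the corpus becomes after
    deduplication and filtering (3-4x in the summary's figures).
    """
    if shrink_factor < 1:
        raise ValueError("shrink_factor must be >= 1")
    return clean_target * shrink_factor


def sft_examples_needed(use_cases: int, per_task: int = 1_000) -> int:
    """Minimum supervised fine-tuning instructions across all use cases."""
    if use_cases < 0 or per_task < 0:
        raise ValueError("counts must be non-negative")
    return use_cases * per_task


if __name__ == "__main__":
    clean_target = 10e9  # ~10 billion clean tokens, as quoted in the summary

    # Domain adaptation: raw corpus size implied by each shrink factor.
    for factor in (3, 4):
        raw = raw_tokens_needed(clean_target, factor)
        print(f"{factor}x shrinkage: about {raw / 1e9:.0f}B raw tokens")

    # Task-specific fine-tuning: instruction count implied by 10-20 use cases.
    for cases in (10, 20):
        total = sft_examples_needed(cases)
        print(f"{cases} use cases: at least {total:,} instructions")
```

Read this way, the bar for task-specific fine-tuning (tens of thousands of instructions) is far lower than the bar for domain-adaptive pretraining (tens of billions of raw tokens), which is consistent with the summary's point that few organizations can justify the latter.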

What It Covers

Marek Kozlowski, head of Poland's National Information Processing Institute AI Lab, explains how Project PLUM (Polish Large Language Models) builds locally controlled AI by performing language adaptation on Llama and Mistral base models using ~200 billion curated Polish tokens, targeting performance parity with models 10x larger for language and cultural tasks.

Key Questions Answered

•Organic Human-Annotated Data Drives Quality at the SFT Stage: Instruction data generated synthetically by other LLMs degrades model output quality when the synthetic examples contain poor linguistic structure. PLUM employs dozens to hundreds of human annotators to create and review instructions and preference pairs manually. This organic data pipeline, combined with published dataset samples and a ~100-page technical cookbook on Hugging Face, differentiates PLUM from open-weight-only releases that offer no transparency about their training data.

Notable Moment

Kozlowski reveals that when his team analyzed successive Claude model releases against the PLCC benchmark, Polish cultural and linguistic performance measurably declined across versions. This means organizations that deeply integrate a cloud LLM into Polish-language workflows could find their vendor's next release quietly performs worse on their core use case with no rollback option available.

Books, tools, and gear mentioned in this episode

Tools

Hugging Face
by Hugging Face
“This organic data pipeline — combined with publishing dataset samples and a ~100-page technical cookbook on Hugging Face — differentiates PLUM from open-weight-only releases”
PLCC (Polish Holistic Cultural Competency benchmark) · By guest
by National Information Processing Institute
“Benchmarking on Poland's PLCC (Polish Holistic Cultural Competency) benchmark reveals that successive Claude and GPT model releases show declining Polish language and cultural performance.”
Llama
by Meta
“PLUM (Polish Large Language Models) builds locally controlled AI by performing language adaptation on Llama and Mistral base models using ~200 billion curated Polish tokens”
Mistral
by Mistral AI
“PLUM (Polish Large Language Models) builds locally controlled AI by performing language adaptation on Llama and Mistral base models using ~200 billion curated Polish tokens”
