Eye on AI

Your Child's Data Profile Starts Before They're Born | Eamonn Maguire of Proton

55 min episode · 2 min read

Topics

Science & Discovery

AI-Generated Summary

Key Takeaways

  • Pre-birth data profiling: Maguire says that the moment a parent emails a gynecologist or fertility clinic using Gmail or Outlook, advertising platforms can flag that household as expecting and begin building a child's profile before birth. Switching to end-to-end encrypted email such as Proton Mail at the start of a pregnancy is presented as a way to keep this data out of ad-targeting systems.
  • AI training data opacity: According to Maguire, the entire English-language Wikipedia accounted for only 0.3% of GPT-2's training data; the remainder was scraped web pages, social media, and unattributed sources. He also cites Anthropic's $1.5 billion copyright case, which he characterizes as scanning thousands of purchased books and then discarding them to eliminate copyright paper trails, as a pattern users should factor into trust decisions.
  • Profile inference from minimal data: Three email sign-ups — Instagram, a political newsletter, and an AI publication — are sufficient for platforms to infer age, ideology, and interests, then expand the profile by serving targeted ads and measuring click behavior. Non-clicks on religious or political content are themselves used to fill profile gaps.
  • Open vs. open-washed AI models: Proton's Lumo assistant deploys models that Maguire describes as genuinely open (including GLM 5.1, Qwen 3.5, and NVIDIA's Nemotron series), where training data, code, and architecture are all publicly verifiable. Models labeled open-source but with undisclosed training data, such as Meta's Llama, are described as "open-washing" and carry the same trust risks as proprietary systems.
  • Privacy-preserving AI within encrypted environments: Proton implements local indexing of Drive folders linked to Lumo projects, enabling retrieval-augmented generation without sending documents to external servers. Users can disable web search APIs entirely if their threat model requires it, and all chat history is end-to-end encrypted with user-held keys, making server-side data access structurally impossible.
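To make the local-indexing idea in the last takeaway concrete, here is a minimal sketch of retrieval over documents that never leave the machine. It is not Proton's implementation: the `LocalIndex` class, the file names, and the simple TF-IDF scoring are all assumptions chosen for illustration. A real system would likely use embeddings and encrypted storage.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class LocalIndex:
    """Toy in-memory TF-IDF index (hypothetical); no network calls are made."""

    def __init__(self):
        self.docs = {}             # document name -> Counter of term counts
        self.doc_freq = Counter()  # term -> number of documents containing it

    def add(self, name, text):
        # Assumes each name is added once; re-adding would double-count doc_freq.
        counts = Counter(tokenize(text))
        self.docs[name] = counts
        for term in counts:
            self.doc_freq[term] += 1

    def search(self, query, k=2):
        """Return up to k document names, best match first."""
        n = len(self.docs)
        scores = {}
        for name, counts in self.docs.items():
            score = 0.0
            for term in tokenize(query):
                if term in counts:
                    # Smoothed inverse document frequency: rarer terms weigh more.
                    idf = math.log((1 + n) / (1 + self.doc_freq[term])) + 1.0
                    score += counts[term] * idf
            if score > 0:
                scores[name] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical folder contents, indexed entirely in local memory.
index = LocalIndex()
index.add("lease.txt", "The lease renews every January and rent is due monthly.")
index.add("taxes.txt", "Tax filing deadline and deductible expenses for the year.")
index.add("recipes.txt", "Slow cooker chili with beans and tomatoes.")

hits = index.search("when is rent due on the lease")
print(hits)
```

In a retrieval-augmented setup, only the top-ranked passages would be placed into the model's prompt. The point of the sketch is that the indexing and ranking steps can run without any document being uploaded.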

What It Covers

Eamonn Maguire of Proton explains how data profiling begins before a child is born, how AI models are trained on scraped data without consent, and how Proton's ecosystem — including Lumo AI, encrypted email, and the Born Private initiative — offers a structural alternative to surveillance-based platforms.

Notable Moment

Maguire describes how platforms actively probe unknown profile attributes — such as religion or political affiliation — by serving targeted ads and measuring non-clicks as data points. The absence of engagement is itself recorded, meaning passive scrolling still continuously fills gaps in a user's behavioral profile.
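One way to see why a non-click carries information is a simple Bayesian update. The sketch below is an illustrative model, not a description of any platform's actual system: the `update` function and both click-rate parameters are assumed values chosen only to show the direction of the effect.

```python
def update(prior, clicked, p_click_if_true=0.10, p_click_if_false=0.01):
    """Bayes update of the belief that a user has some attribute.

    The click rates are hypothetical: users with the attribute are assumed
    to click a related ad 10% of the time, everyone else 1% of the time.
    """
    like_true = p_click_if_true if clicked else 1 - p_click_if_true
    like_false = p_click_if_false if clicked else 1 - p_click_if_false
    evidence = prior * like_true + (1 - prior) * like_false
    return prior * like_true / evidence


# Start undecided, then observe 20 impressions that were scrolled past.
belief = 0.5
for _ in range(20):
    belief = update(belief, clicked=False)

print(f"belief after 20 non-clicks: {belief:.3f}")
print(f"belief after a single click: {update(0.5, clicked=True):.3f}")
```

Under these assumed rates, each ignored ad nudges the estimate down slightly and a single click pushes it up sharply, so a run of non-clicks still moves the profile even though the user never interacted.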
