Axial Podcast

Proteomics and AI with Peter Cimermančič

57 min episode · 2 min read

Topics

Artificial Intelligence

AI-Generated Summary

Key Takeaways

  • The 80% Dark Matter Problem: Current mass spectrometry search algorithms leave up to 80% of measured spectra unidentified, so most proteomics-based drug discovery operates on only 20-30% of the available data. Researchers should treat preprocessing quality, not downstream analytics, as the primary bottleneck on biological discovery and target identification accuracy.
  • Cloud Migration as Dual Solution: Moving proteomics preprocessing from single workstations to cloud infrastructure solves two problems simultaneously: parallelizing compute across hundreds of nodes accelerates processing speed, while the added compute capacity enables replacement of legacy rule-based algorithms with transformer and vision-based AI models that capture previously ignored signal.
  • Absence of Signal as Data: Current scoring algorithms match spectra using barcode-style peak matching, which ignores peak intensities, non-canonical fragmentation patterns, and, crucially, the absence of expected peaks. AI models trained without these human-imposed rules also learn from missing signal, reportedly producing 70% more peptide identifications on standard human proteomes and 200-300% gains on complex metaproteomics datasets.
  • Expanding the Search Space: Standard proteomics searches only consider canonical proteins longer than 50 amino acids in unmodified form. Deliberately expanding searches to include small open reading frames, post-translational modifications, and sequence variants — enabled by a sufficiently accurate scoring model — reveals biologically relevant proteins that canonical pipelines structurally cannot detect.
  • Proteomics Covers the Full Drug Discovery Pipeline: Mass spectrometry proteomics applies across every drug discovery stage: affinity purification identifies protein-protein interactions for target discovery; chemoproteomics maps covalent small-molecule binding sites; immunopeptidomics identifies peptides for cancer vaccines; and plasma proteomics predicts patient treatment response and disease outcomes, consistently outperforming other omics modalities in multiomics studies.
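
To make the "Absence of Signal as Data" takeaway concrete, here is a minimal Python sketch contrasting barcode-style peak counting with a toy score that also uses peak intensities and penalizes missing expected peaks. It is an illustration only, not the model discussed in the episode (which is described as transformer and vision-based). The function names, the 0.02 Da tolerance, the penalty weight, and all m/z and intensity values are hypothetical.

```python
from bisect import bisect_left


def match_peaks(expected_mz, observed, tol=0.02):
    """For each expected fragment m/z, return the intensity of the closest
    observed peak within `tol` Da, or None if no such peak exists."""
    observed = sorted(observed)            # sort (mz, intensity) pairs by m/z
    mzs = [mz for mz, _ in observed]
    matches = []
    for target in expected_mz:
        i = bisect_left(mzs, target)
        best = None
        for j in (i - 1, i):               # only the two neighbours can be closest
            if 0 <= j < len(mzs) and abs(mzs[j] - target) <= tol:
                if best is None or abs(mzs[j] - target) < abs(mzs[best] - target):
                    best = j
        matches.append(observed[best][1] if best is not None else None)
    return matches


def barcode_score(expected_mz, observed, tol=0.02):
    """Barcode-style score: count matched peaks, ignoring intensity and misses."""
    return sum(m is not None for m in match_peaks(expected_mz, observed, tol))


def intensity_absence_score(expected_mz, observed, tol=0.02, miss_penalty=0.5):
    """Toy score (an assumption, not a published method): fraction of total
    intensity explained, minus a penalty for expected peaks that are absent."""
    matches = match_peaks(expected_mz, observed, tol)
    total = sum(intensity for _, intensity in observed) or 1.0
    explained = sum(m for m in matches if m is not None) / total
    missing = sum(m is None for m in matches) / len(matches)
    return explained - miss_penalty * missing


# Hypothetical spectrum: (m/z, intensity) pairs.
spectrum = [(100.0, 5.0), (200.01, 10.0), (300.0, 80.0), (350.0, 5.0)]

# Two hypothetical candidate peptides, given as expected fragment m/z values.
candidate_a = [100.0, 200.0, 300.0]
candidate_b = [100.0, 200.0, 350.0, 400.0, 500.0, 600.0]

for name, cand in (("A", candidate_a), ("B", candidate_b)):
    print(name, barcode_score(cand, spectrum), intensity_absence_score(cand, spectrum))
```

In this made-up example both candidates match three peaks, so the barcode-style count cannot separate them. The toy score ranks candidate A higher because it explains the dominant peak and has no missing expected fragments, while candidate B leaves half of its expected fragments unobserved.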

What It Covers

Peter Cimermančič, cofounder of Tesserai and previously a researcher at Verily for seven years, explains how AI-powered preprocessing of mass spectrometry proteomics data can recover spectra from the up to 80% that currently go unidentified, unlocking drug targets and biological insights that conventional search algorithms systematically miss.

Notable Moment

Cimermančič describes how HIV-human protein interaction studies using mass spectrometry were conducted while seeing only 20% of actual interactions — raising the pointed question of how many viable drug targets for infectious disease have been permanently overlooked due to preprocessing limitations rather than biological absence.
