How long is this episode of Axial Podcast?

This episode is 57 minutes long. SignalCast provides an AI-generated summary so you can get the key insights in about 3 minutes.

Axial Podcast

Proteomics and AI with Peter Cimermančič

March 22, 2025

57 min episode · 2 min read

Peter Cimermančič

Topics

Startups, Fundraising & VC, Artificial Intelligence

AI-Generated Summary

Published Mar 14, 2026

Key Takeaways

✓ The 80% Dark Matter Problem: Current mass spectrometry search algorithms leave up to 80% of measured spectra unidentified, meaning most proteomics-based drug discovery operates on only 20-30% of available data. Researchers should treat preprocessing quality, not downstream analytics, as the primary bottleneck limiting biological discovery and target identification accuracy.
✓ Cloud Migration as Dual Solution: Moving proteomics preprocessing from single workstations to cloud infrastructure solves two problems simultaneously: parallelizing compute across hundreds of nodes accelerates processing speed, while the added compute capacity enables replacement of legacy rule-based algorithms with transformer and vision-based AI models that capture previously ignored signal.
✓ Absence of Signal as Data: Current scoring algorithms match spectra using barcode-style peak matching, ignoring peak intensities, non-canonical fragmentation patterns, and, crucially, the absence of expected peaks. AI models trained without these human-imposed rules also learn from missing signal, producing 70% more peptide identifications on standard human proteomes and 200-300% gains on complex metaproteomics datasets.
✓ Expanding the Search Space: Standard proteomics searches consider only canonical proteins longer than 50 amino acids in unmodified form. Deliberately expanding searches to include small open reading frames, post-translational modifications, and sequence variants (enabled by a sufficiently accurate scoring model) reveals biologically relevant proteins that canonical pipelines structurally cannot detect.
✓ Proteomics Covers the Full Drug Discovery Pipeline: Mass spectrometry proteomics applies across every drug discovery stage: affinity purification identifies protein-protein interactions for target discovery; chemoproteomics maps covalent small-molecule binding sites; immunopeptidomics identifies peptides for cancer vaccines; and plasma proteomics predicts patient treatment response and disease outcomes, consistently outperforming other omics modalities in multiomics studies.
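
To make the "barcode-style peak matching" takeaway concrete, here is a minimal Python sketch of a shared-peak count. It is an illustration only, not the scoring method of any real search engine and not the approach described in the episode. The function name `match_summary`, the 0.02 Da tolerance, and all m/z and intensity values are made up for this example.

```python
def match_summary(peaks, theoretical_mz, tol=0.02):
    """Compare an observed spectrum with a candidate's expected fragments.

    peaks: list of (m/z, intensity) tuples from the observed spectrum.
    theoretical_mz: expected fragment m/z values for one candidate peptide.
    tol: matching tolerance in Da (an assumed value for this toy example).
    """
    matched = 0
    matched_intensity = 0.0
    for frag in theoretical_mz:
        # Intensities of observed peaks that fall inside the tolerance window.
        hits = [inten for mz, inten in peaks if abs(mz - frag) <= tol]
        if hits:
            matched += 1
            matched_intensity += max(hits)
    return {
        "matched": matched,                        # all a barcode-style score uses
        "missing": len(theoretical_mz) - matched,  # expected peaks that never appeared
        "matched_intensity": matched_intensity,    # signal strength behind the matches
    }


# Toy spectrum: three strong peaks and two weak, noise-like peaks.
observed = [(100.0, 5.0), (200.0, 900.0), (300.0, 850.0), (400.0, 3.0), (500.0, 700.0)]

# Hypothetical candidates: A expects 4 fragments, B expects 10.
candidate_a = [200.0, 300.0, 500.0, 600.0]
candidate_b = [100.0, 400.0, 500.0, 150.0, 250.0, 350.0, 450.0, 550.0, 650.0, 750.0]

print(match_summary(observed, candidate_a))  # matched=3, missing=1, matched_intensity=2450.0
print(match_summary(observed, candidate_b))  # matched=3, missing=7, matched_intensity=708.0
```

Both candidates tie at three matched peaks, so a pure peak count cannot separate them. Candidate B, however, matches mostly weak peaks and is missing seven of its ten expected fragments. Intensity and absent peaks are the kinds of signal the summary says rule-based scoring ignores.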

What It Covers

Peter Cimermančič, cofounder of Tesserai and a researcher who previously spent seven years at Verily, explains how AI-powered preprocessing of mass spectrometry proteomics data can recover up to 80% of currently unidentified spectra, unlocking drug targets and biological insights that conventional search algorithms systematically miss.

Notable Moment

Cimermančič describes how HIV-human protein interaction studies using mass spectrometry were conducted while seeing only 20% of actual interactions — raising the pointed question of how many viable drug targets for infectious disease have been permanently overlooked due to preprocessing limitations rather than biological absence.
