Making deep learning perform real algorithms with Category Theory (Andrew Dudzik, Petar Veličković, Taco Cohen, Bruno Gavranović, Paul Lessard)
Episode · 43 min
Read time · 2 min
Topics: Artificial Intelligence, Science & Discovery
AI-Generated Summary
Key Takeaways
- ✓ Algorithmic Failure in LLMs: Large language models perform hundreds of billions of multiplications to generate a single token, yet cannot reliably multiply two small numbers, revealing a misalignment between their training methods and downstream reasoning tasks that require correctness guarantees.
- ✓ Beyond Geometric Deep Learning: Group theory handles spatial symmetries but breaks down for non-invertible computations such as Dijkstra's algorithm, where many different graphs compress to identical outputs (see the first sketch after this list); category theory's broader framework is needed to model such information-destroying transformations in algorithmic reasoning.
- ✓ Two-Category Weight Tying: Two-morphisms in categorical frameworks formalize when weight sharing across network layers is mathematically valid, enabling provable correctness for parameter sharing beyond simple copying and turning architecture design into a systematic discipline rather than an ad-hoc engineering choice (see the second sketch below).
- ✓ Carry Mechanism Challenge: Implementing arithmetic carries in continuous, gradient-based systems requires modeling state changes rather than states themselves; this fundamental operation from CPU design is one that current graph neural networks struggle to represent, and it may be addressable through geometric structures such as Hopf fibrations.
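To make the non-invertibility point concrete, here is a minimal sketch (our illustration, not code from the episode): two different weighted graphs on which Dijkstra's algorithm, run from node 0, produces identical distance outputs, so the input graph cannot be recovered from the result. The `dijkstra` helper and both toy graphs are invented for this example.

```python
import heapq

def dijkstra(adj, source=0):
    """adj: {node: [(neighbor, weight), ...]}. Returns shortest distances from source."""
    dist = {v: float("inf") for v in adj}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, already relaxed via a shorter path
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# Graph A: a path 0 -> 1 -> 2 with unit weights.
graph_a = {0: [(1, 1)], 1: [(2, 1)], 2: []}
# Graph B: the same path plus a redundant direct edge 0 -> 2 of weight 5,
# which shortest paths never use.
graph_b = {0: [(1, 1), (2, 5)], 1: [(2, 1)], 2: []}

# Different inputs, identical outputs: the map is not invertible.
assert dijkstra(graph_a) == dijkstra(graph_b) == {0: 0, 1: 1, 2: 2}
```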
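A second toy sketch, again our own construction rather than the guests' formalism, shows the weight sharing the third takeaway refers to: one parameter matrix reused at every unrolled step, versus independent per-layer weights. The categorical question raised in the episode is when such sharing is provably valid, not merely convenient; the names `W` and `step` here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))    # one shared (tied) parameter matrix
x = rng.normal(size=4)

def step(h):
    return np.tanh(W @ h)      # the identical computation applied at every layer

h = x
for _ in range(3):             # unroll three tied layers: one matrix to learn
    h = step(h)

# An untied network would instead use W1, W2, W3: three independent matrices,
# tripling the parameters with no structural guarantee that they stay consistent.
```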
What It Covers
Category theory provides a mathematical framework for designing neural networks that can reliably execute algorithms like addition and multiplication, addressing fundamental limitations in current large language models and deep learning architectures.
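As a concrete reference point for the carry-mechanism takeaway above, here is a minimal sketch (our illustration, assuming nothing beyond standard binary arithmetic) of ripple-carry addition: the carry bit is a discrete state transition propagated digit by digit, exactly the kind of hard, discontinuous update that is awkward to express in a purely continuous, gradient-trained model.

```python
def ripple_carry_add(a_bits, b_bits):
    """Add two little-endian bit lists of equal length."""
    out, carry = [], 0
    for a, b in zip(a_bits, b_bits):
        total = a + b + carry
        out.append(total % 2)   # sum bit for this position
        carry = total // 2      # next carry: a discrete state transition
    out.append(carry)           # final carry-out
    return out

# 6 + 7 = 13: binary 110 and 111, little-endian [0,1,1] and [1,1,1].
assert ripple_carry_add([0, 1, 1], [1, 1, 1]) == [1, 0, 1, 1]  # 1101 = 13
```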
Notable Moment
The discussion argues that asking a single neural network both to translate messy real-world scenarios into structured representations and to execute algorithms robustly on a fixed computational budget is an impossible burden, suggesting that future systems will need an explicit separation between world-understanding and algorithmic-reasoning components.
More from Machine Learning Street Talk
When AI Discovers The Next Transformer - Robert Lange (Sakana) · Mar 13 · 78 min
"Vibe Coding is a Slot Machine" - Jeremy Howard · Mar 3 · 86 min
Evolution "Doesn't Need" Mutation - Blaise Agüera y Arcas
VAEs Are Energy-Based Models? [Dr. Jeff Beck]
Abstraction & Idealization: AI's Plato Problem [Mazviita Chirimuuta]
Similar Episodes
Related episodes from other podcasts
The Mel Robbins Podcast · Apr 27 · Do THIS Every Day to Rewire Your Brain From Stress and Anxiety
The Model Health Show · Apr 27 · The Menopause Gut: Why Metabolism Changes & How to Reclaim Your Body - With Cynthia Thurlow
The Rest is History · Apr 26 · 664. Britain in the 70s: Scandal in Downing Street (Part 3)
The Learning Leader Show · Apr 26 · 685: David Epstein - The Freedom Trap, Narrative Values, General Magic, The Nobel Prize Winner Who Simplified Everything, Wearing the Same Thing Everyday, and Why Constraints Are the Secret to Your Best Work
The AI Breakdown · Apr 26 · Where the Economy Thrives After AI