Eye on AI

#299 Jacob Buckman: Why the Future of AI Won't Be Built on Transformers

57 min episode · 2 min read

Topics: Artificial Intelligence

AI-Generated Summary

Key Takeaways

  • Power Retention Architecture: Combines recurrent neural networks with attention mechanisms via a state-space-model formulation, decoupling state size from parameter count so each can be scaled independently. Compute then grows linearly rather than quadratically as the context window expands (see the first sketch after this list).
  • Metamorphosis Retraining Process: An existing transformer such as Llama can be converted to Power Retention in six hours using dozens of GPUs by swapping its attention calls for power retention calls (see the second sketch below), preserving the original model's performance while gaining linear-cost inference and effectively unlimited context.
  • Context vs. Weight Updates: Future AI systems should take in new knowledge by updating a context state rather than by fine-tuning weights. This sidesteps catastrophic forgetting: context-based learning mirrors how humans accumulate experience, whereas weight updates via gradient descent are closer to evolutionary change.
  • Butler vs. Consultant Dynamic: Because state growth is expensive, current transformers force chat resets, producing consultant-like interactions that start from scratch each time. Power Retention's persistent state across all of a user's interactions enables butler-like AI that accumulates their full history and preferences and responds accordingly.
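
For readers who want the scaling argument made concrete, here is a minimal sketch of the general idea of a fixed-size recurrent state that is updated once per token and whose size is chosen independently of the parameter count. This is a generic linear-attention-style recurrence in NumPy, not the actual Power Retention formulation from the episode; the dimensions and names are illustrative.

```python
# A fixed-size state folded forward one token at a time.
# Per-token cost is constant, so total cost is linear in context length;
# full attention re-reads every previous token, which grows quadratically.
import numpy as np

d_k, d_v = 64, 64  # the state is always d_k x d_v, however long the context gets

def run_sequence(queries, keys, values):
    state = np.zeros((d_k, d_v))        # fixed-size memory, independent of sequence length
    outputs = []
    for q, k, v in zip(queries, keys, values):
        state += np.outer(k, v)         # absorb the new token: O(d_k * d_v)
        outputs.append(q @ state)       # read out with the query: O(d_k * d_v)
    return np.stack(outputs)

rng = np.random.default_rng(0)
T = 1000                                # doubling T doubles the work; full attention would 4x it
out = run_sequence(rng.normal(size=(T, d_k)),
                   rng.normal(size=(T, d_k)),
                   rng.normal(size=(T, d_v)))
print(out.shape)                        # (1000, 64)
```

Because the state never grows, per-token inference cost stays flat at any context length, which is the property the episode attributes to Power Retention.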
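The metamorphosis step can be pictured as a module swap followed by a short retraining run. The PyTorch sketch below uses a toy transformer and a placeholder RetentionLayer, both hypothetical stand-ins rather than the real Power Retention kernels or the actual Llama conversion code, just to show the shape of the operation.

```python
# Toy "metamorphosis": replace every attention module in a small transformer
# with a retention-style drop-in that keeps the same (batch, seq, dim) interface.
# RetentionLayer is a placeholder, not the real Power Retention kernel.
import torch
import torch.nn as nn

class RetentionLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
    def forward(self, x):
        # Stand-in mixing rule (a causal running mean); a real implementation
        # would maintain a fixed-size recurrent state instead.
        denom = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
        return self.proj(torch.cumsum(x, dim=1) / denom)

class Block(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
    def forward(self, x):
        if isinstance(self.attn, nn.MultiheadAttention):
            mixed, _ = self.attn(x, x, x)
        else:
            mixed = self.attn(x)
        return x + self.mlp(x + mixed)

model = nn.Sequential(*[Block(dim=128) for _ in range(4)])

# The swap: every attention call becomes a retention call. In the episode's
# account, a brief retraining run (hours, dozens of GPUs) then restores the
# original model's behavior under the new architecture.
for block in model:
    block.attn = RetentionLayer(dim=128)

print(model(torch.randn(2, 16, 128)).shape)  # torch.Size([2, 16, 128])
```

The real conversion starts from a pretrained checkpoint rather than a random toy model, but the structure is the same: swap the attention modules, then briefly retrain.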

What It Covers

Jacob Buckman explains Power Retention, a new AI architecture that solves transformer scaling limitations through linear-cost context windows, enabling models to process unlimited context without quadratic compute costs or performance degradation.

Notable Moment

Buckman reveals that many models advertised with long context windows rely on sparse or windowed attention rather than full attention, so each layer only ever processes a small subset of the context. This industry-wide practice causes the performance degradation that users mistake for an inherent limitation of long context rather than an architectural compromise (see the sketch below).
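
To illustrate what "windowed attention" means in practice, here is a small sketch of a sliding-window attention mask; the window size and sequence length are illustrative and not taken from any particular model.

```python
# Sliding-window attention mask: each query position may only attend to itself
# and the previous (window - 1) tokens, however large the advertised context is.
import numpy as np

def sliding_window_mask(seq_len, window):
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    return (j <= i) & (j > i - window)

print(sliding_window_mask(seq_len=8, window=3).astype(int))
# Each row can see at most 3 positions, so most of a long context never reaches
# a given attention layer; sparse attention patterns make a similar trade.
```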

