Jacob Buckman — Podcast Appearances & Summaries

We have 2 summarized appearances for Jacob Buckman so far. Browse all podcasts to discover more episodes.

Featured On 2 Podcasts

The TWIML AI Podcast

1 episode

Eye on AI

1 episode

All Appearances

2 episodes

Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750

The TWIML AI Podcast

Oct 7, 202557 min

AI Summary

→ WHAT IT COVERS Jacob Buckman explains power retention architecture for transformers, combining recurrence and attention to achieve linear scaling for long context processing while maintaining computational efficiency through balanced weight-state FLOP ratios and chunked algorithms. → KEY INSIGHTS - **State Size Balance:** Transformers have states 100,000x larger than LSTMs at long context, while RNNs have states too small. Optimal architectures balance weight FLOPS and state FLOPS within one order of magnitude for compute-efficient training and inference. - **Chunked Algorithm:** Power retention uses dual computation forms—recurrent for sequential processing and attention for parallel processing. Breaking sequences into GPU-optimized chunks provides linear cost scaling while maintaining full hardware saturation, achieving best of both approaches without mathematical tradeoffs. - **Model Metamorphosis:** Converting existing transformer models to power retention requires only two hours of retraining on 128 H100s. StarCoder 3B recovered full 30% HumanEval performance after this brief metamorphosis period, making adoption practical without pretraining from scratch. - **Vidrial CUDA Framework:** Custom CUDA framework enables 20% speedups over Flash Attention on non-standard problem shapes by separating static and dynamic computation. JIT compilation sweeps different configurations to find optimal tile sizes and memory patterns for specific hardware and sequence lengths. → NOTABLE MOMENT Buckman reveals that typical window attention models plateau in their ability to use context far earlier than their advertised effective context length, which is calculated as depth times window size, demonstrating they fail to leverage most available tokens. 💼 SPONSORS [{"name": "Capital One", "url": ""}] 🏷️ Transformer Architecture, Long Context Models, CUDA Optimization, State Space Models

Read Full Summary Listen

#299 Jacob Buckman: Why the Future of AI Won't Be Built on Transformers

Eye on AI

Nov 9, 202557 minFounder of Manifest AI, AI Researcher

AI Summary

→ WHAT IT COVERS Jacob Buckman explains Power Retention, a new AI architecture that solves transformer scaling limitations through linear-cost context windows, enabling models to process unlimited context without quadratic compute costs or performance degradation. → KEY INSIGHTS - **Power Retention Architecture:** Combines recurrent neural networks with attention mechanisms through state space models, allowing independent adjustment of state size from parameter count. This enables linear scaling costs instead of quadratic growth as context windows expand. - **Metamorphosis Retraining Process:** Existing transformer models like LLAMA can convert to Power Retention in six hours using dozens of GPUs by swapping attention calls for power retention, preserving original performance while gaining linear-cost inference and unlimited context capabilities. - **Context vs Weight Updates:** Future AI systems should inject new knowledge through context state updates rather than weight fine-tuning. This eliminates catastrophic forgetting issues since context-based learning mirrors human experience accumulation rather than evolutionary weight changes through gradient descent. - **Butler vs Consultant Dynamic:** Current transformers force chat resets due to expensive state growth, creating consultant-like interactions. Power Retention enables persistent state across all user interactions, creating butler-like AI that accumulates complete user history and preferences for better responses. → NOTABLE MOMENT Buckman reveals that advertised long-context models use sparse or windowed attention rather than true transformers, processing only small context subsets. This industry-wide practice creates performance degradation that users mistake for inherent limitations rather than architectural compromises. 💼 SPONSORS [{"name": "Agency", "url": "https://agntcy.org"}] 🏷️ AI Architecture, State Space Models, Context Windows, Transformer Alternatives

Read Full Summary Listen

Explore More

AI & Machine Learning Episodes Best AI Podcasts (2026)

Never miss Jacob Buckman's insights

Subscribe to get AI-powered summaries of Jacob Buckman's podcast appearances delivered to your inbox weekly.

Start Free Today

No credit card required • Free tier available