#326 Zuzanna Stamirowska: Inside Pathway's AI Systems That Work with Live, Real-Time Data
Episode: 67 min
Read time: 2 min
Topics: Productivity, Health & Wellness, Relationships
AI-Generated Summary
Key Takeaways
- Memory at inference, not just training: BDH updates its fast weights continuously during inference, so the model retains context across a session instead of resetting each time. This targets a core LLM limitation, where every session starts from scratch, much like an employee who never accumulates experience beyond the first day on the job.
- State lives on edges, not nodes: Unlike transformers, where knowledge is encoded in node weights, BDH stores state on synaptic edges (fast weights), while nodes act purely as computational units. This duality, drawn from quantum physics principles, lets individual synapses represent specific concepts: the paper demonstrates a single synapse activating consistently for the concept of currency.
- Sparse local dynamics reduce compute: BDH uses a graph topology in which neurons connect only to relevant neighbors, not all-to-all as in transformers. Attention scales linearly with neuron count n rather than quadratically. At inference, only a small local subgraph activates per step, so a model with a massive state accesses only a fraction of it, potentially cutting reasoning compute by 10x per output token.
- Model merging via graph concatenation: Two separately trained BDH graphs can be joined along the shared neuron dimension n, then fine-tuned together to form cross-domain connections. An experiment in the paper merges two single-language models and produces coherent mixed-language output. This composability enables domain fusion, for example combining a finance-trained model and a law-trained model into one integrated reasoning system.
- Target use cases, small data and long-horizon reasoning: BDH's first commercial applications focus on high-value, data-scarce domains such as nuclear engineering documentation and healthcare claims resolution. The architecture's interpretability advantage (visible synapse activation patterns) supports regulated industries that require explainability. AWS customers gain access through a Pathway-NVIDIA-AWS partnership announced at AWS re:Invent in December 2024.
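The compute argument in the sparse-dynamics takeaway can be illustrated with a toy graph. The sketch below is not Pathway's implementation: `build_sparse_graph`, `step`, the random wiring, and the sizes are all assumptions chosen for illustration. It only shows that when activity propagates along the outgoing edges of the currently active neurons, the work per step follows the number of active edges rather than the n × n cost of an all-to-all layer.

```python
import random

def build_sparse_graph(n, degree, seed=0):
    """Hypothetical wiring: each neuron gets `degree` random out-neighbors."""
    rng = random.Random(seed)
    return {i: rng.sample(range(n), degree) for i in range(n)}

def step(graph, weights, active):
    """Propagate activity from active neurons along their outgoing edges only.

    Returns the new activations and the number of edges touched.
    """
    out = {}
    touched = 0
    for src, value in active.items():
        for dst in graph[src]:
            out[dst] = out.get(dst, 0.0) + weights[(src, dst)] * value
            touched += 1
    return out, touched

n, degree = 10_000, 8
graph = build_sparse_graph(n, degree)
weights = {(s, d): 1.0 for s, nbrs in graph.items() for d in nbrs}
active = {i: 1.0 for i in range(20)}  # assume only 20 neurons fire this step

new_active, touched = step(graph, weights, active)
dense_cost = n * n  # multiply-adds for one all-to-all step of the same width
print("edges touched:", touched, "dense equivalent:", dense_cost)
```

In this toy setting the step touches 20 × 8 edges regardless of how large n grows, which is the intuition behind "a model with a massive state accesses only a fraction of it". The 10x figure quoted above comes from the episode and is not reproduced by this sketch.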
What It Covers
Pathway cofounder Zuzanna Stamirowska presents the Dragon Hatchling (BDH) architecture, a post-transformer neural network modeled on brain-like graph dynamics. The system stores state on edges rather than nodes, enables persistent memory at inference time, and trains comparably to GPT-2 while targeting enterprise reasoning tasks requiring small data and long-horizon coherence.
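To make "state on edges" and "memory at inference time" concrete, here is a minimal sketch of a generic Hebbian-style fast-weight store. It is a stand-in, not the BDH update rule: the class name `EdgeMemory`, the `decay` and `lr` parameters, and the string neuron labels are all hypothetical. The point is only that the memory is a table keyed by (pre, post) edges and that it changes while the model is being used, with no training step involved.

```python
from collections import defaultdict

class EdgeMemory:
    """Toy fast-weight store: state lives on (pre, post) edges, not on nodes."""

    def __init__(self, decay=0.9, lr=1.0):
        self.decay = decay  # assumed forgetting factor applied on every update
        self.lr = lr        # assumed increment for co-active edges
        self.fast = defaultdict(float)  # (pre, post) -> fast weight

    def observe(self, pre_active, post_active):
        """Hebbian-style update: decay all edges, then strengthen co-firing ones."""
        for edge in list(self.fast):
            self.fast[edge] *= self.decay
        for i in pre_active:
            for j in post_active:
                self.fast[(i, j)] += self.lr

    def recall(self, pre_active):
        """Read out post-neuron scores from the current edge state."""
        scores = defaultdict(float)
        for (i, j), w in self.fast.items():
            if i in pre_active:
                scores[j] += w
        return dict(scores)

mem = EdgeMemory()
mem.observe({"dollar"}, {"currency"})
mem.observe({"euro"}, {"currency"})
mem.observe({"paris"}, {"city"})
print(mem.recall({"euro"}))
```

Because each edge carries its own weight, a single entry such as `("euro", "currency")` can be inspected directly, which loosely mirrors the interpretability claim about individual synapses.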
Key Questions Answered
- Scale-free network structure emerges organically: During training, BDH's connection degree distribution converges toward a scale-free topology without any top-down design. This structure, common in organic complex systems like the internet and social networks, provides resilience, efficient communicability, and self-similar properties across network sizes. The team argues these properties will support generalization and reasoning beyond what fixed transformer architectures can achieve.
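For readers unfamiliar with the term, the sketch below shows what a scale-free degree distribution looks like in practice. It does not train a BDH model. Instead it grows a graph by preferential attachment, a standard textbook generator of heavy-tailed degree distributions, used here purely as a stand-in; the function name and parameters are assumptions.

```python
import random
from collections import Counter

def preferential_attachment(n, m=2, seed=0):
    """Grow a graph where each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    degree = Counter()
    endpoints = []  # every node appears once per incident edge
    # Start from a small fully connected core of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            degree[i] += 1
            degree[j] += 1
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            # Sampling from `endpoints` is sampling proportional to degree.
            targets.add(rng.choice(endpoints))
        for t in targets:
            degree[new] += 1
            degree[t] += 1
            endpoints += [new, t]
    return degree

degree = preferential_attachment(5000)
values = sorted(degree.values())
median = values[len(values) // 2]
print("median degree:", median, "max degree:", values[-1])
```

The signature of such a network is a few hubs with very high degree alongside a large majority of sparsely connected nodes, so the maximum degree sits far above the median. Checking a trained model's connection degrees for this pattern would be one way to test the claim made in the episode.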
Notable Moment
Stamirowska describes an experiment where two BDH models, each trained on a different language, were merged by simply concatenating their graphs. Without additional joint training, the combined model began producing output that mixed vocabulary from both languages in a coherent way — demonstrating that architectural composability is a structural property, not an engineering workaround.
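The concatenation step itself can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure from the paper: each model is reduced to a hypothetical dictionary of edge weights keyed by (source, destination) neuron indices, and `merge_graphs` and `cross_edges` are invented helper names.

```python
def merge_graphs(edges_a, n_a, edges_b, n_b):
    """Concatenate two edge-weight graphs along the neuron dimension.

    Neurons of model B are renumbered n_a .. n_a + n_b - 1, so the merged
    graph is block-diagonal: no edge crosses between the two models yet.
    """
    merged = dict(edges_a)
    for (src, dst), w in edges_b.items():
        merged[(src + n_a, dst + n_a)] = w
    return merged, n_a + n_b

def cross_edges(edges, n_a):
    """List edges that connect a neuron of A to a neuron of B, or back."""
    return [(s, d) for (s, d) in edges if (s < n_a) != (d < n_a)]

model_a = {(0, 1): 0.5, (1, 2): -0.3}  # toy "language A" graph, 3 neurons
model_b = {(0, 2): 0.8, (2, 1): 0.1}   # toy "language B" graph, 3 neurons
merged, n = merge_graphs(model_a, 3, model_b, 3)
print(n, sorted(merged))
print(cross_edges(merged, 3))
```

Right after concatenation the merged graph holds both sets of edges unchanged and contains no cross-model edges. Any connections between the two halves would have to be formed afterwards, which is the role the takeaways assign to joint fine-tuning.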