#326 Zuzanna Stamirowska: Inside Pathway's AI Systems That Work with Live, Real-Time Data
Episode: 67 min · Read time: 2 min
Topics: Artificial Intelligence, Science & Discovery
AI-Generated Summary
Key Takeaways
- ✓Memory at inference, not just training: BDH updates its fast weights continuously during inference, meaning the model retains context across a session rather than resetting each time. This directly addresses the core LLM limitation where every session starts from scratch — analogous to an employee who never accumulates experience beyond their first day on the job.
- ✓State lives on edges, not nodes: Unlike transformers, where knowledge is encoded in node weights, BDH stores state on synaptic edges (fast weights) while nodes act purely as units of computation. This duality, which the team relates to principles from quantum physics, allows individual synapses to represent specific concepts — the paper demonstrates a single synapse activating consistently for the concept of currency.
- ✓Sparse local dynamics reduce compute: BDH uses a graph topology where neurons connect only to relevant neighbors, not all-to-all as in transformers. Attention scales linearly with neuron count n rather than quadratically. At inference, only a small local subgraph activates per step, meaning a model with a massive state accesses only a fraction of it — potentially cutting reasoning compute by 10x per output token.
- ✓Model merging via graph concatenation: Two separately trained BDH graphs can be joined along the shared neuron dimension n, then fine-tuned together to form cross-domain connections. A paper experiment merges two single-language models, producing coherent mixed-language output. This composability enables domain fusion — combining, for example, a finance-trained and a law-trained model into one integrated reasoning system.
- ✓Target use cases: small data and long-horizon reasoning: BDH's first commercial applications focus on high-value, data-scarce domains such as nuclear engineering documentation and healthcare claims resolution. The architecture's interpretability advantage — visible synapse activation patterns — supports regulated industries requiring explainability. AWS customers gain access through a Pathway-NVIDIA-AWS partnership announced at AWS re:Invent in December 2024.
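The first and third takeaways can be pictured together: state lives in a fast-weight matrix that is updated during inference, and only a small active subgraph is touched per step. The sketch below is a hypothetical illustration of that idea, assuming a simple Hebbian outer-product update and top-k sparsification; the names, update rule, and thresholding are this sketch's own inventions, not Pathway's BDH implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512  # number of neurons

# Slow weights: learned during training, frozen at inference.
W_slow = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))

# Fast weights: per-session state stored on edges (synapses),
# starting empty and updated as the session unfolds.
W_fast = np.zeros((n, n))

def step(x, eta=0.1, k=32):
    """One inference step: compute sparse activations, then
    strengthen the edges between co-active neurons (Hebbian)."""
    h = (W_slow + W_fast) @ x
    # Keep only the top-k activations: a small local subgraph
    # fires per step; the rest of the state is left untouched.
    idx = np.argsort(h)[-k:]
    a = np.zeros(n)
    a[idx] = np.maximum(h[idx], 0.0)
    # Outer-product update touches only the k*k active edges.
    W_fast[np.ix_(idx, idx)] += eta * np.outer(a[idx], a[idx])
    return a

x = rng.normal(size=n)
a1 = step(x)
assert np.count_nonzero(a1) <= 32          # sparse activation
assert np.count_nonzero(W_fast) <= 32 * 32  # only local edges updated
a2 = step(x)  # same input, different output: session state persists
assert not np.allclose(a1, a2)
```

Because the update only writes to the active k×k block of edges, per-step work stays proportional to the small active subgraph rather than to the full n×n state, which is the intuition behind the claimed linear (rather than quadratic) scaling.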
What It Covers
Pathway cofounder Zuzanna Stamirowska presents the Dragon Hatchling (BDH) architecture, a post-transformer neural network modeled on brain-like graph dynamics. The system stores state on edges rather than nodes, enables persistent memory at inference time, and trains comparably to GPT-2 while targeting enterprise reasoning tasks requiring small data and long-horizon coherence.
Key Questions Answered
- •How does BDH retain context across a session at inference time instead of resetting like a standard LLM?
- •Why does BDH store state on synaptic edges rather than in node weights, and what does that enable?
- •How do sparse, local graph dynamics let attention scale linearly with neuron count and reduce inference compute?
- •Can two separately trained BDH graphs be merged into a single coherent model?
- •Which small-data, long-horizon, and regulated enterprise use cases is Pathway targeting first?
- •Why does BDH's connection degree distribution converge, without top-down design, toward the scale-free topology of organic complex systems like the internet and social networks, and how do its resilience, efficient communicability, and self-similarity support generalization and reasoning beyond fixed transformer architectures?
Notable Moment
Stamirowska describes an experiment where two BDH models, each trained on a different language, were merged by simply concatenating their graphs. Without additional joint training, the combined model began producing output that mixed vocabulary from both languages in a coherent way — demonstrating that architectural composability is a structural property, not an engineering workaround.
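The merging experiment can be pictured as placing two trained weight graphs into one larger block matrix, where the cross-domain edges start empty and would later be formed by joint fine-tuning. The sketch below is a hypothetical illustration of that concatenation, not the paper's actual procedure; all names and sizes are invented for the example.

```python
import numpy as np

def merge_graphs(W_a, W_b):
    """Concatenate two trained weight graphs into one larger graph.
    Off-diagonal blocks start at zero; in BDH-style merging they
    would be populated by subsequent joint fine-tuning, forming
    the cross-domain connections. (Illustrative sketch only.)"""
    n_a, n_b = W_a.shape[0], W_b.shape[0]
    W = np.zeros((n_a + n_b, n_a + n_b))
    W[:n_a, :n_a] = W_a   # model A keeps its own edges
    W[n_a:, n_a:] = W_b   # model B keeps its own edges
    return W

rng = np.random.default_rng(1)
W_fin = rng.normal(size=(4, 4))  # e.g. a finance-trained graph
W_law = rng.normal(size=(6, 6))  # e.g. a law-trained graph
W = merge_graphs(W_fin, W_law)
assert W.shape == (10, 10)
assert np.allclose(W[:4, :4], W_fin)
assert np.allclose(W[:4, 4:], 0)  # cross-domain edges start empty
```

Because each original graph survives intact inside the merged matrix, both models' learned behavior is preserved at merge time, which is consistent with the episode's observation that the combined model produced coherent output even before additional joint training.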