How long is this episode of Machine Learning Street Talk?

This episode is 46 minutes long. SignalCast provides an AI-generated summary so you can get the key insights in about 3 minutes.

Machine Learning Street Talk

VAEs Are Energy-Based Models? [Dr. Jeff Beck]

January 25, 2026

46 min episode · 2 min read

Jeff Beck

Topics

Design & UX, Artificial Intelligence, Software Development

AI-Generated Summary

Published Jan 25, 2026

Key Takeaways

✓Energy-Based Models vs Neural Networks: Energy-based models differ from traditional feedforward networks by applying cost functions to internal states, not just inputs and outputs. This requires two minimizations: one over the hidden nodes to find their energy minimum, and one over the model parameters to reduce prediction error. Variational autoencoders exemplify this approach, with encoders, decoders, and cost functions that operate on internal representations such as Gaussian distributions.
✓Agency Identification Problem: Determining whether a system exhibits true agency versus sophisticated policy execution requires examining internal computations, not just observing behavior. An agent performing Monte Carlo tree search and planning can appear identical to a complex function transformation from outside. The practical approach involves measuring internal state sophistication using metrics like transfer entropy to assign degrees of agency.
✓Test-Time Training Limitations: Current test-time training methods train networks in supervised mode, then switch on additional weight adjustments during deployment. Beck considers this unwise because the original network never learned with those extra adjustable variables active during training. Traditional energy-based models optimize their latent variables throughout the entire training process, not just at deployment, which he argues makes learning more robust.
✓Self-Supervised Learning Trade-offs: Joint embedding predictive architectures (JEPA) compress inputs and outputs into latent spaces and learn to predict within that space, avoiding pixel-level prediction. The challenge is preventing representation collapse, where both embeddings shrink to a trivial constant such as zero. Non-contrastive methods like BYOL and Barlow Twins use various regularization techniques to keep representations rich while avoiding the expensive negative sampling required by traditional contrastive approaches.
✓Continual Learning Requirements: True artificial intelligence requires systems that instantiate new objects or models when encountering unexpected situations, not just learning from fixed training sets. This involves Bayesian nonparametric approaches with Dirichlet process priors that trigger learning when surprises occur. Object-centered physics discovery enables systems to create brand new object representations autonomously to explain novel situations, combining existing modules in new ways.
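
The "two minimizations" in the first takeaway can be illustrated with a toy example. This is a minimal sketch, not the model discussed in the episode: the quadratic energy, the names `energy`, `infer_latent`, and `train_step`, and all step sizes are assumptions chosen for illustration. The inner loop minimizes the energy over the latent `z` with the weights held fixed; the outer step then adjusts the weights `W` at the inferred latents.

```python
import numpy as np

def energy(x, z, W):
    """Toy energy (assumed form): reconstruction error plus a quadratic cost on the latent."""
    residual = x - W @ z
    return 0.5 * residual @ residual + 0.5 * z @ z

def infer_latent(x, W, steps=200, lr=0.1):
    """Inner minimization: gradient descent on z with W held fixed."""
    z = np.zeros(W.shape[1])
    for _ in range(steps):
        grad_z = -W.T @ (x - W @ z) + z  # dE/dz
        z -= lr * grad_z
    return z

def mean_min_energy(X, W):
    """Average energy over the data, each example evaluated at its inferred latent."""
    return float(np.mean([energy(x, infer_latent(x, W), W) for x in X]))

def train_step(X, W, lr=0.01):
    """Outer minimization: one gradient step on W at the inferred latents."""
    grad_W = np.zeros_like(W)
    for x in X:
        z = infer_latent(x, W)
        grad_W += -np.outer(x - W @ z, z)  # dE/dW evaluated at the inferred z
    return W - lr * grad_W / len(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))          # synthetic data, for illustration only
W0 = 0.5 * rng.normal(size=(4, 2))    # initial weights

W = W0.copy()
for _ in range(20):
    W = train_step(X, W)

print("mean energy before:", mean_min_energy(X, W0))
print("mean energy after: ", mean_min_energy(X, W))
```

In this sketch the latent is re-optimized for every example at every training step, which is the pattern the takeaways contrast with test-time training. A variational autoencoder can be read as replacing the inner loop with an encoder that predicts the latent distribution directly.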

What It Covers

Dr. Jeff Beck explores energy-based models, variational autoencoders, and the nature of agency in AI systems. The conversation covers geometric deep learning, Bayesian inference, self-supervised learning architectures like JEPA, continual learning challenges, and the future of autonomous AI systems capable of scientific discovery and experimental design.


Notable Moment

Beck argues that agency requires physical embodiment: a high-fidelity computer simulation of himself would only become an agent if placed in his physical body. He maintains that agents must be physical entities, not just computational models, even when the simulated version performs identical calculations and its behavior is indistinguishable to an outside observer.
