How long is this episode of Cognitive Revolution?

This episode is 107 minutes long. SignalCast provides an AI-generated summary so you can get the key insights in about 3 minutes.

Cognitive Revolution

Intelligence on the Edge: Liquid AI's Ramin Hasani on the Search for Device-Native Foundation Models

July 4, 2026

107 min episode · 3 min read

Ramin Hasani

Topics

Startups, Fundraising & VC, Leadership

AI-Generated Summary

Published Jul 4, 2026

Key Takeaways

✓ Architecture scaling gradient: The optimal neural network architecture shifts predictably with model size. Below roughly 100 billion parameters, adding structural biases (gating mechanisms, recurrence, convolutions) improves performance on specialized tasks. Above that threshold, unstructured operators like pure matrix multiplication outperform biased alternatives. Practitioners building small, domain-specific models should actively explore gated recurrent and convolutional hybrids rather than defaulting to transformer-only architectures, which dominate only at the largest scales.
✓ Automated Foundation Model Design (AFMD): Liquid AI's internal architecture search system evaluates 50–100 candidate operators using evolutionary strategies with the actual target hardware in the loop. Critically, it optimizes against real downstream task performance across more than 100 benchmarks, not proxy metrics like perplexity. Teams building specialized models should apply the same principle: benchmark on the exact hardware and the exact task the model will serve, not on general leaderboard proxies, which frequently mislead architectural decisions.
✓ LFM-2 architecture outcome: Running AFMD on CPU-class hardware consistently surfaces a double-gated 1D convolution as the dominant operator, making up 70–80% of the resulting network's layers, with a reduced number of attention layers filling the remainder. This hybrid achieves competitive quality while substantially reducing memory footprint and latency relative to pure transformer equivalents. The key retained element is input-dependent gating, not the full complexity of state space models, which suggests that gating alone captures most of the representational benefit.
✓ Edge AI market sizing: The global smartphone market generates approximately 500 billion dollars annually, and the laptop market adds another 300–400 billion, for a combined 800–900 billion, or roughly one trillion dollars, in annual device compute shipments. This hardware base currently runs minimal on-device AI inference. A 600-megabyte audio-visual model now powers Mercedes-Benz in-car voice interaction, showing that production-grade multimodal AI already fits within automotive-grade processor constraints.
✓ Input-dependent dynamics as core principle: Liquid AI traces its architectural philosophy to a 2022 paper introducing input-dependent state space models (Liquid S4), predating Mamba by roughly 18 months. Input dependence means the network's transformation parameters shift with the current input during the forward pass, while the backward pass learns the dynamics of that adaptation. This second axis, learned dynamics separate from parameter count, allows smaller models to compress more knowledge than parameter count alone would predict.
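
To make the "double-gated 1D convolution" and "input-dependent gating" ideas above more concrete, here is a minimal NumPy sketch. The summary does not spell out the operator's internals, so this assumes one plausible reading: a short causal depthwise convolution with one multiplicative gate before it and one after it, where both gates are computed from the input itself. The function names, weight names, and shapes are illustrative assumptions, not Liquid AI's implementation.

```python
import numpy as np


def causal_depthwise_conv1d(x, kernel):
    """Causal depthwise 1D convolution.

    x:      (seq_len, channels) input sequence
    kernel: (kernel_size, channels), one short filter per channel
    The output at step t only depends on x[t - kernel_size + 1 : t + 1].
    """
    k = kernel.shape[0]
    # Left-pad with zeros so no output position can see future inputs.
    padded = np.concatenate([np.zeros((k - 1, x.shape[1])), x], axis=0)
    out = np.zeros_like(x)
    for i in range(k):
        # kernel[k - 1] weights the current step, kernel[0] the oldest step.
        out += padded[i : i + x.shape[0]] * kernel[i]
    return out


def double_gated_conv_block(x, w_in, kernel, w_out):
    """Hypothetical double-gated short-convolution block.

    x:     (seq_len, d_model)
    w_in:  (d_model, 3 * d_model) joint projection for two gates and a value
    w_out: (d_model, d_model) output projection
    """
    # Both gates come from the input, so the effective filter changes per token.
    gate_in, gate_out, value = np.split(x @ w_in, 3, axis=-1)
    value = gate_in * value                         # first gate, before the conv
    value = causal_depthwise_conv1d(value, kernel)  # short-range mixing over time
    value = gate_out * value                        # second gate, after the conv
    return value @ w_out


rng = np.random.default_rng(0)
seq_len, d_model, kernel_size = 8, 4, 3
x = rng.normal(size=(seq_len, d_model))
w_in = rng.normal(size=(d_model, 3 * d_model))
kernel = rng.normal(size=(kernel_size, d_model))
w_out = rng.normal(size=(d_model, d_model))
y = double_gated_conv_block(x, w_in, kernel, w_out)
print(y.shape)  # (8, 4)
```

Unlike attention, this block keeps only `kernel_size - 1` past steps per channel as state, which is one way to read the summary's claim about reduced memory footprint and latency on CPU-class hardware.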

What It Covers

Liquid AI CEO Ramin Hasani explains how his MIT-founded company builds device-native foundation models using automated architecture search, biological inspiration from C. elegans neural dynamics, and hardware-in-the-loop optimization. The company holds the number five spot on Hugging Face US downloads, with over one million downloads per week, and targets the roughly one-trillion-dollar annual smartphone and laptop market with AI inference that runs on the device rather than in the cloud.
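
As a rough illustration of what architecture search with hardware in the loop could look like, the toy below evolves a stack of layer types against a fitness that combines a task score with measured latency. Everything here is an assumption made for illustration: the two-operator vocabulary, the `evolve` and `mutate` helpers, the latency weighting, and the selection scheme. AFMD's actual operators, benchmarks, and search strategy are not described in the summary.

```python
import random
import statistics
import time

OPERATORS = ["gated_conv", "attention"]  # hypothetical operator vocabulary


def wall_clock_latency_ms(run_model, layout, repeats=5):
    """Median wall-clock time of run_model(layout) on the machine running this.

    For a hardware-in-the-loop search this would execute on the target device.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_model(layout)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def mutate(layout, rng):
    """Return a copy of layout with one randomly chosen layer resampled."""
    child = list(layout)
    child[rng.randrange(len(child))] = rng.choice(OPERATORS)
    return child


def evolve(score_fn, latency_fn, depth=10, population=12, generations=20,
           latency_weight=0.1, seed=0):
    """Simple elitist evolutionary search over layer layouts.

    score_fn(layout)   -> downstream task score (higher is better)
    latency_fn(layout) -> measured latency (lower is better)
    Returns the best layout and the best fitness seen in each generation.
    """
    rng = random.Random(seed)

    def fitness(layout):
        return score_fn(layout) - latency_weight * latency_fn(layout)

    pop = [[rng.choice(OPERATORS) for _ in range(depth)] for _ in range(population)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: max(1, population // 3)]  # survivors carry over unchanged
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population - len(parents))]
        pop = parents + children
    return max(pop, key=fitness), history


# Stand-in score and latency functions; a real search would run benchmarks
# and time the candidate on the target device instead.
toy_cost = {"gated_conv": 0.2, "attention": 1.0}
toy_score = lambda layout: 1.0 if layout.count("attention") >= 2 else 0.0
toy_latency = lambda layout: sum(toy_cost[op] for op in layout)
best_layout, best_history = evolve(toy_score, toy_latency)
print(best_layout)
```

The design point worth noting is that `latency_fn` is supplied by the caller, so swapping the toy cost table for `wall_clock_latency_ms` on the deployment device changes what the search optimizes without changing the search itself. Wall-clock timings are noisy, so a real fitness would need repeated or cached measurements.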

Key Questions Answered

• Closed-form solution unlocking scale: Liquid neural networks were originally governed by differential equations that had no known closed-form solution since 1907, the year of Lapicque's membrane potential equation. A closed-form solution, published in Nature Machine Intelligence in November 2022, eliminated the need for numerical solvers and enabled scaling from thousands to billions of neurons. The practical implication: biologically grounded nonlinear architectures are no longer computationally intractable and can be trained at foundation model scale on standard GPU clusters.
• Fine-tuning platform incoming: Liquid AI plans to release a self-serve platform that lets enterprise customers fine-tune small, task-specific models on their own data without direct engineering engagement. For practitioners with narrow, well-defined use cases (document classification, catalog search, sensor prediction), this offers a route to frontier-competitive performance at a fraction of the inference cost, by combining a pre-optimized architecture with domain-specific supervised fine-tuning on proprietary datasets rather than relying on general-purpose cloud models.
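
To show why a closed-form solution removes solver cost, the sketch below compares step-by-step Euler integration with a single closed-form evaluation. It uses a deliberately simplified case: the linear leaky membrane equation with constant input, tau * dV/dt = -V + R * I, which has a textbook closed form. It is not the nonlinear, input-dependent liquid network equation that the 2022 paper addressed; the function names and parameter values are assumptions for illustration only.

```python
import math


def euler_membrane(v0, current, tau, resistance, t_end, steps):
    """Integrate tau * dV/dt = -V + R * I with explicit Euler updates."""
    dt = t_end / steps
    v = v0
    for _ in range(steps):
        v += dt * (-v + resistance * current) / tau
    return v


def closed_form_membrane(v0, current, tau, resistance, t_end):
    """Exact solution for constant input:
    V(t) = V_inf + (V0 - V_inf) * exp(-t / tau), with V_inf = R * I.
    """
    v_inf = resistance * current
    return v_inf + (v0 - v_inf) * math.exp(-t_end / tau)


params = dict(v0=0.0, current=1.0, tau=1.0, resistance=1.0, t_end=1.0)
exact = closed_form_membrane(**params)
for steps in (2, 100, 10_000):
    approx = euler_membrane(steps=steps, **params)
    print(steps, abs(approx - exact))
```

The numerical solver needs many sequential updates per neuron to approach the exact value, while the closed form costs one expression regardless of the time horizon. That per-step saving is the kind of benefit the summary attributes to the closed-form result.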

Notable Moment

Hasani describes parking a car autonomously using a control module containing just 12 liquid neurons: not 12 million, not 12,000, but 12. Flying a drone required 30. The result came from modeling the neural dynamics of the worm C. elegans, the only animal whose complete nervous system of roughly 300 neurons has been fully mapped, demonstrating that architectural expressiveness per neuron can substitute for raw parameter count in closed-loop control tasks.
