Fei-Fei Li


Dr. Fei-Fei Li is a pioneering computer scientist who fundamentally transformed machine learning through her creation of ImageNet, the landmark dataset that became the foundation for modern deep learning and computer vision. As co-director of Stanford's Human-Centered AI Institute and founder of World Labs, she works on spatial intelligence—developing AI systems that can understand and generate three-dimensional worlds. Often called the 'Godmother of AI,' Li advocates for developing artificial intelligence responsibly to augment human capabilities while addressing critical ethical considerations.

6 episodes
5 podcasts


All Appearances

Eye on AI

#303 Fei-Fei Li: Spatial Intelligence, World Models & the Future of AI

61 min • AI Researcher and Founder of World Labs

AI Summary

→ WHAT IT COVERS
Fei-Fei Li explains spatial intelligence as the next frontier beyond language models, discussing World Labs' Marble model that generates consistent three-dimensional spaces from multimodal inputs, requiring fundamentally different approaches than text-based AI systems.

→ KEY INSIGHTS
- **Multimodal World Models:** World Labs' Marble accepts text, single or multiple images, videos, and coarse three-dimensional layouts as inputs, generating spatially consistent environments that users can navigate through. This multimodal approach mirrors how biological systems learn through multiple sensory channels beyond language alone.
- **Efficient Inference Architecture:** The Real-Time Frame Model achieves frame-based generation with geometric consistency and permanence using a single H100 GPU during inference, dramatically reducing computational requirements compared to other frame-based models that require undisclosed numbers of chips for similar output quality.
- **Statistical Physics Limitations:** Current generative AI models, including video generators, learn physics through statistical patterns from training data rather than deducing Newtonian laws. Water movement and tree motion in generated content reflect observed patterns, not fundamental physical principles, requiring integration with physics engines for true physical accuracy.
- **Universal Task Function Challenge:** Unlike language models' next token prediction that perfectly aligns training with inference, spatial intelligence lacks an equivalent universal objective function. Three-dimensional reconstruction, next frame prediction, and other candidates each have limitations, making this a fundamental unsolved problem in world modeling.
- **Abstract Reasoning Gap:** AI systems can perform semantic understanding like changing couch colors on command, but cannot abstract causal relationships at the level required to deduce physical laws from observational data. Current transformer architectures lack mechanisms for the conceptual abstraction that produced theories like Newtonian motion or special relativity.

→ NOTABLE MOMENT
Li challenges the notion that current AI could deduce fundamental physics laws from data, arguing that abstracting concepts like force, mass, and acceleration from satellite observations requires architectural breakthroughs beyond transformers, which lack mechanisms for causal abstraction at that conceptual level.

💼 SPONSORS
Agency (https://agntcy.org)

🏷️ Spatial Intelligence, World Models, Computer Vision, AI Architecture, Multimodal Learning

AI Summary

→ WHAT IT COVERS
Dr. Fei-Fei Li, the "godmother of AI," discusses how ImageNet sparked modern AI, her new world model company World Labs, and why spatial intelligence will unlock robotics and human augmentation.

→ KEY INSIGHTS
- **ImageNet breakthrough:** Li began building ImageNet in 2007, eventually assembling 15 million labeled images across 22,000 concepts. This provided the big-data foundation for 2012's AlexNet breakthrough in object recognition, which was trained on just two NVIDIA GPUs.
- **AI adoption timeline:** Tech companies avoided calling themselves "AI companies" as late as 2015-2016, fearing it was a "dirty word." Widespread AI branding only began around 2017, less than a decade ago.
- **World models vs language models:** Spatial intelligence requires understanding 3D worlds for interaction and reasoning, not just passive video generation. This is essential for robotics, scientific discovery, and human augmentation beyond conversational AI.
- **Robotics data challenge:** Unlike language models, where training data matches the output format, robotics lacks sufficient action data in 3D worlds, requiring teleoperation data, synthetic environments, and world models to bridge this gap.
- **Marble production impact:** World Labs' new world model tool reduced virtual production time by 40x for Sony collaborations, enabling creators to generate navigable 3D environments from text prompts for films, games, and simulations.

→ NOTABLE MOMENT
Li reveals that modern AI still cannot perform tasks a toddler can do, like counting chairs in office room videos, demonstrating how far current systems remain from human-level spatial reasoning capabilities.

💼 SPONSORS
Figma (figma.com/lenny), Justworks (justworks.com), Sinch (sinch.com/lenny)

🏷️ AI History, World Models, Spatial Intelligence, Computer Vision, Robotics

a16z Podcast

What Comes After ChatGPT? The Mother of ImageNet Predicts The Future

62 min • Stanford Professor, World Labs Co-Founder

AI Summary

→ WHAT IT COVERS
Fei-Fei Li and Justin Johnson discuss World Labs' Marble model, which generates explorable 3D worlds from text and images, representing their vision for spatial intelligence beyond current language models.

→ KEY QUESTIONS ANSWERED
- What comes after ChatGPT in AI development?
- How does spatial intelligence differ from linguistic intelligence?
- What are the practical applications of 3D world generation?
- How do transformers actually work as set models?

→ KEY TOPICS DISCUSSED
- Marble Model Architecture: World Labs' first-in-class system generates interactive 3D worlds using Gaussian splats as atomic units, enabling real-time rendering on mobile devices and precise camera control for creative applications.
- Spatial Intelligence Framework: Li defines spatial intelligence as the capability to reason, understand, move, and interact in space, complementing linguistic intelligence and requiring different learning paradigms than current language models.
- Academic-Industry Balance: Discussion covers resource imbalances between academia and industry, the role of open science versus proprietary development, and how compute scaling has shifted from individual GPUs to distributed clusters.

→ NOTABLE MOMENT
Li reveals that when she graduated, she thought solving image storytelling would take her entire career, but the combination of ConvNets and LSTMs made it possible within years.

💼 SPONSORS
None detected

🏷️ Spatial Intelligence, World Models, 3D Generation, Computer Vision, AI Research

AI Summary

→ WHAT IT COVERS
Fei-Fei Li explains why spatial intelligence represents the next frontier of AI beyond language models, discussing her company World Labs' work on world modeling and its critical applications in robotics, simulation, and creative industries.

→ KEY INSIGHTS
- **Spatial Intelligence Foundation:** World modeling extends beyond language to represent visual semantics, physical space, and actions, enabling immersive experiences for creators, designers, and industrial applications including healthcare, education, and robotic simulation that language alone cannot express.
- **Data Challenge in World Modeling:** Unlike language data readily available on the internet, world modeling requires multimodal spatial data including three-dimensional geometry, physics, and dynamics that are significantly harder to obtain, creating a fundamental bottleneck for development progress.
- **Robotics Timeline Reality:** Self-driving cars took over twenty years from Sebastian Thrun's 130-mile Nevada desert demonstration to Waymo operating in San Francisco, even with established automotive infrastructure. More complex robots that must touch objects precisely will require substantially longer development cycles.
- **Trust as Human Responsibility:** Trust in AI cannot be outsourced to machines; it must remain fundamentally human at individual, community, and societal levels. Entrepreneurs should prioritize human agency and trust-building from day one, regardless of whether their product seems directly connected to sensitive applications.

→ NOTABLE MOMENT
Li challenges Wittgenstein's philosophy that language defines the limits of the world, arguing the world is actually limitless beyond symbolic description, requiring spatial intelligence to represent how we truly experience and interact with physical reality.

💼 SPONSORS
Freshworks (freshworks.com), Rippling (rippling.com/scale), Superhuman (superhuman.com/podcast), Capital One Business (capital1.com/businesscards), Project Management Institute (pmi.org)

🏷️ Spatial Intelligence, World Modeling, AI Robotics, Human-Centered AI

AI Summary

→ WHAT IT COVERS
Dr. Fei-Fei Li discusses her journey from Chinese immigrant to AI pioneer, creating the ImageNet dataset, founding World Labs for spatial intelligence, and addressing AI's civilizational impact on society.

→ KEY QUESTIONS ANSWERED
- How did ImageNet become the foundation for modern AI?
- What is spatial intelligence and why does it matter?
- How should parents prepare children for an AI-driven future?
- What are people missing about AI's societal impact?

→ KEY TOPICS DISCUSSED
- ImageNet Creation: Li built the largest visual dataset using Amazon Mechanical Turk for crowdsourced labeling, which required quality controls and gold-standard answers to keep workers from gaming the incentives.
- World Labs Mission: The company develops spatial intelligence AI that enables machines to understand three-dimensional environments, supporting creators, designers, and eventually robotic applications through world modeling.
- AI Education Impact: Traditional degree credentials matter less for hiring as collaborative AI tools become essential, requiring students to demonstrate learning ability over formal qualifications.

→ NOTABLE MOMENT
Li reveals her father named her Fei-Fei after catching and releasing a bird while bicycling late to the hospital during her birth, with "fei" meaning "flying" in Chinese.

💼 SPONSORS
Helix Sleep (helixsleep.com/tim), Seed (seed.com/tim), Wealthfront (wealthfront.com/tim)

🏷️ Artificial Intelligence, Computer Vision, ImageNet, Spatial Intelligence, AI Education

AI Summary

→ WHAT IT COVERS
World Labs cofounders Fei-Fei Li and Justin Johnson explain spatial intelligence as the next AI frontier, moving beyond language models to understand three-dimensional worlds.

→ KEY QUESTIONS ANSWERED
- Why is spatial intelligence fundamentally different from language models?
- How does three-dimensional representation enable new AI applications?
- What technical breakthroughs made spatial intelligence possible now?

→ KEY TOPICS DISCUSSED
- ImageNet Legacy: Li's 2009 dataset with millions of images unlocked modern computer vision by proving data-driven approaches could surpass traditional algorithms through Internet-scale training.
- Reconstruction Meets Generation: NeRF technology and diffusion models merged reconstruction and generation capabilities, allowing systems to both perceive existing scenes and create new ones.

→ NOTABLE MOMENT
Johnson reveals AlexNet's six-day training run on two GTX 580 GPUs would complete in under five minutes on a single GB200 processor today.

💼 SPONSORS
None detected

🏷️ Spatial Intelligence, Computer Vision, World Labs, Neural Radiance Fields
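
The AlexNet comparison in the summary above implies a speedup that can be worked out directly. The sketch below is only back-of-the-envelope arithmetic: it assumes exactly six days for the original run and exactly five minutes for the modern one (the summary says "under five minutes," so the result is a lower bound). The variable names are illustrative and do not come from the episode.

```python
# Lower-bound speedup implied by the summary's AlexNet comparison.
MINUTES_PER_DAY = 24 * 60

original_minutes = 6 * MINUTES_PER_DAY  # six-day run on two GTX 580 GPUs
modern_minutes = 5                      # "under five minutes", so an upper bound

speedup_lower_bound = original_minutes / modern_minutes
print(f"at least {speedup_lower_bound:.0f}x faster")  # at least 1728x faster
```

On these assumptions, the quoted figures correspond to a speedup of more than three orders of magnitude.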
