AI Summary
→ WHAT IT COVERS
Dwarkesh reflects on Richard Sutton's perspective that current LLMs waste compute during deployment without learning, and that new architectures will be needed for continual learning and true intelligence.

→ KEY INSIGHTS
- **Compute efficiency critique:** LLMs spend most of their compute during deployment without learning anything; they learn only during training, and do so inefficiently, on data representing tens of thousands of years of human experience.
- **Imitation learning as foundation:** Pretrained LLMs serve as essential priors for reinforcement learning, much as AlphaGo used human games before AlphaZero bootstrapped from scratch to superhuman performance.
- **Continual learning gap:** During RL, current LLMs learn roughly one bit per episode of tens of thousands of tokens, while animals continuously extract maximal signal from their environmental observations.

→ NOTABLE MOMENT
Dwarkesh compares pretraining data to fossil fuels: non-renewable but an essential intermediary, arguing that civilization needed fossil fuels to reach solar panels even though they are not the final solution.

💼 SPONSORS
None detected

🏷️ AI Architecture, Reinforcement Learning, LLM Training
