Latent Space

[State of RL/Reasoning] IMO/IOI Gold, OpenAI o3/GPT-5, and Cursor Composer — Ashvin Nair, Cursor

45 min episode · 2 min read

Topics: Artificial Intelligence

AI-Generated Summary

Key Takeaways

  • Reasoning Model Development Scale: OpenAI's o1 reasoning model started with approximately 12 core people but expanded to 50-100 contributors for the initial release and eventually 300 people for o3. The breakthrough came in 2023 when RL applied to smaller pretrained models produced surprisingly accurate reasoning traces on math problems, demonstrating capabilities unachievable through additional pretraining alone, leading to full-scale investment.
  • RL Generalization Limitations: Reinforcement learning for language models excels within its training distribution but generalizes poorly beyond it. The solution is to bring economically useful tasks into the training distribution rather than expecting broad generalization. In practice, this means products must capture complete user context (code repositories, terminal access, conversation history, and workflow data) to enable effective RL training on real-world tasks.
  • Robotics Market Timing: Language model agents will become a trillion-dollar market before robotics reaches even ten billion dollars in value. Current AI robotics sits at the GPT-1 to GPT-2 stage of development, showing hints of generalization but lacking reliable out-of-distribution performance. The technology must demonstrate clear value creation before unit economics can work, accounting for maintenance costs and the reliability thresholds required for commercial deployment.
  • Continual Learning Gap: Models trained on trillions of tokens should theoretically handle millions of deployment tokens without capacity constraints, yet they repeatedly make identical mistakes within and across contexts. The field needs breakthroughs in continual learning that enable models to permanently learn from single experiences, similar to humans avoiding hot stoves after one touch, rather than requiring explicit data curation and filtering.
  • Product-Model Co-Design: Cursor's 20-25 person ML team ships competitive models by tightly integrating product and model development. Their Composer model balances intelligence with speed to keep programmers in a flow state, avoiding the context-switching that slow inference forces. Internal tooling enables SSH sessions into user environments for direct data inspection, and policy updates ship every two hours, a cadence impossible at larger organizations with separate product and research teams.

What It Covers

Ashvin Nair, former OpenAI reasoning team member now at Cursor, discusses the transition from robotics to language models, the development of OpenAI's o1/o3 reasoning models with a 300-person team, achieving IMO/IOI gold medals, and Cursor's approach to co-designing products with models through rapid RL iteration cycles every two hours.


Notable Moment

Nair recalls that at The Curve conference, held before o1's release, attendees predicted 20% performance on math benchmarks by 2027, while OpenAI already had internal models exceeding those estimates. The same forecasters predicting Dyson spheres by 2035 were simultaneously underestimating near-term capabilities by multiple years, a systematic miscalibration in AI progress predictions.
