Sergey Levine - Building LLMs for the Physical World

Invest Like the Best with Patrick O'Shaughnessy Podcast Recap

Duration: 1 hr 6 min

Guests: Sergey Levine

Summary

This episode focuses on Sergey Levine's work on robotic foundation models that can control a wide range of embodied systems. The conversation covers the challenges and recent advances in building general-purpose models that adapt to different robot form factors.

What Happened

Sergey Levine, co-founder and researcher at Physical Intelligence, discusses his work on robotic foundation models: general-purpose models, analogous to large language models, intended to control any embodied system rather than serve a single domain. This approach leverages broad data sources to make robots more adaptable and capable.

Robotic learning faces unique challenges in generalization and in gathering effective demonstrations. Levine notes that generalization means performing even mundane tasks reliably across widely varying situations. Successful general-purpose embodied foundation models could unlock creativity in robotics much as the personal computer did for software.

Levine explains that humanoid robots, while capturing public imagination, represent just one form factor among many potential robotic designs. The emphasis should be on developing general intelligence that can be applied across different robotic systems, from humanoids to swarms of drones, without focusing on specific form factors.

The history of robotics reveals significant milestones, including the advent of end-to-end learning systems in the 1980s and deep reinforcement learning in the 2010s. Recent advancements in multimodal language models are crucial for incorporating common sense into robotic systems, enhancing their ability to handle unusual situations with intelligence.

Levine recounts his personal journey in robotics, starting in 2014 with a focus on systems that improve through practice and collective learning. Vision-Language-Action (VLA) models are foundational in adapting web knowledge to robotic control, enabling robots to apply common sense in dynamic environments.

Reinforcement learning is a key technique that allows robots to refine their skills through practice, improving both robustness and speed in tasks like making espresso. Effective learning methods can compensate for deficiencies in sensing, enabling robots to gather useful data autonomously, much like Tesla's data collection strategy.
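The refinement-through-practice idea can be illustrated with a toy sketch. This is not Physical Intelligence's actual method; it substitutes a simple trial-and-error hill climb for a full reinforcement learning algorithm, and the single `skill` parameter and reward function are invented for illustration. The shared structure is the loop: try a variation of the current behavior, score it, and keep changes that score better.

```python
import random

def attempt(skill, target=0.8):
    """Simulated trial: reward is higher the closer the skill
    parameter is to an ideal setting the robot does not know.
    (Both the parameter and the target are hypothetical.)"""
    return -abs(skill - target)

def practice(trials=500, step=0.05, seed=0):
    """Refine the skill through repeated trials: perturb the current
    behavior and keep the change only if it earned more reward --
    a hill-climbing stand-in for a reinforcement learning update."""
    rng = random.Random(seed)
    skill = 0.0
    for _ in range(trials):
        candidate = skill + rng.uniform(-step, step)
        if attempt(candidate) > attempt(skill):
            skill = candidate
    return skill
```

After enough practice, the learned parameter settles near the ideal setting, mirroring how repeated attempts at a task like pulling an espresso shot can make a policy both more robust and faster without any change to the robot's sensors.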

Physical Intelligence prioritizes developing robotic systems that are useful enough to gather data independently, enhancing their learning capabilities. This approach aims to tackle challenges associated with scene interpretation and step selection, which are crucial for successful task execution.

Levine addresses the challenges of deploying robots in real-world settings, emphasizing the need for comfort with imperfection and addressing safety concerns. Technical challenges include adapting to unexpected situations, while the goal remains to create general systems that are adaptable to various tasks and environments.
