Richard Sutton – Father of RL thinks LLMs are a dead end - Dwarkesh Podcast Recap

Podcast: Dwarkesh Podcast

Published: 2025-09-26

Duration: 1 hr 6 min

Summary

Richard Sutton argues that large language models (LLMs) lack true intelligence: they have no goals and no grounded understanding of the world, and instead merely mimic human behavior. He holds up reinforcement learning (RL) as the foundational approach to AI because it learns from experience.

What Happened

In this episode, Dwarkesh speaks with Richard Sutton, a pioneer of reinforcement learning and a recipient of the 2024 Turing Award (shared with Andrew Barto). Sutton is skeptical of the rise of large language models: while they can generate text from vast datasets, he argues they fundamentally lack the ability to understand or predict the world in a meaningful way. In his view, LLMs are built on imitation rather than genuine intelligence; they mimic human behavior without the learning from real-world experience that intelligence requires.

Sutton contrasts LLMs with reinforcement learning, which he views as a more fundamental approach to AI: one that builds understanding of the world through interaction and experience. True intelligence, he argues, requires having goals, which LLMs lack; they predict the next token in a sequence rather than make decisions that affect the external world. His critique centers on two deficiencies: LLMs have no ground truth to learn against, and they cannot learn continually from their environment, both of which he considers essential for an intelligent system.
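The distinction Sutton draws between the two learning signals can be sketched as a toy contrast (all names and data here are illustrative, not from the episode): an imitation learner fits statistics of a fixed human-written corpus, while an experiential learner adjusts its estimates toward rewards its own actions produced.

```python
def imitation_update(model, corpus):
    """LLM-style signal: count next-token statistics from a fixed text
    corpus. The 'truth' here is only what humans already wrote."""
    for prev, nxt in zip(corpus, corpus[1:]):
        model.setdefault(prev, {}).setdefault(nxt, 0)
        model[prev][nxt] += 1
    return model

def experiential_update(values, action, reward, lr=0.1):
    """RL-style signal: move an action-value estimate toward a reward
    observed from the environment -- feedback generated by the agent's
    own behavior, not by a human transcript."""
    old = values.get(action, 0.0)
    values[action] = old + lr * (reward - old)
    return values

# Imitation: predicts the most frequent human continuation.
model = imitation_update({}, ["the", "cat", "sat", "the", "cat", "ran"])
# Experience: the estimate for action "a" moves toward the reward 1.0.
values = experiential_update({}, "a", 1.0)
```

The point of the contrast is that only the second update incorporates a consequence of acting; the first can never observe an outcome it did not already find in its data.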

Key Questions Answered

What are the limitations of large language models according to Richard Sutton?

Richard Sutton argues that large language models (LLMs) lack true intelligence because they do not possess goals or a robust understanding of the world. Instead of learning from interaction and experience, LLMs mimic the human behavior found in their training data. Sutton emphasizes that while LLMs can generate text, they cannot predict what will happen in the real world, which he sees as a crucial aspect of intelligence.

How does Sutton define intelligence in AI?

Sutton defines intelligence as the ability to achieve goals, referencing John McCarthy's perspective that intelligence is the computational part of achieving those goals. He stresses that without goals, a system cannot be considered intelligent; it merely behaves according to its programming. For Sutton, having a goal is essential for any meaningful measure of intelligence.

What does Sutton think about the potential of combining RL with LLMs?

Sutton expresses skepticism about the productivity of applying reinforcement learning on top of large language models. He believes that while LLMs can be trained to solve specific problems, they fundamentally lack the capability to learn from their environment and adjust their understanding based on experience, which he views as a critical component of true intelligence.

Why does Sutton believe LLMs do not have a world model?

Sutton challenges the notion that LLMs possess a world model. He states that they can predict what a person might say next, but make no substantive prediction about what will happen in the world as a consequence of their actions. Without such predictions they cannot learn from outcomes, which is essential for developing a true understanding of the world.
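What a world model in Sutton's sense might look like can be sketched with a toy tabular version (the class and the door example are hypothetical illustrations, not from the episode): it learns transition statistics from the agent's own experience and then predicts the consequence of a contemplated action.

```python
from collections import defaultdict

class WorldModel:
    """Toy tabular world model: estimates what follows a (state, action)
    pair from transitions the agent itself has experienced."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, state, action, next_state):
        """Record one transition the agent experienced."""
        self.counts[(state, action)][next_state] += 1

    def predict(self, state, action):
        """Most likely next state for a contemplated action, or None
        if the action was never tried -- no grounded prediction exists."""
        outcomes = self.counts[(state, action)]
        if not outcomes:
            return None
        return max(outcomes, key=outcomes.get)

wm = WorldModel()
wm.observe("door_closed", "push", "door_open")
wm.observe("door_closed", "push", "door_open")
wm.observe("door_closed", "wait", "door_closed")
```

The contrast with next-token prediction is that the model's predictions are indexed by the agent's actions, so observed outcomes can confirm or refute them.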

What is the significance of ground truth in AI according to Sutton?

Ground truth is vital in Sutton's framework for assessing AI systems. He argues that without ground truth, a basis for judging what is correct or valuable, an LLM's prior knowledge can never be tested and refined into actual knowledge. Reinforcement learning systems, by contrast, define correct behavior through reward, which gives learning a grounded target.
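The role of reward as ground truth can be illustrated with a minimal two-armed bandit sketch (a standard RL toy problem; the payouts and parameters are illustrative): no labeled examples are provided, yet the reward signal alone is enough to identify which action is "right".

```python
import random

def run_bandit(rewards, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: the environment's reward is the only
    feedback, and it defines which arm counts as correct."""
    rng = random.Random(seed)
    values = [0.0] * len(rewards)   # estimated value per arm
    counts = [0] * len(rewards)
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(rewards))                      # explore
        else:
            a = max(range(len(rewards)), key=lambda i: values[i])  # exploit
        r = rewards[a]()            # ground truth arrives as reward
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean
    return values

# Arm 1 pays more, so its estimated value ends up higher -- the agent
# discovers the better action without ever being told the answer.
values = run_bandit([lambda: 0.2, lambda: 0.8])
```

This is the sense in which Sutton says RL has ground truth: the reward tells the agent directly whether its choices were valuable, with no human-provided labels in the loop.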