#228 - GPT 5.2, Scaling Agents, Weird Generalization - Last Week in AI Recap

Podcast: Last Week in AI

Published: 2025-12-17

Duration: 1 hr 27 min

Summary

The episode covers the announcement of GPT-5.2, its benchmark results, and what they imply for AI's role in automating white-collar jobs. It also touches on other AI news and research topics, including scaling agents and weird generalization.

What Happened

In this episode, hosts Andrey Kurenkov and Jeremie Harris discuss the announcement of GPT-5.2, which OpenAI is positioning to reclaim its lead in the AI landscape. The model posts strong results across a range of benchmarks. The headline claim comes from OpenAI's GDPval evaluation, where GPT-5.2 matched or outperformed top industry professionals on a variety of knowledge work tasks while producing its outputs at over 11 times the speed and less than 1% of the cost of those experts.

The conversation also covers how GPT-5.2 differs from its predecessor, GPT-5.1: it is more expensive to use, and it has a different knowledge cutoff date. The hosts argue that these differences suggest OpenAI is continuing to train and refine its models. They also discuss SWE-Bench Pro, a demanding software engineering benchmark on which GPT-5.2 scored 55.6%, ahead of other top models such as Claude 4.5. Overall, the episode captures the excitement surrounding GPT-5.2's launch and its potential impact on the AI field.

Key Questions Answered

What are the key features of GPT-5.2?

GPT-5.2 was announced with improved benchmark results that give it a competitive edge over earlier models, and it is designed to strengthen OpenAI's position in the AI landscape. It is also more expensive than GPT-5.1: input pricing is $1.75 per million tokens, and output pricing rises by about 40%. These changes suggest a strategic shift in how OpenAI balances model capability against cost.

How does GPT-5.2 compare to industry professionals?

The GDPval evaluation indicates that GPT-5.2 can match or exceed the performance of top industry professionals on a variety of knowledge work tasks. It reportedly produces outputs for these tasks at over 11 times the speed and less than 1% of the cost of human experts, showing its potential to significantly affect the workforce and automate white-collar jobs.

What is the significance of the SWE-Bench Pro benchmark?

SWE-Bench Pro is a harder software engineering benchmark than earlier SWE-bench variants and holds models to a stricter standard. GPT-5.2's score of 55.6% places it in the top tier of models, ahead of competitors such as Claude 4.5. This result suggests GPT-5.2 could help advance AI applications in software engineering and beyond.

What does the change in knowledge cutoffs mean for GPT-5.2?

GPT-5.2's knowledge cutoff is August 31, 2025, compared with September 30, 2024 for GPT-5.1. The newer cutoff implies that OpenAI trained GPT-5.2 on more recent data, suggesting the company is actively training and refining its models. Continued training of this kind may be aimed at keeping the models relevant and accurate as the information landscape changes.

What are the implications of AI automation on white-collar jobs?

The advancements showcased by GPT-5.2, especially in the context of tasks traditionally performed by human professionals, raise important questions about the future of white-collar jobs. With AI systems capable of performing these tasks efficiently and at a fraction of the cost, businesses may increasingly adopt these technologies, potentially leading to significant changes in employment patterns and job roles in knowledge-based industries.