OpenAI drops GPT-5.4, Iran War fallout, The Mansion Section | Diet TBPN - TBPN Recap

Podcast: TBPN

Published: 2026-03-07

Duration: 31 min

Summary

The episode dives into the release of OpenAI's GPT-5.4, highlighting its improvements over previous models and discussing its implications for various sectors. Experts share their experiences and reflections on the rapid advancements in AI technology.

What Happened

In this episode, the hosts discuss the recent launch of OpenAI's GPT-5.4, and Tyler Cowen expresses satisfaction with the model's performance. Justin shares insights from a week of testing: GPT-5.4 offers a strong blend of Opus and Codex traits, but it lacks some of Opus's eagerness and some of Codex's precision. The conversation then shifts to personal reflections on AI progress. Bartos, a mathematician, reveals that GPT-5.4 solved a complex problem he had been working on for two decades, which he calls his personal 'move 37' moment.

As the discussion unfolds, the hosts turn to the competitive landscape and note that GPT-5.4 significantly outperforms previous models. Brendan from Mercor says GPT-5.4 is the best model they have ever tested, with a 50% mean score on their internal benchmark. The episode also includes anecdotes about the evolving capabilities of AI models, including examples of GPT-5.4 finishing tasks much faster than other models such as Claude. The hosts speculate on how AI will grow across sectors, emphasizing its potential to surpass traditional consulting and banking firms.

Key Insights

GPT-5.4 combines traits of Opus and Codex but lacks some of Opus's eagerness and Codex's precision.
Bartos' experience illustrates the profound impact of AI on expert-level problem-solving.
GPT-5.4 set a new high on Mercor's internal benchmark, significantly surpassing previous AI models.
The competitive landscape in AI is rapidly evolving, with major players like OpenAI and Anthropic scaling quickly.

Key Questions Answered

What are the main features of GPT-5.4?

The discussion describes GPT-5.4 as a blend of Opus and Codex: it is fast and conversational and follows instructions well. However, it lacks some of the eagerness of Opus and some of the precision of Codex.

How has GPT-5.4 impacted expert problem-solving?

Bartos, a top-tier mathematician, shared that GPT-5.4 solved a mathematical problem he had been curating for 20 years. He called this his personal 'move 37', suggesting that the model's capabilities have taken his work to a new level.

What are the performance benchmarks for GPT-5.4?

Brendan from Mercor noted that GPT-5.4 achieved a 50% mean score on their internal benchmark. This is a significant improvement over previous models, which struggled with basic tasks such as editing an Excel sheet.

What does the future hold for AI in traditional industries?

The hosts speculate that AI models like GPT-5.4 will soon outperform traditional consulting firms, investment banks, and law firms. They highlight the rapid growth in AI capabilities and the increasing reliance on these models for complex tasks.

How does GPT-5.4 compare to other AI models?

The episode features anecdotes about GPT-5.4's superior performance. For example, one user mentioned that while GPT-5.4 completed an eight-phase coding project in one hour, another model, Claude, was still on phase two, showcasing GPT-5.4's efficiency.