#238 - GPT 5.4 mini, OpenAI Pivot, Mamba 3, Attention Residuals

Last Week in AI Podcast Recap

Published:

What Happened

OpenAI's release of the GPT 5.4 mini and nano models marks a significant step in efficiency and speed. GPT 5.4 mini closely rivals GPT 5.4 on benchmarks while running more than twice as fast; the nano version is smaller and faster still, though weaker on benchmarks. Pricing, however, has risen notably: GPT 5.4 mini costs three times as much as its predecessor.

OpenAI is narrowing its focus from a broad array of projects to productivity and business applications. The strategic pivot comes as OpenAI trails Anthropic, which holds over 70% of the enterprise market share. Fiji Simo, OpenAI's head of applications, emphasizes that a focus on productivity is key to regaining competitive ground.

Mistral's new Mamba 3 model presents significant advances in sequence modeling. Built on state-space principles, Mamba 3 reportedly outperforms transformers in accuracy and matches Mamba 2's perplexity with half the state size. The model improves GPU utilization and speeds up inference, promising cost efficiencies.
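The recap doesn't detail Mamba 3's architecture, but the state-space idea it builds on can be sketched as a linear recurrence with a fixed-size hidden state. The matrices, sizes, and function name below are illustrative placeholders, not Mamba 3's actual parameterization:

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Minimal discrete state-space recurrence:
    h_t = A @ h_{t-1} + B @ x_t,  y_t = C @ h_t.
    The state has a fixed size regardless of sequence length,
    which is why halving the state size directly cuts memory."""
    state_size = A.shape[0]
    h = np.zeros(state_size)
    ys = []
    for x_t in x:               # one step per token
        h = A @ h + B @ x_t     # fixed-size state update
        ys.append(C @ h)        # readout from the state
    return np.stack(ys)

# Toy example with illustrative sizes: state of 4, sequence of 8.
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)             # stable decaying dynamics
B = rng.normal(size=(4, 1))
C = rng.normal(size=(1, 4))
x = rng.normal(size=(8, 1))
y = ssm_scan(A, B, C, x)
print(y.shape)                  # (8, 1)
```

Because the scan carries only the fixed-size state forward, inference cost grows linearly in sequence length, unlike attention's quadratic cost.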

Nvidia's announcements at GTC include the Nemaclaw stack for OpenClaw agents and DLSS 5, its machine-learning-based graphics upscaler. While DLSS 5 received mixed reactions, it signifies a major innovation in gaming graphics. Additionally, Nvidia's Groq free language processing unit promises high throughput for rapid inference.

Meta faces internal challenges and has delayed its next AI model, codenamed Avocado, due to training difficulties. The company is considering temporarily licensing Google's Gemini technology. Internal tensions are highlighted by clashes between Alexandr Wang and product leaders over AI model strategy.

ByteDance's collaboration with Aulani Cloud in Malaysia lets it amass significant computing power using Nvidia AI chips: a 60-megawatt cluster of 36,000 B200 chips, demonstrating ByteDance's strategic expansion outside China amid export controls.
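Those cluster figures can be sanity-checked with quick arithmetic. Treating the 60 MW as the all-in facility budget (an assumption; the recap doesn't say whether it covers cooling and networking):

```python
chips = 36_000
cluster_watts = 60e6                     # 60 MW, assumed all-in facility power
watts_per_chip = cluster_watts / chips   # power budget attributed per chip
print(round(watts_per_chip))             # 1667
```

Roughly 1.7 kW per chip is plausible for a modern accelerator once cooling and networking overhead are included, so the two numbers are consistent with each other.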

Attention Residuals, a new approach proposed in a recent paper, replaces the fixed accumulation of information in residual connections with softmax attention, and reduces memory overhead through block attention residuals. Meanwhile, Anthropic's Bloom framework automates the evaluation of AI behaviors, strengthening adherence to constitutional AI guidelines.
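The recap doesn't give the paper's exact formulation, but the general idea of swapping a fixed residual sum for a learned softmax mix over earlier layer outputs can be sketched as follows. All names, shapes, and the scoring scheme are illustrative assumptions, not the paper's method:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_residual(layer_outputs, query):
    """Instead of the fixed residual x + f(x), softmax-weight the
    outputs of earlier layers so the network learns what to carry
    forward. layer_outputs: (L, d) stack of earlier layer outputs;
    query: (d,) vector from the current layer."""
    scores = layer_outputs @ query / np.sqrt(query.size)
    weights = softmax(scores)            # one weight per earlier layer
    return weights @ layer_outputs       # convex mix of layer outputs

rng = np.random.default_rng(1)
outs = rng.normal(size=(5, 16))          # outputs of 5 earlier layers
q = rng.normal(size=16)                  # current layer's query
mix = attention_residual(outs, q)
print(mix.shape)                         # (16,)
```

Because the mix is a convex combination, its scale stays comparable to a single layer's output, unlike a plain sum that grows with depth.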

Microsoft is restructuring its AI division, with Mustafa Suleyman, the DeepMind co-founder, now focused on developing a frontier foundation language model. Despite Copilot's 150 million active users, it trails Gemini and ChatGPT in user numbers, prompting strategic shifts to strengthen Microsoft's AI capabilities.

Key Insights
