#238 - GPT 5.4 mini, OpenAI Pivot, Mamba 3, Attention Residuals
Last Week in AI Podcast Recap
Published:
What Happened
OpenAI's release of the GPT 5.4 mini and nano models marks a significant step in efficiency and speed. GPT 5.4 mini closely rivals GPT 5.4 on benchmarks while running more than twice as fast; the nano version is smaller and quicker still, though weaker on benchmarks. Pricing is notably higher, with GPT 5.4 mini costing three times as much as its predecessor.
OpenAI is shifting its focus from a broad array of projects toward productivity and business applications. This strategic pivot comes as OpenAI lags behind Anthropic, which holds over 70% of the enterprise market share. Fidji Simo, the head of applications, emphasizes the importance of focusing on productivity to regain competitive ground.
Mistral's new release, the Mamba 3 model, presents significant advances in sequence modeling. Built on state-space principles, Mamba 3 outperforms transformers in accuracy and matches Mamba 2's perplexity with half the state size. The model improves GPU utilization and speeds up inference, promising cost efficiencies.
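State-space models like the Mamba family are built around a linear recurrence over a hidden state rather than attention over all past tokens. As a rough illustration of that core idea only (not Mamba 3's actual selective, hardware-aware implementation, and with toy matrix shapes chosen here for clarity), a plain state-space scan looks like:

```python
import numpy as np

def ssm_scan(A, B, C, xs):
    """Toy linear state-space recurrence:
        h_t = A @ h_{t-1} + B * x_t   (state update)
        y_t = C @ h_t                  (readout)
    State size is fixed, so cost per token is constant regardless of
    sequence length -- the property that makes SSM inference fast.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in xs:
        h = A @ h + B * x_t   # fold the new input into the hidden state
        ys.append(C @ h)      # project the state to an output
    return np.array(ys)

# Illustrative parameters: a decaying diagonal transition matrix.
A = 0.5 * np.eye(2)
B = np.ones(2)
C = np.ones(2)
outputs = ssm_scan(A, B, C, [1.0, 0.0, 0.0])
```

With a decaying `A`, an input impulse fades geometrically through the state, which is the sense in which the recurrence "remembers" recent context in a compressed, fixed-size form.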
Nvidia's announcements at GTC include the Nemaclaw stack for OpenClaw agents and DLSS5, its machine learning-based upscaling for graphics. While DLSS5 received mixed reactions, it signifies a major innovation in gaming graphics. Additionally, Nvidia's Groq free language processing unit promises high throughput for rapid inference.
Meta faces internal challenges, delaying its next AI model, codenamed Avocado, due to training difficulties. The company is considering temporarily licensing Google's Gemini AI technology. Internal tensions are highlighted by clashes between Alexandr Wang and product leaders over AI model strategies.
ByteDance's collaboration with Aulani Cloud in Malaysia allows it to amass significant computing power using Nvidia AI chips. This setup, consisting of 36,000 B200 chips, forms a 60-megawatt cluster, demonstrating ByteDance's strategic expansion outside of China amidst export controls.
Attention Residuals, a new approach proposed in a recent paper, replaces the residual stream's fixed accumulation of information with softmax attention over earlier representations, and reduces the resulting memory overhead through block attention residuals. Meanwhile, Anthropic's Bloom framework automates the evaluation of AI behaviors, improving adherence to constitutional AI guidelines.
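In a standard transformer, each layer's output is simply added to the residual stream; the idea described above instead lets the model weight earlier layer outputs with softmax attention. The sketch below is a toy rendering of that idea under stated assumptions (the projection `W_q`, the shapes, and attending over every stored layer are illustrative choices, not the paper's exact formulation, which restricts attention to blocks of layers to save memory):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_residual(history, W_q):
    """Mix past layer outputs with softmax attention instead of a fixed sum.

    history: list of per-layer output vectors, each of shape (d,)
    W_q:     (d, d) query projection -- a hypothetical parameter here
    """
    H = np.stack(history)                     # (num_layers, d) stored outputs
    q = history[-1] @ W_q                     # query from the newest layer
    scores = H @ q / np.sqrt(H.shape[-1])     # scaled dot-product scores
    weights = softmax(scores)                 # one weight per stored layer
    return weights @ H                        # learned mix replaces plain "+"

layer_outputs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
mixed = attention_residual(layer_outputs, np.eye(2))
```

The design point is that the weights are input-dependent, so the network can emphasize or suppress individual layers' contributions rather than accumulating them all equally.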
Microsoft is restructuring its AI division, with Mustafa Suleyman, former DeepMind co-founder, now focusing on developing a frontier foundation language model. Despite Copilot's 150 million active users, it trails Gemini and ChatGPT in user numbers, prompting strategic shifts to improve Microsoft's AI capabilities.
Key Insights
- OpenAI's GPT 5.4 mini model is nearly as effective as GPT 5.4, boasting over twice the speed, though at a higher cost. The nano variant, while faster, does not match its peers on benchmarks.
- Mistral's Mamba 3 model introduces a novel approach to sequence modeling, outperforming transformers with improved GPU utilization and faster inference. The model uses complex numbers for better internal state tracking.
- ByteDance's strategic deployment in Malaysia involves 36,000 Nvidia B200 chips, forming a 60-megawatt cluster. This setup allows the company to bypass export controls, leveraging Nvidia's hardware outside China.
- Meta's internal discord over AI development strategies has delayed its Avocado model release. The company is exploring temporary licensing of Google's Gemini AI technology while resolving these challenges.