Breaking the Memory Wall in the Age of Inference - The Data Exchange with Ben Lorica Recap
Podcast: The Data Exchange with Ben Lorica
Published: 2026-02-12
Duration: 46 min
Summary
Sid Sheth, co-founder and CEO of d-Matrix, discusses the evolution of AI hardware, focusing on inference and the critical role memory plays as AI models keep growing. He argues that memory and compute must be tightly integrated to meet the demands of increasingly large models.
What Happened
In this episode, Ben Lorica speaks with Sid Sheth, co-founder and CEO of d-Matrix, about the pressing need for dedicated AI inference hardware. Sheth describes a notable shift in the AI hardware landscape: as foundation models grow in size and complexity, the emphasis is moving from training models to running them efficiently. He recalls how hard it once was to explain inference to investors, a problem that evaporated once ChatGPT demonstrated what large-scale inference makes possible in real-world applications.
Sheth then walks through the company's approach to these challenges, stressing the central role of memory in AI computation. He notes that conventional off-chip memory, including high-bandwidth memory (HBM), carries steep power and cost penalties when paired with compute units. By building its cloud inference platform around SRAM placed close to the compute, d-Matrix aims to balance memory access against compute throughput for large-scale AI workloads. The company made this bet before the current explosion of interest in AI models, which positioned it advantageously once demand for inference capacity surged.
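To see why memory, rather than raw compute, so often sets the ceiling for inference, consider a back-of-envelope roofline estimate. The sketch below is illustrative and not from the episode; the model size and bandwidth figures are assumptions. During autoregressive decoding, each generated token must stream roughly all of the model's weights past the compute units, so memory bandwidth bounds throughput:

```python
# Back-of-envelope roofline estimate for autoregressive decoding.
# Every generated token reads (roughly) all model weights from memory,
# so bandwidth, not FLOPS, usually caps single-stream throughput.
# All figures below are illustrative assumptions.

def max_tokens_per_second(num_params: float, bytes_per_param: float,
                          mem_bandwidth_gb_s: float) -> float:
    """Upper bound on decode throughput for a single request stream."""
    bytes_per_token = num_params * bytes_per_param  # weights read per token
    return (mem_bandwidth_gb_s * 1e9) / bytes_per_token

# Hypothetical 70B-parameter model served at 8-bit precision.
params = 70e9
for name, bw in [("HBM-class accelerator", 3000),   # ~3 TB/s, assumed
                 ("DDR5 server socket", 300)]:       # ~0.3 TB/s, assumed
    tps = max_tokens_per_second(params, 1.0, bw)
    print(f"{name}: <= {tps:.0f} tokens/s per stream")
```

On these assumed numbers, even a 3 TB/s HBM part tops out in the low tens of tokens per second for a single 70B-parameter stream, which is why architectures that shorten the path between memory and compute are attractive.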
Key Insights
- The shift from training-focused AI hardware to inference solutions reflects the growing importance of real-time AI applications.
- Successful AI chip development requires extensive hands-on experience and learning from failures in chip production.
- Memory integration with compute is crucial for enhancing performance and efficiency in AI inference workloads.
- The rapid advancement of AI models necessitates innovative approaches to hardware design, particularly in cloud environments.
Key Questions Answered
What are the main challenges in AI inference hardware?
Sid Sheth points to the growing size and complexity of models as the core challenge: sustaining throughput requires fast, high-volume access to weights and activations in memory. As workloads grow, dedicated hardware that integrates compute and memory becomes essential, especially in cloud environments, as the sketch below illustrates.
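One concrete source of that memory pressure, sketched below with representative shapes (assumptions for illustration, not figures from the episode), is the transformer key-value cache, which must stay resident in fast memory and grows linearly with context length and batch size:

```python
# Rough size of the transformer KV cache, which must sit in fast memory
# alongside the weights during inference. Shapes are illustrative.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    # 2x for keys and values; one entry per layer, head, and token.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 80-layer model, 8 KV heads of dim 128, fp16 cache.
for ctx in (4_096, 32_768, 128_000):
    gb = kv_cache_bytes(80, 8, 128, ctx, batch=1) / 1e9
    print(f"context {ctx:>7}: ~{gb:.1f} GB of KV cache per request")
```

At long contexts the cache alone can rival the weights in size, compounding the bandwidth problem described above.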
How did d-Matrix differentiate itself in the AI hardware market?
d-Matrix differentiated itself by targeting cloud inference rather than the crowded edge inference market or AI training, where established players such as Nvidia dominate. By focusing on efficient memory and compute integration, d-Matrix aims to fill a gap in the hardware available for AI inference.
Why is memory integration critical for AI inference?
Memory integration is critical because inference depends on rapid access to model weights and intermediate state held in memory. Sid Sheth explains that off-chip memory such as HBM is expensive in both power and dollars, which makes tightly coupled SRAM the more viable foundation for d-Matrix's compute-integration strategy.
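A minimal sketch of the power side of that argument, using order-of-magnitude energy-per-access figures of the kind commonly cited in computer architecture talks (the exact numbers below are assumptions for illustration, not values from the episode):

```python
# Order-of-magnitude energy cost of moving data, illustrating why keeping
# weights and activations in on-chip SRAM next to the compute units can
# beat shuttling them over an off-chip HBM/DRAM interface. Figures are
# rough, process-dependent assumptions (picojoules per byte accessed).
ENERGY_PJ_PER_BYTE = {
    "on-chip SRAM": 1.0,       # small local SRAM read (assumed)
    "off-chip HBM": 30.0,      # through-package HBM access (assumed)
    "off-chip DDR DRAM": 80.0, # board-level DRAM access (assumed)
}

bytes_moved = 70e9  # e.g., streaming a hypothetical 70B int8 model once
for tier, pj in ENERGY_PJ_PER_BYTE.items():
    joules = bytes_moved * pj * 1e-12
    print(f"{tier:>18}: ~{joules:.1f} J per full weight pass")
```

The absolute values matter less than the spread: every byte kept on-chip avoids an access that is one to two orders of magnitude more expensive off-chip, and at data-center scale that gap dominates the power budget.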
What was the state of AI models before the rise of ChatGPT?
Before ChatGPT, AI models were growing in capability but saw limited user access and application. Sid notes that while models like GPT-3 were impressive, their commercial viability and user base were still developing, which made a deeper understanding of inference in the data center a prerequisite for the market to come.
What is the future trajectory of AI models according to Sid Sheth?
Sid Sheth anticipates that as AI models continue to evolve, demand for more sophisticated inference solutions will only grow. He expects the trend toward multimodal models and larger parameter counts to require hardware that can keep pace with these advancements.
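As a simple illustration of that scaling pressure (the parameter counts and precisions below are hypothetical, not figures from the episode), the memory footprint of the weights alone grows quickly:

```python
# Memory needed just to hold model weights at different parameter counts
# and precisions; illustrative numbers, not from the episode.
for params_b in (7, 70, 400, 1000):            # billions of parameters
    for bits, label in ((16, "fp16"), (8, "int8"), (4, "int4")):
        gb = params_b * 1e9 * bits / 8 / 1e9
        print(f"{params_b:>5}B @ {label}: {gb:>6.0f} GB")
```

A trillion-parameter model needs on the order of a terabyte of memory before counting activations and KV cache, which is exactly the kind of pressure Sheth expects future inference hardware to absorb.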