The Hidden Challenges of Running AI at Scale in Production - The Data Exchange with Ben Lorica Recap

Podcast: The Data Exchange with Ben Lorica

Published: 2026-03-12

Duration: 32 min

Guests: Chen Goldberg

Summary

This episode examines the complexities of deploying AI at scale in production environments, highlighting challenges and opportunities for differentiation through specialized cloud platforms.

What Happened

Chen Goldberg, SVP of Engineering at CoreWeave, discusses AI's rapid move from experimentation to production and the challenges that come with it, such as identifying a company's distinctive assets and choosing the right cloud partners. She emphasizes that many companies outside Silicon Valley are actively exploring how AI can enhance their existing assets and customer experiences.

Goldberg highlights the importance of choosing specialized tools for AI infrastructure, noting that companies doing training or inference for real production workloads prioritize security, reliability, and performance. CoreWeave positions itself as a partner rather than just a vendor, offering expertise and infrastructure solutions tailored to AI needs.

The conversation touches on the evolution of cloud computing, comparing it to the early days of SaaS and cloud-native development. Goldberg shares how AI workloads have changed assumptions about resource management, necessitating new orchestration strategies and infrastructure solutions to handle the complexities of multi-node environments.

Goldberg introduces CoreWeave tools such as Arena, which provides real infrastructure for testing AI workloads so that companies can benchmark and make informed decisions about their infrastructure needs. She also discusses the importance of improving "goodput," which refers to the effective use of GPU time.
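To make the effective-GPU-time idea concrete, here is a minimal sketch that treats it as a ratio of productive GPU-hours to allocated GPU-hours. The recap does not give a formula, so this definition, the function name, and all the numbers below are illustrative assumptions rather than anything CoreWeave or Goldberg described.

```python
def goodput(productive_gpu_hours: float, allocated_gpu_hours: float) -> float:
    """Fraction of allocated GPU time spent on useful work.

    Assumed definition for illustration: productive hours / allocated hours.
    """
    if allocated_gpu_hours <= 0:
        raise ValueError("allocated_gpu_hours must be positive")
    if not 0 <= productive_gpu_hours <= allocated_gpu_hours:
        raise ValueError("productive_gpu_hours must be between 0 and allocated")
    return productive_gpu_hours / allocated_gpu_hours


# Hypothetical job: 64 GPUs reserved for 24 hours.
allocated = 64 * 24.0  # 1536 GPU-hours

# Hypothetical sources of lost time, in GPU-hours.
lost = {
    "restarts_after_node_failure": 96.0,
    "checkpoint_save_and_load": 48.0,
    "idle_waiting_on_data": 48.0,
}

productive = allocated - sum(lost.values())  # 1344 GPU-hours
print(f"goodput = {goodput(productive, allocated):.1%}")  # goodput = 87.5%
```

Breaking the lost time out by cause, as in the `lost` dictionary, is one way to see which source of waste is worth attacking first.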

The discussion includes the role of AI in optimizing infrastructure operations, with CoreWeave using AI to analyze telemetry data for better decision-making. Goldberg also addresses the increasing use of reinforcement learning beyond typical tech companies, indicating its growing accessibility and potential.

The episode covers strategic partnerships, particularly with NVIDIA, and whether to consider alternative GPUs in light of supply chain challenges. Goldberg expresses confidence in NVIDIA's current ecosystem and software maturity.

Finally, Goldberg emphasizes the need for companies to experiment with AI technologies to avoid falling behind. She warns of potential technical debt and stresses the importance of balancing innovation with sustainable growth, especially as AI technologies continue to evolve rapidly.

Key Questions Answered

What does Chen Goldberg discuss about AI infrastructure on The Data Exchange podcast?

Chen Goldberg discusses the transition of AI from experimentation to production, the challenges of deploying AI at scale, and the importance of specialized cloud platforms like CoreWeave to optimize infrastructure for AI workloads.

How does CoreWeave differentiate itself as an AI cloud platform?

CoreWeave differentiates itself by offering specialized infrastructure solutions tailored to AI workloads, focusing on security, reliability, and performance, and positioning itself as a partner rather than just a vendor.

What role does NVIDIA play in CoreWeave's AI infrastructure strategy?

NVIDIA serves as both an investor and a key supplier for CoreWeave, providing a mature ecosystem and software stack that supports CoreWeave's focus on optimizing AI infrastructure at scale.