The First Mechanistic Interpretability Frontier Lab - Myra Deng & Mark Bissell of Goodfire AI - Latent Space: The AI Engineer Podcast Recap

Podcast: Latent Space: The AI Engineer Podcast

Published: 2026-02-06

Duration: 1 hr 8 min

Guests: Myra Deng, Mark Bissell

Summary

Goodfire AI is pioneering the application of mechanistic interpretability, aiming to make AI models safer and more powerful by understanding their internal workings. This episode covers the company's approach, its recent fundraising, and the implications of its work for healthcare and other fields.

What Happened

Goodfire AI's mission is to use interpretability to understand and design AI models, in the belief that this will unlock the next generation of powerful AI. The company recently announced a $150 million Series B funding round, reaching unicorn status at a $1.25 billion valuation. Mark Bissell and Myra Deng discuss their roles at Goodfire, which span research, engineering, and product development within the growing company.

The episode provides a broad overview of interpretability. Mark and Myra define the field expansively, extending it to the application of these techniques in high-stakes industries, and emphasize the importance of taking interpretability from research into real-world, production deployments. The conversation also touches on the history of interpretability in AI and how rapidly the field has advanced.

The conversation turns to Goodfire's work in healthcare, particularly collaborations with the Mayo Clinic and the Arc Institute. These partnerships aim to use AI to discover novel biomarkers for diseases such as Alzheimer's, showing how AI can accelerate scientific discovery. The guests also discuss the challenges and opportunities of deploying AI models in real-world healthcare settings, where transparency and precision are essential.

The episode features a live demonstration of Goodfire's steering techniques on a trillion-parameter model, with model behavior adjusted in real time. The demonstration underscores that their methods scale and can be applied practically to large AI systems.
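The recap does not include implementation details, but the core mechanic behind activation steering in the literature is simple: add a scaled direction vector to a layer's hidden activations during the forward pass. The sketch below is purely illustrative and is not Goodfire's API or method; the model (gpt2 as a small stand-in for the trillion-parameter model in the demo), the layer index, the coefficient, and the random steering vector are all assumptions.

```python
# Minimal activation-steering sketch (illustrative; not Goodfire's method).
# Assumptions: a small Hugging Face decoder-only model, an arbitrary layer,
# and a random unit vector standing in for a learned feature direction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in; the episode's demo used a far larger model
LAYER_IDX = 6        # which block's output to perturb (assumption)
STRENGTH = 4.0       # steering coefficient, tuned by hand in practice

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

# In real interpretability work this direction comes from analysis of the
# model (e.g., a learned feature); here it is random, for illustration only.
steering_vector = torch.randn(model.config.hidden_size)
steering_vector = steering_vector / steering_vector.norm()

def steer(module, inputs, output):
    # Forward hook: add the scaled direction to every token's activation.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + STRENGTH * steering_vector.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER_IDX].register_forward_hook(steer)
try:
    ids = tokenizer("The most important thing about AI is", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=30, do_sample=False)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # detach the hook to restore unsteered behavior
```

Because the intervention is a single vector addition per layer call, the per-token overhead is negligible, which is consistent with the kind of real-time behavior adjustment shown in the demo.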

The hosts and guests explore the theory behind mechanistic interpretability, including its role in understanding behaviors like hallucination. They discuss how interpretability could improve model design and reduce unintended behaviors, pointing to broader implications for AI safety and alignment.
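The episode stays at the conceptual level here, but one concrete pattern from the interpretability literature for studying a behavior like hallucination is the linear probe: train a simple classifier on a model's hidden activations to predict whether an output exhibits the behavior. The sketch below is a hypothetical illustration with synthetic placeholder data, not anything shown on the episode; in real work the activations and labels would come from a model's outputs on annotated prompts.

```python
# Linear-probe sketch: predict a behavior (e.g., hallucination) from hidden
# activations. All data below is synthetic placeholder material.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for cached activations at one layer: (n_examples, hidden_size).
# A real pipeline would record these while the model answers prompts whose
# outputs have been labeled factual vs. hallucinated.
hidden_size, n_examples = 768, 2000
X = rng.normal(size=(n_examples, hidden_size))
y = rng.integers(0, 2, size=n_examples)  # 1 = hallucinated (placeholder)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")  # ~0.5 on noise

# The probe's weight vector is itself a candidate "direction" in activation
# space -- the kind of object steering methods perturb.
direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
```

On real labeled data, accuracy well above chance would suggest the behavior is linearly represented at that layer, which is one way probing connects to the steering demonstration above.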

Finally, the conversation touches on the community and ecosystem around interpretability research, highlighting the field's collaborative nature and the opportunities for new researchers to contribute. Goodfire's plan to scale its techniques across scientific domains is framed as a promising direction for the future of AI research.

Key Insights