Controlling AI Models from the Inside - Practical AI Recap
Podcast: Practical AI
Published: 2026-01-20
Duration: 44 min
Summary
In this episode, the hosts discuss the critical need for AI safety and the potential risks associated with generative AI models. Guest Ali Katri emphasizes the importance of understanding and securing the AI models themselves to prevent undesirable outcomes.
What Happened
The episode kicks off with host Daniel Whitenack introducing guest Ali Katri, founder of Wrynx, who has extensive experience in AI safety and anti-abuse technologies. Ali shares his background, detailing his work at Meta, where he built infrastructure to keep messaging safe for billions of users. He later worked at Roblox, where he created systems protecting billions of dollars in payments against fraud. Through these experiences, Ali came to see that the AI models themselves are vulnerable, which led him to focus on their safety and security.
As the discussion unfolds, Ali makes a distinction between 'AI for security' and 'security for AI.' He explains that while AI can help address existing security challenges, the security of AI models is a separate and complex issue. He highlights the dangers of generative AI, which can produce harmful content if not properly managed. The conversation delves into the various contexts where safety is paramount, emphasizing that safety needs vary significantly across different industries, such as legal, medical, or customer service environments.
Ali uses a vivid analogy of a high-rise apartment building to illustrate the current state of AI safety: even with security checks at the door, harm inside cannot be prevented. He points out that while current solutions analyze the prompts going into models and the responses coming out, they often fail to catch issues before the damage is done. The episode concludes with Ali advocating for greater visibility into AI models themselves, so that abuse can be prevented and AI technologies operate safely and as intended.
Key Insights
- AI models are susceptible to misuse and require robust safety measures.
- Safety definitions vary across different industries and contexts.
- Current AI safety measures often analyze inputs and outputs, missing internal vulnerabilities.
- Greater visibility into AI models is necessary to prevent harmful outputs.
Key Questions Answered
What are Ali Katri's contributions to AI safety?
Ali Katri has dedicated the past eight years to AI safety and anti-abuse use cases. He built infrastructure at Meta that supports safety checks for messaging, impacting half of the world's population. His work at Roblox involved creating AI systems that safeguard against fraud in payments, protecting around $3 billion.
How does Ali Katri define safety in AI models?
Ali defines safety in terms of ensuring that AI models operate as intended within their specific contexts. He emphasizes that different industries have unique safety requirements, which makes the concept of safety context-specific—what is safe for one industry may be inappropriate for another.
What are the main risks associated with generative AI models?
Generative AI models pose significant risks as they can generate harmful content, such as promoting self-harm or creating inappropriate material. Ali cites alarming instances, like a case where a model encouraged suicide, highlighting the potential for AI to produce not just offensive, but dangerous content.
What current limitations exist in AI safety measures?
Ali notes that current AI safety measures primarily analyze a model's inputs and outputs, which means harmful content is typically detected only after it has been generated. Because the model itself remains a 'black box,' developers cannot see what happens inside it, leaving internal vulnerabilities unaddressed.
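To make the limitation concrete, here is a minimal sketch of the input/output filtering pattern described above. All names (`violates_policy`, `guarded_generate`, the keyword list) are illustrative assumptions, not from any real library or from Ali's systems; a production filter would use a trained moderation classifier rather than keywords.

```python
# Illustrative sketch of input/output filtering around a black-box model.
# The keyword screen is a stand-in for a real moderation classifier.

BLOCKED_PATTERNS = ["how to make a weapon", "encourage self-harm"]

def violates_policy(text: str) -> bool:
    """Naive keyword screen standing in for a moderation model."""
    lowered = text.lower()
    return any(pattern in lowered for pattern in BLOCKED_PATTERNS)

def guarded_generate(prompt: str, model) -> str:
    """Wrap a model with prompt and response checks.

    Note the limitation: the model call is opaque, so by the time
    the response filter runs, any harmful text has already been
    generated. The guard can only hide it, not prevent it.
    """
    if violates_policy(prompt):
        return "[blocked: unsafe prompt]"
    response = model(prompt)  # black-box step: no internal visibility
    if violates_policy(response):
        return "[blocked: unsafe response]"
    return response
```

The structure shows why this approach is reactive: both checks sit outside the model, so nothing observes or constrains what happens during generation itself, which is exactly the gap Ali's work targets.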
What does Ali Katri suggest for improving AI safety?
Ali advocates for increased visibility and understanding of AI models to mitigate risks. He argues that without insight into how models operate internally, it will be challenging to catch harmful outputs. His work aims to address these vulnerabilities and enhance the security of AI technologies.