The AI SRE Hype and How to Get it Right with Yotam Yemini, CEO of Causely - Modern CTO Recap

Podcast: Modern CTO

Published: 2025-10-27

Duration: 39 min

Summary

In this episode, Yotam Yemini discusses the current hype surrounding AI Site Reliability Engineers (SREs) and emphasizes the importance of understanding their true role in engineering reliability within systems. He shares insights on both effective and ineffective applications of AI in SRE functions.

What Happened

Yotam Yemini, the CEO of Causely, joins the podcast to unpack the buzz around AI SREs. He begins by explaining that the term SRE, short for Site Reliability Engineering, was coined at Google to describe the practice of asking engineers to build operational functions. Yemini stresses that the core aim of an SRE should be to engineer reliability into systems rather than merely troubleshoot issues.

He discusses various applications of AI in SRE work. Some uses, like summarizing postmortems or translating alerts, are beneficial. Many companies, however, misapply AI by assuming language models can autonomously solve complex problems. Yemini points out that AI struggles with emergent behaviors and counterfactuals, which makes it ill-suited for SRE tasks that require nuanced understanding and creativity. He humorously illustrates this by comparing AI hallucinations to those of small children, noting that both can lead to nonsensical conclusions.

The conversation shifts to Yemini's background and his journey from studying psychology to entering the tech field. He emphasizes the relevance of his training in industrial-organizational psychology, particularly for understanding workplace dynamics. This foundation has informed his approach to building Causely, which aims to combine technology and human understanding to improve operational reliability.

Key Questions Answered

What does AI SRE stand for?

AI SRE stands for Artificial Intelligence Site Reliability Engineering. Yemini explains that the term is currently being used by many companies, but it often misses the essence of what an SRE is meant to be. Originally coined at Google, SRE was about asking engineers to build operational functions with a focus on reliability rather than just troubleshooting.

What are some effective uses of AI in SRE?

Yemini highlights that effective applications of AI in SRE include tasks such as summarizing postmortems and translating alerts. For instance, he mentions that using a language model to analyze incident data and create a concise summary can be valuable. Similarly, having a chatbot that translates error codes for developers, such as those working on Salesforce, can help bridge knowledge gaps.

What challenges do AI models face in SRE applications?

One significant challenge mentioned by Yemini is that AI models struggle with emergent behaviors and counterfactuals. Since SRE work often involves dealing with novel system behaviors, relying solely on language models can lead to inaccuracies. Yemini points out that while AI can match patterns, it cannot always comprehend the complexities of real-world incidents and behaviors.

How did Yotam Yemini transition from psychology to technology?

Yemini's transition began with his studies in psychology, specifically industrial-organizational psychology, where he researched workplace dynamics. After working as a college basketball coach, he noticed the shift from analog to digital methods in sports, which inspired him to pursue a career in technology and eventually led to the founding of Causely.

What is the core philosophy of an SRE according to Yotam Yemini?

Yemini emphasizes that the core philosophy of an SRE is to automate themselves out of a job, focusing on engineering reliability into systems rather than merely responding to issues. This perspective underscores the intention behind the SRE role, which is to create systems that operate smoothly and efficiently without constant oversight.