Optimizing Agent Behavior in Production with Gideon Mendels - Software Engineering Daily Recap
Podcast: Software Engineering Daily
Published: 2026-02-17
Duration: 52 min
Summary
In this episode, Gideon Mendels discusses the challenges of operating LLM-powered systems in production and the need for evaluation tools built for these non-deterministic models. He introduces Opik, Comet's open-source platform for evaluation, optimization, and observability of LLM agents.
What Happened
As large language model (LLM) systems move into production, teams face challenges that traditional software practices do not adequately address. The non-deterministic behavior of these models complicates testing, failure analysis, and confidence in updates. Gideon Mendels, co-founder and CEO of Comet, argues that LLMs require purpose-built evaluation tooling, which led to the development of Opik, an open-source platform focused on evaluation, optimization, and observability for LLM agents.
Mendels shares his journey from software engineering to machine learning, reflecting on his time at Google working on language models and hate speech detection. He notes that, despite the fast-moving nature of AI, many teams still grapple with the same foundational issues found in software engineering. The conversation makes clear that while developing agents shares similarities with traditional software development, working with LLMs introduces a different set of complexities, particularly around control over model parameters and prompts. Mendels emphasizes that Opik aims to bring rigor to agent-based systems by treating various components as optimizable parts of the workflow.
Key Insights
- LLM systems present unique challenges in production, requiring new evaluation tools.
- Opik is Comet's open-source platform for evaluating, optimizing, and observing LLM agents.
- Mendels' background highlights the intersection of software engineering and machine learning.
- Traditional engineering principles need adaptation to address non-deterministic behavior in LLMs.
Key Questions Answered
What are the challenges faced by teams using LLMs in production?
Teams encounter difficulties stemming from the non-deterministic nature of LLMs, which complicates testing, reasoning about failures, and shipping updates with confidence. Traditional software development practices often do not fit the demands of these AI systems, creating a need for tailored evaluation tools.
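One common way to test under non-determinism is to assert on properties of the output rather than exact strings, and to measure a pass rate over repeated runs. A minimal sketch of that idea; the function names and canned responses below are hypothetical stand-ins for real LLM calls, not any tool's actual API:

```python
import random

def flaky_llm_answer(question: str) -> str:
    """Stand-in for a non-deterministic LLM call (hypothetical)."""
    return random.choice([
        "The capital of France is Paris.",
        "Paris is the capital of France.",
        "Paris.",
    ])

def contains_required_fact(answer: str) -> bool:
    """Property check: assert on meaning, not on exact wording."""
    return "paris" in answer.lower()

def pass_rate(n_trials: int = 20) -> float:
    """Run the same input many times and measure how often the property holds."""
    passes = sum(
        contains_required_fact(flaky_llm_answer("What is the capital of France?"))
        for _ in range(n_trials)
    )
    return passes / n_trials

# An exact-match assertion would flake across runs;
# a pass-rate threshold over a semantic property does not.
assert pass_rate() >= 0.9
```

The key design choice is that the test encodes what must be true of any acceptable answer, so rewordings between runs no longer count as failures.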
What is Opik and how does it assist developers?
Opik is an open-source platform launched by Comet that focuses on evaluation, optimization, and observability for LLM agents. It aims to bring the rigor of traditional engineering practices to agent-based systems by treating prompts, tools, and workflows as optimizable components.
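Treating a prompt as an optimizable component can be sketched as a small search loop: score each candidate system prompt against a fixed eval dataset and keep the best. This is an illustrative toy, not Opik's actual API; `run_agent` is a canned stand-in for a real LLM call:

```python
def run_agent(system_prompt: str, question: str) -> str:
    """Canned stand-in for an LLM agent call (hypothetical behavior)."""
    if "concise" in system_prompt:
        return {"2+2?": "4", "Capital of France?": "Paris"}.get(question, "")
    return {"2+2?": "The answer is 4.", "Capital of France?": "It is Paris."}.get(question, "")

# A small, fixed eval dataset with expected outputs.
DATASET = [
    {"input": "2+2?", "expected": "4"},
    {"input": "Capital of France?", "expected": "Paris"},
]

def score_prompt(system_prompt: str) -> float:
    """Fraction of dataset items the agent answers exactly as expected."""
    hits = sum(run_agent(system_prompt, ex["input"]) == ex["expected"] for ex in DATASET)
    return hits / len(DATASET)

# Candidate system prompts are just another tunable input to search over.
CANDIDATES = [
    "You are a helpful assistant.",
    "You are a concise assistant. Answer with only the fact requested.",
]

best = max(CANDIDATES, key=score_prompt)
```

The same pattern extends to other components: tool definitions, retrieval settings, or workflow steps can each be scored against the dataset and varied independently.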
How does Gideon Mendels' background influence his work at Comet?
Mendels transitioned from software engineering to machine learning, bringing valuable insights from both fields. His experiences at Google, particularly in language models and hate speech detection, shaped his understanding of the challenges in ML workflows, leading to the inception of Comet and its focus on model experiment tracking.
What similarities exist between traditional software development and LLM development?
Both domains require careful management of variables and components to achieve optimal results. While traditional software often relies on well-defined algorithms and data sets, LLM development involves navigating parameters like system prompts and tool calls, which can vary widely and affect outputs.
Why is evaluation considered a missing foundation for AI teams?
Mendels emphasizes that many AI teams overlook the importance of robust evaluation processes, which are crucial for ensuring model reliability and performance. Without proper evaluations, teams may struggle to understand their model's capabilities, leading to inconsistent results and missed opportunities for optimization.
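One concrete way evaluation serves as a foundation is as a release gate: a prompt or model change ships only if its score on a fixed eval set does not regress. A minimal sketch with hypothetical names, not tied to any specific tool:

```python
def evaluate_version(outputs: list[str], expected: list[str]) -> float:
    """Simple accuracy metric over a fixed eval set."""
    assert len(outputs) == len(expected)
    hits = sum(o.strip().lower() == e.strip().lower() for o, e in zip(outputs, expected))
    return hits / len(expected)

def safe_to_ship(baseline_score: float, candidate_score: float,
                 tolerance: float = 0.02) -> bool:
    """Allow small run-to-run noise, block real regressions."""
    return candidate_score >= baseline_score - tolerance

# Compare a candidate change against the current baseline on the same set.
baseline = evaluate_version(["Paris", "4", "Berlin"], ["Paris", "4", "Rome"])
candidate = evaluate_version(["Paris", "4", "Rome"], ["Paris", "4", "Rome"])
assert safe_to_ship(baseline, candidate)
```

Without a gate like this, teams have no systematic way to know whether a change improved or degraded the system before it reaches users.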