The PhD students who became the judges of the AI industry - Equity Recap

Podcast: Equity

Published: 2026-03-18

Guests: Anastasios Angelopoulos, Weilin Chang

What Happened

Anastasios Angelopoulos and Weilin Chang, co-founders of Arena, discussed how their platform became an influential evaluator of AI models. Arena, previously known as LM Arena, emerged from a research project at UC Berkeley focused on comparing large language models (LLMs) in real-world scenarios. On Arena, users compare AI responses and provide feedback, and that feedback feeds dynamic leaderboards that influence AI development and investment decisions.
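The recap does not say how Arena turns user comparisons into rankings. As a rough illustration only, the sketch below uses Elo-style rating updates, one common way to build a leaderboard from pairwise votes. The function name `elo_leaderboard`, the vote format, and the parameters `k` and `base` are all assumptions made for this example, not Arena's actual method.

```python
from collections import defaultdict


def elo_leaderboard(votes, k=32.0, base=1000.0):
    """Rank models from pairwise votes using Elo-style updates.

    Hypothetical vote format (an assumption for this sketch):
    (model_a, model_b, winner), where winner is "a", "b", or "tie".
    """
    # Every model starts at the same baseline rating.
    ratings = defaultdict(lambda: base)
    outcome_score = {"a": 1.0, "b": 0.0, "tie": 0.5}

    for model_a, model_b, winner in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        # Probability that model_a wins under the standard Elo logistic curve.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        score_a = outcome_score[winner]
        # Zero-sum update: whatever model_a gains, model_b loses.
        delta = k * (score_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta

    # Highest rating first.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample_votes = [
        ("model-x", "model-y", "a"),
        ("model-y", "model-z", "a"),
        ("model-x", "model-z", "a"),
    ]
    for name, rating in elo_leaderboard(sample_votes):
        print(f"{name}: {rating:.1f}")
```

Because each new vote shifts the ratings, a leaderboard built this way keeps changing as feedback arrives, which is the "dynamic" property the founders contrast with static benchmarks.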

The founders said Arena has raised $100 million in seed funding and a $150 million Series A, bringing its valuation to $1.7 billion. Backers include major players like A16Z, Kleiner Perkins, and Google. Arena's approach differs from traditional static benchmarks by drawing on real-world data from millions of users, so that AI models are evaluated on practical utility rather than on memorized test answers.

Arena's platform has a diverse user base, with 28% involved in coding and others in fields like medical and legal tasks. This diversity is crucial for providing a comprehensive evaluation of AI capabilities. Because the data is collected live, the evaluations stay relevant and are harder for models to overfit to, offering a more accurate measure of AI performance than static tests.

Anastasios Angelopoulos addressed concerns about neutrality and potential biases, explaining that Arena ensures the models being tested are the same as those available to the general public. The platform maintains structural neutrality by not allowing financial influence over leaderboard rankings.

Arena is expanding its services to help enterprises evaluate AI models for specific use cases. This involves offering analytical tools that provide insights into model performance across various domains, allowing companies to make informed decisions about which AI tools to integrate into their operations.

Weilin Chang highlighted the platform's commitment to preventing fraud and abuse by employing a dedicated team to monitor and analyze user interactions. This ensures that the data driving Arena's leaderboards reflects genuine user input and not manipulated feedback from companies or automated bots.

The co-founders discussed Arena's future plans, including developing benchmarks for agent capabilities and expanding into more niche industries. They are also focused on enhancing the platform's ability to evaluate AI models' performance in tasks like coding, legal applications, and multimedia editing without relying solely on human evaluators.

Key Insights

Arena's platform uses real-world feedback from millions of users to evaluate AI models, which the founders argue makes it more dynamic and accurate than static benchmarks.
The platform has a broad user base, including 28% involved in coding and others in fields like medical and legal tasks, ensuring diverse input for AI evaluations.
Arena has secured significant funding, leading to a $1.7 billion valuation, with backers like A16Z and Google, highlighting its influence in the AI industry.
Arena's commitment to neutrality is maintained by structural safeguards, ensuring that financial influence does not affect leaderboard rankings.