The PhD students who became the judges of the AI industry - Equity Recap
Podcast: Equity
Published: 2026-03-18
Duration: 26 min
Guests: Anastasios Angelopoulos, Wei-Lin Chiang
Summary
PhD students Anastasios Angelopoulos and Wei-Lin Chiang turned their UC Berkeley research project into Arena, a leading platform for evaluating AI models in real-world scenarios that now shapes industry benchmarks and standards.
What Happened
As PhD students at UC Berkeley, Anastasios Angelopoulos and Wei-Lin Chiang began researching how to evaluate large language models (LLMs); that work has since grown into Arena, a major industry player. Arena, formerly known as LM Arena, is now the leading public leaderboard for AI models, influencing funding decisions, product launches, and PR campaigns. Rather than relying on static benchmarks, the platform measures real-world intelligence, collecting dynamic feedback from millions of users whose head-to-head votes produce continuously evolving leaderboards.
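The core mechanism described above, pairwise user votes that continuously reshuffle a leaderboard, can be sketched in a few lines. This is an illustrative simplification assuming an Elo-style update (Arena's published methodology is built on the closely related Bradley-Terry model); the model names, baseline rating, and K-factor are hypothetical:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Predicted probability that model A beats model B, given Elo ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def record_vote(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one user vote: the winner's rating rises, the loser's falls.

    The size of the swing depends on how surprising the result was:
    an upset moves the board more than an expected win.
    """
    surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * surprise
    ratings[loser] -= k * surprise

# Every model starts at the same baseline; each fresh vote nudges the board.
ratings = {"model-a": 1000.0, "model-b": 1000.0}
record_vote(ratings, winner="model-a", loser="model-b")
```

Because the leaderboard is rebuilt from a stream of fresh, unpredictable user prompts rather than a fixed question set, there is no static test to overfit, which is the property the next paragraph turns on.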
Static benchmarks invite overfitting: once the test questions are known, models can be tuned to them. Arena sidesteps this by constantly refreshing its evaluations with new data from its extensive user base. To keep evaluations neutral and accurate, Arena also requires that AI providers release the same models to the public as those tested on the platform. That structural neutrality is crucial, given that major companies like OpenAI, Google, and Meta rely on Arena's rankings.
Arena has grown rapidly, securing significant funding and a $1.7 billion valuation in just seven months, with backers including a16z, Kleiner Perkins, and Lightspeed. Its success rests on reliable, real-world evaluations that are constantly refreshed by user interactions, making it hard for vendors to game the rankings with benchmark-tuned model variants.
To maintain credibility and prevent abuse, Arena runs a dedicated team that verifies real usage on the platform, using sophisticated tooling to detect and block fraudulent or biased activity by AI companies. That enforcement underpins Arena's reputation as a trusted source for AI evaluation.
Arena is also expanding into agentic evaluations, testing AI models on tasks such as building web applications, tool use, and coding across multiple programming languages. The expansion reflects both the evolving nature of AI capabilities and Arena's role in evaluating these frontier technologies comprehensively.
The platform's data-driven approach also supports enterprises in making informed decisions regarding AI model selection and upgrades, offering interactive analytical tools tailored to specific industry needs. Arena's extensive dataset, comprising millions of user interactions, positions it as a unique and indispensable resource in the AI landscape.
Despite its rapid rise, Arena remains committed to open-source principles, regularly releasing human preference data to foster transparency and community trust. This openness, combined with the platform's rigorous evaluation standards, ensures that Arena continues to shape AI development in meaningful ways.
Key Insights
- Arena, formerly LM Arena, is the leading public leaderboard for AI models, influencing funding decisions and product launches by prioritizing real-world intelligence measurement over static benchmarks.
- Arena's valuation reached $1.7 billion within seven months, backed by investors like a16z, Kleiner Perkins, and Lightspeed, due to its reliable, user-driven AI model evaluations.
- To ensure accuracy, Arena requires AI providers to release the same models to the public as those tested on the platform, maintaining neutrality for major companies like OpenAI, Google, and Meta.
- Arena is expanding its evaluations to include agentic capabilities, assessing AI models on tasks such as web application building and coding, reflecting the evolving nature of AI technologies.