New top score on ARC-AGI-2-pub (29.4%) - Jeremy Berman - Machine Learning Street Talk (MLST) Recap
Podcast: Machine Learning Street Talk (MLST)
Published: 2025-09-27
Duration: 1 hr 8 min
Guests: Jeremy Berman
Summary
Jeremy Berman discusses his innovative approach to solving the ARC challenge using natural language descriptions instead of code, achieving a top score on the ARC-AGI v2 leaderboard.
What Happened
Jeremy Berman, a research scientist at Reflection AI, recently topped the ARC-AGI v2 public leaderboard with a score of 29.4%. His approach differs from traditional methods by using natural language descriptions, rather than Python programs, to solve the tasks; he argues natural language is the more expressive medium. This expressiveness allows greater creativity and adaptability on problems that are simple for humans but remain challenging for machines.
Berman discusses the limitations of current AI systems, particularly their inability to synthesize new knowledge. He envisions an ideal system in which language models retain all prior knowledge while quickly adapting to new tasks. Achieving this, he argues, requires more expressive program representations and stronger reasoning than Python or current AI systems provide.
The episode delves into the trade-offs between using Python and natural language for program generation. While Python provides deterministic and verifiable outputs, natural language offers a broader range of expression, which Berman leverages to improve performance on ARC-AGI tasks. However, this approach also introduces challenges in verifying the correctness of solutions.
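The verifiability side of this trade-off can be illustrated with a toy sketch: when candidate solutions are Python functions, each one can be checked deterministically against a task's training pairs, whereas a natural-language description admits no such mechanical check. The ARC-style task, the candidate transformations, and the `verify` helper below are all hypothetical illustrations, not Berman's actual pipeline.

```python
# Toy illustration of why Python candidates are mechanically verifiable.
# The task and candidates here are invented for demonstration only.

Grid = list[list[int]]

# A tiny ARC-like task: training pairs mapping input grids to output grids.
# (The hidden rule in this made-up task is "transpose the grid".)
train_pairs: list[tuple[Grid, Grid]] = [
    ([[1, 2], [3, 4]], [[1, 3], [2, 4]]),
    ([[0, 5], [5, 0]], [[0, 5], [5, 0]]),
]

def transpose(g: Grid) -> Grid:
    return [list(row) for row in zip(*g)]

def flip_rows(g: Grid) -> Grid:
    return g[::-1]

# Two hypothetical candidate programs, as an LLM-driven search might propose.
candidates = {"transpose": transpose, "flip_rows": flip_rows}

def verify(fn, pairs) -> bool:
    """Deterministic check: does fn reproduce every training output?"""
    return all(fn(inp) == out for inp, out in pairs)

# Only candidates consistent with every training pair survive.
survivors = [name for name, fn in candidates.items() if verify(fn, train_pairs)]
print(survivors)  # -> ['transpose']
```

A natural-language candidate ("flip the grid along its diagonal") has no equivalent of `verify`: checking it means either translating it back into executable form or asking a model to apply it, which reintroduces the uncertainty Berman describes.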
Berman emphasizes the importance of reasoning as a meta-skill in AI, crucial for achieving artificial general intelligence (AGI). He suggests that reasoning should be at the core of AI development, enabling systems to apply learned skills across various domains effectively.
The conversation touches on the need for AI to develop 'invention circuits' that allow for creativity and innovation beyond existing knowledge. Berman is optimistic about the potential of AI to achieve this through reinforcement learning and other techniques that mimic human deductive processes.
The episode concludes with Berman's thoughts on the future of AI research, highlighting the importance of creating environments that facilitate the development of reasoning and creativity in AI systems. He invites interested individuals to join Reflection AI as they work on building open intelligence models.
Key Insights
- Jeremy Berman achieved a new top score of 29.4% on the ARC-AGI v2 public leaderboard by using natural language descriptions instead of Python programs to solve tasks.
- Current AI systems face limitations in synthesizing new knowledge, and an ideal system would retain all prior knowledge while adapting quickly to new tasks, requiring more expressive programming and reasoning capabilities.
- Natural language offers a broader range of expression for program generation compared to Python, but it introduces challenges in verifying the correctness of solutions.
- AI development should focus on reasoning as a core meta-skill, enabling systems to apply learned skills across various domains and develop 'invention circuits' for creativity and innovation.