The Evolution of Reasoning in Small Language Models with Yejin Choi - The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) Recap

Podcast: The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Published: 2026-01-29

Duration: 1 hr 6 min

Summary

Yejin Choi discusses her research on improving reasoning in small language models (SLMs), emphasizing the need for better data and novel architectures to democratize AI. She argues that investing in smaller models could unlock significant capabilities, challenging the current trend of focusing heavily on large language models (LLMs).

What Happened

In this episode, host Sam Charrington welcomes back Yejin Choi, a professor at Stanford University, to discuss her latest work on small language models and reasoning. Choi reflects on her previous research in common sense knowledge and natural language generation, noting that her focus has shifted toward enhancing the reasoning capabilities of smaller models. She highlights the issue of homogeneity in model outputs, pointing out that even with variations in parameters, models like Llama, ChatGPT, and DeepSeek demonstrate strikingly similar behaviors.

Choi emphasizes the importance of democratizing generative AI, arguing that small language models should be accessible to a wider audience beyond companies with significant computational resources. She believes that with increased investment in smaller models, researchers could uncover exciting new capabilities. Choi also critiques the current data-centric approach to teaching AI, suggesting the need for more efficient methods that require less data. She explores the potential of new architectures and high-quality data sourced from experts to improve the performance of small models, rather than relying solely on traditional large-model compression techniques.

Key Questions Answered

What are the limitations of current language models?

Choi discusses how, even with attempts to vary model parameters, there is notable intra-model and inter-model homogeneity in the outputs of models like Llama, ChatGPT, and DeepSeek. This indicates that the models are not as diverse as one might expect, which can limit their effectiveness in handling open-ended questions.

Why is Yejin Choi focused on small language models?

Choi's interest in small language models stems from her mission to democratize generative AI, making it accessible beyond large companies. She emphasizes that smaller models can still be meaningful and impactful, arguing that with more investment in this area, researchers could unlock exciting new capabilities.

What alternative approaches to model training does Choi suggest?

Choi highlights several potential approaches to improving small models, including the use of novel architectures and the incorporation of high-quality, expert-curated data. She notes that while compressing larger models can be effective, exploring these alternative methods could yield better outcomes for smaller models.

What role does data play in the effectiveness of AI models?

Choi emphasizes the importance of high-quality data in training AI models, suggesting that the current reliance on internet-sourced data may not be sufficient. She proposes that curated datasets, created by experts for specific purposes, can significantly enhance how small models learn and reason.

How can we improve reasoning capabilities in AI?

Choi advocates for a shift in focus toward developing more effective reasoning capabilities in small language models. This includes exploring new architectures that combine established techniques with innovative approaches, as well as seeking out higher-quality data that can better inform the models' decision-making processes.