Deep Learning is Not So Mysterious or Different - Prof. Andrew Gordon Wilson (NYU) - Machine Learning Street Talk (MLST) Recap

Podcast: Machine Learning Street Talk (MLST)

Published: 2025-09-19

Duration: 2 hr 4 min

Guests: Andrew Gordon Wilson

Summary

Deep learning, while often seen as mysterious, can be understood through principles of soft inductive biases and generalization frameworks. Prof. Andrew Wilson explores how larger models exhibit simplicity biases, challenging the traditional bias-variance trade-off.

What Happened

Andrew Wilson argues that deep learning's perceived mystery can be unpacked by examining soft inductive biases and existing generalization frameworks. He notes that while deep learning is broadly applicable, it is not wholly unlike other model classes; what does set it apart is its effectiveness at representation learning and distinctive optimization properties such as mode connectivity.

Wilson challenges conventional wisdom on generalization and model construction, suggesting that larger models can exhibit stronger simplicity biases than smaller ones. This perspective helps demystify phenomena like double descent (where test error falls again as model capacity grows past the interpolation point) and overparameterization, offering a principled approach to model building.

The episode examines why the classical bias-variance trade-off is a misconception, proposing that large neural networks can achieve low bias and low variance simultaneously. Wilson emphasizes that real-world data is not uniformly random, so a large model's expressiveness, combined with a simplicity bias, can lead to better generalization rather than overfitting.

Further, Wilson discusses the underappreciated role of Bayesian methods in deep learning, advocating for representing epistemic uncertainty through Bayesian marginalization. He outlines how this approach leads to better model generalization by integrating over numerous plausible explanations.
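Bayesian marginalization, as discussed above, means the predictive distribution averages over all plausible explanations weighted by their posterior probability, rather than committing to a single best model. A minimal sketch of this idea, using three hypothetical candidate regression hypotheses and toy data invented for illustration:

```python
import numpy as np

# Hypothetical candidate "explanations" of the data.
hypotheses = [
    lambda x: 0.0 * x,   # constant zero
    lambda x: x,         # identity
    lambda x: x ** 2,    # quadratic
]

# Toy observations generated near y = x (assumed, not from the episode).
x_obs = np.array([0.0, 0.5, 1.0])
y_obs = np.array([0.05, 0.45, 1.02])
sigma = 0.1  # assumed Gaussian observation noise

# Posterior weights via Bayes' rule with a uniform prior:
# p(h | D) proportional to p(D | h) = N(y; f_h(x), sigma^2 I)
log_lik = np.array([-0.5 * np.sum((y_obs - f(x_obs)) ** 2) / sigma ** 2
                    for f in hypotheses])
weights = np.exp(log_lik - log_lik.max())
weights /= weights.sum()

def predictive_mean(x):
    # Bayesian marginalization: sum (integrate) over hypotheses,
    # each weighted by its posterior probability.
    return sum(w * f(x) for w, f in zip(weights, hypotheses))
```

Rather than discarding the quadratic hypothesis outright, the prediction retains it in proportion to its posterior weight; with a continuous parameter space the sum becomes an integral, which deep ensembles and other approximate methods estimate in practice.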

Wilson also delves into the challenges of aligning model assumptions with real-world data, questioning the validity of no free lunch theorems in practical scenarios. He highlights how large models' simplicity biases naturally emerge from their structure, offering insights into why they generalize well.

The episode concludes with Wilson's vision for AI systems capable of discovering new scientific theories, moving beyond current applications to provide deeper insights into data. He emphasizes the potential for AI to address complex scientific challenges, advocating for a focus on building models that can uncover new scientific knowledge.

Key Insights