A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)
Super Data Science: ML & AI Podcast with Jon Krohn Podcast Recap
Duration: 10 min
Summary
The episode examines Pathway's BDH architecture, which outperformed transformer-based models at solving extreme Sudoku puzzles. It highlights the architecture's ability to reason under constraints and the insights this offers for AI applications beyond transformers.
What Happened
Pathway's BDH architecture achieved a 97.4% accuracy rate on the Sudoku Extreme benchmark, which consists of the hardest Sudoku puzzles, while transformer-based models scored nearly zero percent. This result highlights a stark performance difference and suggests a fundamental weakness in current large language models (LLMs) when applied to constraint satisfaction problems.
Sudoku serves as an ideal test for AI models because it is a constraint satisfaction problem: every number placed must satisfy multiple conditions simultaneously. The inability of transformers to solve Sudoku efficiently is attributed to their token-by-token processing and limited internal state, which hamper their reasoning capabilities in non-textual contexts.
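To make the "simultaneous constraints" point concrete, here is a minimal sketch (not from the episode) of the three conditions a single Sudoku placement must satisfy at once:

```python
def violates(grid, r, c, v):
    """Check whether placing value v at (r, c) breaks a Sudoku constraint.

    A valid placement must simultaneously satisfy three conditions --
    row uniqueness, column uniqueness, and 3x3-box uniqueness -- which
    is what makes Sudoku a constraint satisfaction problem.
    """
    if any(grid[r][j] == v for j in range(9)):   # row constraint
        return True
    if any(grid[i][c] == v for i in range(9)):   # column constraint
        return True
    br, bc = 3 * (r // 3), 3 * (c // 3)          # top-left cell of the 3x3 box
    if any(grid[br + i][bc + j] == v
           for i in range(3) for j in range(3)): # box constraint
        return True
    return False
```

A token-by-token generator has no natural way to hold all of these interacting conditions in view at once, which is one plausible reading of why transformers struggle here.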
BDH, which stands for Baby Dragon Hatchling, is designed as a native reasoning model. It uses a larger latent reasoning space, allowing the model to reason internally without converting every thought into text. This capability is likened to a chess grandmaster navigating games without verbalizing each move.
BDH employs sparse positive activations, activating only about 5% of its artificial neurons at any time, in contrast to transformers which activate all neurons. This sparse activation is more biologically plausible and energy-efficient, similar to how the human brain functions.
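The recap does not specify BDH's exact mechanism, but one simple way to realize "sparse positive activations with ~5% of neurons firing" is a ReLU followed by a top-k mask, sketched here for illustration only:

```python
import numpy as np

def sparse_positive_activation(x, frac=0.05):
    """Keep only the top `frac` fraction of positive pre-activations.

    Illustrative sketch, not Pathway's implementation: positivity comes
    from a ReLU, and sparsity from silencing every unit below the k-th
    largest value (ties could admit slightly more than k units).
    """
    x = np.maximum(x, 0.0)                   # positive activations only
    k = max(1, int(frac * x.size))           # number of units allowed to fire
    threshold = np.partition(x, -k)[-k]      # k-th largest activation
    return np.where(x >= threshold, x, 0.0)  # zero out everything below it
```

With 5% of units active, most of the outer-product work a dense layer would do simply vanishes, which is where the energy-efficiency claim comes from.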
The BDH model is state-based, maintaining and updating an internal state rather than relying on a transformer-style attention mechanism. This approach is inspired by biological learning principles, such as Hebbian learning, where neurons strengthen connections through repeated activations.
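The Hebbian principle the recap invokes ("neurons that fire together wire together") can be sketched as a simple weight update; this is the textbook rule, not Pathway's actual update:

```python
import numpy as np

def hebbian_update(W, pre, post, lr=0.01, decay=0.001):
    """One Hebbian step on a weight matrix W (post x pre).

    The outer product strengthens connections between co-active
    pre- and post-synaptic neurons; a small decay term keeps the
    weights bounded. Repeated co-activation thus accumulates into
    the internal state, rather than being frozen after training.
    """
    return W + lr * np.outer(post, pre) - decay * W
```

Because the state (here, `W`) changes with every interaction, learning continues at inference time, in contrast to a transformer's fixed post-training weights.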
BDH also demonstrates continual learning by adapting and improving through repeated interactions, unlike transformers that rely on fixed weights post-training. It can learn new tasks rapidly, reaching an advanced beginner level in a short time and improving with practice.
BDH achieves its Sudoku results at a materially lower cost than leading LLMs because its reasoning happens internally rather than through long generated chains of text. This combination of cost efficiency and reasoning capability suggests significant potential for real-world applications beyond Sudoku.
While BDH is still in early development stages, the architecture's performance against transformers highlights its potential to surpass current AI models in reasoning tasks. Pathway's focus on reasoning models could push AI capabilities further, challenging the dominance of transformers.
Key Insights
- Pathway's BDH architecture solved extreme Sudoku puzzles with 97.4% accuracy, while transformer-based models scored almost zero, demonstrating a significant capability gap in constraint satisfaction problems.
- Sudoku's requirement for simultaneous satisfaction of constraints reveals transformers' limitations, as they process information token-by-token with a limited internal state, making them inefficient for non-textual reasoning tasks.
- BDH employs sparse positive activations, activating only a fraction of neurons at once, mimicking the energy efficiency and biological plausibility of the human brain, unlike the dense activation in transformers.
- BDH's state-based model allows for continual learning and internal state updates, inspired by Hebbian learning, enabling rapid adaptation to new tasks and efficient reasoning without generating extensive text.