Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

Type: paper
Slug: mastering-atari-go-chess-and-shogi-by-planning-with-a-learned-model—hassabis
Sources: mastering-atari-go-chess-and-shogi-by-planning-with-a-learned-model—hassabis
Last updated: 2026-05-13

Summary

Schrittwieser, Antonoglou, Hubert, Simonyan, Sifre, Schmitt, Guez, Lockhart, Hassabis, Graepel, Lillicrap, and Silver (2020) introduced MuZero, a reinforcement learning system that achieves superhuman performance in board games (Go, chess, shogi) and Atari games without knowing the rules or state transitions of the environment. MuZero learns a dynamics model of the environment from experience and uses it for planning via Monte Carlo tree search — a major advance toward general-purpose model-based RL.

Core content

Core problem addressed: AlphaZero requires the rules of the game (the transition function) to be provided. MuZero removes this requirement by learning the dynamics model itself (paper—mastering-atari-go-chess-and-shogi-by-planning-with-a-learned-model §Introduction).

Three learned functions:

Representation function: Maps the observed state (e.g., raw pixels in Atari) to a hidden state (paper—mastering-atari-go-chess-and-shogi-by-planning-with-a-learned-model §Methods).
Dynamics function: Predicts the next hidden state and immediate reward given a hidden state and action (paper—mastering-atari-go-chess-and-shogi-by-planning-with-a-learned-model §Methods).
Prediction function: Maps the hidden state to policy and value predictions (paper—mastering-atari-go-chess-and-shogi-by-planning-with-a-learned-model §Methods).
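The interfaces of the three functions can be sketched as follows. This is a minimal illustration, not the paper's architecture: the class name `ToyMuZeroModel`, the sizes `HIDDEN_SIZE` and `NUM_ACTIONS`, and the single random linear layers are all assumptions made here to keep the example small (the paper uses deep residual networks trained end to end).

```python
import math
import random
from typing import List, Tuple

HIDDEN_SIZE = 4   # assumed hidden-state size (hypothetical)
NUM_ACTIONS = 3   # assumed number of discrete actions (hypothetical)


def _linear(weights: List[List[float]], x: List[float]) -> List[float]:
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def _random_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyMuZeroModel:
    """Untrained stand-in that only shows the input/output shapes."""

    def __init__(self, obs_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_repr = _random_matrix(HIDDEN_SIZE, obs_size, rng)
        self.w_dyn = _random_matrix(HIDDEN_SIZE, HIDDEN_SIZE + NUM_ACTIONS, rng)
        self.w_reward = _random_matrix(1, HIDDEN_SIZE + NUM_ACTIONS, rng)
        self.w_policy = _random_matrix(NUM_ACTIONS, HIDDEN_SIZE, rng)
        self.w_value = _random_matrix(1, HIDDEN_SIZE, rng)

    def representation(self, observation: List[float]) -> List[float]:
        """Observation -> hidden state."""
        return [math.tanh(v) for v in _linear(self.w_repr, observation)]

    def dynamics(self, hidden: List[float], action: int) -> Tuple[List[float], float]:
        """(hidden state, action) -> (next hidden state, predicted reward)."""
        one_hot = [1.0 if a == action else 0.0 for a in range(NUM_ACTIONS)]
        x = hidden + one_hot
        next_hidden = [math.tanh(v) for v in _linear(self.w_dyn, x)]
        reward = _linear(self.w_reward, x)[0]
        return next_hidden, reward

    def prediction(self, hidden: List[float]) -> Tuple[List[float], float]:
        """Hidden state -> (policy over actions, scalar value)."""
        logits = _linear(self.w_policy, hidden)
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]  # numerically stable softmax
        total = sum(exps)
        policy = [e / total for e in exps]
        value = math.tanh(_linear(self.w_value, hidden)[0])
        return policy, value


if __name__ == "__main__":
    model = ToyMuZeroModel(obs_size=6)
    s0 = model.representation([0.1] * 6)    # encode a real observation once
    s1, r1 = model.dynamics(s0, action=1)   # imagine one step, no environment call
    policy, value = model.prediction(s1)    # evaluate the imagined state
    print(policy, value, r1)
```

Note that only `representation` ever sees a real observation; `dynamics` and `prediction` work purely on hidden states, which is what allows planning without access to the rules.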

Planning with learned models: MCTS operates entirely in the learned latent space — the real environment is only queried to obtain the initial observation and to execute the chosen action (paper—mastering-atari-go-chess-and-shogi-by-planning-with-a-learned-model §Methods).
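A rough sketch of search in latent space is given below. It is a simplified, single-player variant written for illustration: the names `Node`, `run_mcts`, and the `toy_*` functions are hypothetical, the selection rule is a reduced pUCT formula, and the value normalization and root exploration noise described in the paper are omitted. The default constants are illustrative assumptions.

```python
import math
from typing import Any, Callable, Dict, List, Tuple


class Node:
    def __init__(self, prior: float) -> None:
        self.prior = prior
        self.hidden: Any = None   # latent state, filled in on expansion
        self.reward = 0.0         # predicted reward for the edge into this node
        self.visit_count = 0
        self.value_sum = 0.0
        self.children: Dict[int, "Node"] = {}

    def value(self) -> float:
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def expand(node: Node, hidden: Any, reward: float, policy: List[float]) -> None:
    node.hidden, node.reward = hidden, reward
    for action, p in enumerate(policy):
        node.children[action] = Node(prior=p)


def select_child(node: Node, c_puct: float, discount: float) -> Tuple[int, Node]:
    def score(child: Node) -> float:
        q = child.reward + discount * child.value() if child.visit_count else 0.0
        u = c_puct * child.prior * math.sqrt(node.visit_count) / (1 + child.visit_count)
        return q + u
    return max(node.children.items(), key=lambda kv: score(kv[1]))


def run_mcts(observation: Any,
             representation: Callable, dynamics: Callable, prediction: Callable,
             num_simulations: int = 50, discount: float = 0.997,
             c_puct: float = 1.25) -> Dict[int, int]:
    """Return root visit counts per action; the environment is never stepped here."""
    root = Node(prior=1.0)
    hidden = representation(observation)   # the only use of a real observation
    policy, _ = prediction(hidden)
    expand(root, hidden, 0.0, policy)
    for _ in range(num_simulations):
        node, path = root, [root]
        while node.children:               # descend to an unexpanded leaf
            parent = node
            action, node = select_child(parent, c_puct, discount)
            path.append(node)
        next_hidden, reward = dynamics(parent.hidden, action)  # imagined step
        policy, value = prediction(next_hidden)
        expand(node, next_hidden, reward, policy)
        for n in reversed(path):           # back up the discounted return
            n.value_sum += value
            n.visit_count += 1
            value = n.reward + discount * value
    return {a: c.visit_count for a, c in root.children.items()}


# Hypothetical hand-written "model": action 1 always earns reward 1, action 0 earns 0.
def toy_representation(observation: List[float]) -> float:
    return sum(observation)


def toy_dynamics(hidden: float, action: int) -> Tuple[float, float]:
    return hidden + (1.0 if action == 1 else -1.0), (1.0 if action == 1 else 0.0)


def toy_prediction(hidden: float) -> Tuple[List[float], float]:
    return [0.5, 0.5], 0.0


if __name__ == "__main__":
    print(run_mcts([0.0], toy_representation, toy_dynamics, toy_prediction))
```

In a full agent, the action to execute in the real environment would be chosen from the returned visit counts, and the search would be rerun from the next real observation.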

Results:

Matched AlphaZero in Go, chess, and shogi despite not knowing the game rules (paper—mastering-atari-go-chess-and-shogi-by-planning-with-a-learned-model §Results).
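Set a new state of the art in mean and median human-normalized score across the 57 Atari games of the Arcade Learning Environment, outperforming both the strongest model-free method at the time (R2D2) and prior model-based approaches such as SimPLe (paper—mastering-atari-go-chess-and-shogi-by-planning-with-a-learned-model §Results).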
Demonstrated that learned dynamics models can be accurate enough for effective long-horizon planning (paper—mastering-atari-go-chess-and-shogi-by-planning-with-a-learned-model §Results).

Significance: MuZero demonstrated that model-based planning can scale to complex domains, bridging the gap between model-free RL (which learns effective policies but does not plan ahead) and classical planning (which plans ahead but requires a known model).

Connections

Theme: theme—deep-RL, model-based-RL, theme—game-playing-ai, chess

Project: MuZero
Collaborators: Julian Schrittwieser (co-first), Ioannis Antonoglou (co-first), Thomas Hubert (co-first), David Silver (co-first), Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Thore Graepel, Timothy Lillicrap
Era: alphafold-era
Venue: venue—Nature
Extends: paper—a-general-reinforcement-learning-algorithm-that-masters-chess-shogi-and-go — MuZero generalizes AlphaZero by removing the need for known rules
Notable quote: “MuZero achieves state-of-the-art performance on Atari, matching the performance of the AlphaZero algorithm on board games without any access to the environment dynamics.” (paper—mastering-atari-go-chess-and-shogi-by-planning-with-a-learned-model §Abstract)

Honest Gaps

Metadata lists 4 co-authors; the actual paper has 12 authors.
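The learned dynamics model can accumulate errors over long planning horizons. Training unrolls the model for only a fixed number of steps (K = 5 in the paper), so deeper searches rely on the model generalizing beyond its training horizon. MuZero Reanalyze re-runs search on stored trajectories to refresh training targets, which improves sample efficiency but does not directly correct model error.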
Performance on Atari, while strong, still falls short of the best model-free methods (e.g., Agent57) on some individual games.
The computational cost of MCTS with learned models is substantial — MuZero requires more compute per decision than model-free methods.
No full implementation or trained models were released; the paper's supplementary material provides pseudocode only.
The PDF extraction has merged text (no spaces between words) in author names and some body text.
