Reinforcement Learning, Fast and Slow

Type: paper Slug: reinforcement-learning-fast-and-slow—hassabis Sources: reinforcement-learning-fast-and-slow—hassabis Last updated: 2026-05-13


Summary

Botvinick, Ritter, Wang, Kurth-Nelson, Blundell, and Hassabis (2019) published a review in Trends in Cognitive Sciences proposing that the brain’s RL system comprises parallel “fast” and “slow” learning processes, analogous to Kahneman’s System 1 and System 2. The fast system (dorsolateral striatum, model-free) learns habits from cached values; the slow system (PFC-hippocampal, model-based) uses cognitive maps for deliberative planning. They argue that deep RL’s successes and limitations mirror this dichotomy, and that bridging the two systems is key to more capable AI.

Core content

Core analogy: Just as Kahneman distinguished fast intuitive thinking (System 1) from slow deliberative thinking (System 2), the brain’s RL system operates at two timescales (paper—reinforcement-learning-fast-and-slow §Introduction).

Fast RL (System 1):

  • Implemented in dorsolateral striatum
  • Model-free: cached state-action values updated by reward prediction errors (see the sketch after this list)
  • Fast learning, computationally cheap, but inflexible
  • Analogous to DQN and policy gradient methods in deep RL (paper—reinforcement-learning-fast-and-slow §Fast)

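To make the “fast” side concrete, here is a minimal sketch (mine, not the paper’s) of tabular Q-learning: cached state-action values nudged by a reward prediction error. The environment size, hyperparameters, and function names are illustrative assumptions.

```python
import numpy as np

# Illustrative "fast" (model-free) learner: a table of cached values
# updated by reward prediction errors, in the spirit of Q-learning.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))      # cached values: the "habit" store
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # assumed hyperparameters
rng = np.random.default_rng(0)

def act(s):
    # Epsilon-greedy readout of cached values: cheap and reactive,
    # with no planning at decision time.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def update(s, a, r, s_next):
    # Reward prediction error: observed outcome vs. cached estimate.
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += alpha * delta             # small incremental step -> inflexible
    return delta
```

Because each update moves the cache only by `alpha * delta`, behavior adapts slowly when the environment changes, which is exactly the inflexibility noted in the list above.
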
Slow RL (System 2):

  • Implemented in PFC-hippocampal circuit
  • Model-based: uses a learned world model to plan via search or simulation (see the sketch after this list)
  • Slow learning, computationally expensive, but flexible and generalizable
  • Analogous to model-based RL, MBPO, and imagination-based methods (paper—reinforcement-learning-fast-and-slow §Slow)

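A “slow”-system agent instead spends computation at decision time. The sketch below assumes a learned one-step model `model(s, a) -> (reward, next_state)` and plans by depth-limited simulated rollouts; it illustrates the idea rather than any algorithm from the paper.

```python
def q_plan(model, s, a, depth, n_actions=4, gamma=0.99):
    # Simulated return of taking `a` in `s`, then acting greedily
    # inside the learned model for `depth - 1` further steps.
    r, s_next = model(s, a)
    if depth == 1:
        return r
    return r + gamma * max(
        q_plan(model, s_next, b, depth - 1, n_actions, gamma)
        for b in range(n_actions)
    )

def act(model, s, depth=3, n_actions=4):
    # Deliberative choice: cost grows exponentially with depth, but
    # behavior adapts the moment the model (the "cognitive map") changes.
    return max(range(n_actions), key=lambda a: q_plan(model, s, a, depth))
```

Because values are recomputed from the model at choice time, updating the map (say, after a reward is devalued) changes behavior with no relearning: flexibility bought at computational expense.
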
Key arguments:

  • Current deep RL is overwhelmingly “fast”: most breakthroughs (DQN, AlphaGo, AlphaStar) rely on model-free methods trained on enormous amounts of data (paper—reinforcement-learning-fast-and-slow §Gap).
  • The gap between human and AI learning efficiency may be explained by humans’ ability to leverage “slow” model-based learning (paper—reinforcement-learning-fast-and-slow §Gap).
  • Promising directions: model-based RL, episodic control, meta-learning, and memory-augmented networks as implementations of “slow” RL (paper—reinforcement-learning-fast-and-slow §Bridging); an episodic-control sketch follows this list.

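One of those bridging directions, episodic control, can be sketched as a nonparametric value store in the spirit of model-free episodic control (Blundell et al., 2016). The class name, embedding distance, and choice of k below are assumptions for illustration.

```python
import numpy as np

class EpisodicMemory:
    """Illustrative episodic-control store: episode returns keyed by
    state embeddings, read out by k-nearest-neighbour averaging."""

    def __init__(self, k=5):
        self.keys, self.returns, self.k = [], [], k

    def write(self, embedding, episodic_return):
        # One-shot write: a single good episode immediately influences
        # later value estimates, unlike slow incremental weight updates.
        self.keys.append(np.asarray(embedding, dtype=float))
        self.returns.append(float(episodic_return))

    def value(self, embedding):
        # Estimate a value by averaging the returns of the k most
        # similar stored experiences.
        if not self.keys:
            return 0.0
        dists = np.linalg.norm(np.stack(self.keys) - np.asarray(embedding), axis=1)
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean(np.asarray(self.returns)[nearest]))
```
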
Connections

  • Theme: theme—deep-RL, theme—neuroscience-AI-bridge, theme—complementary-learning-systems
  • Project: N/A (review article)
  • Collaborators: Matthew Botvinick (first author), Sam Ritter, Jane X. Wang, Zeb Kurth-Nelson, Charles Blundell
  • Era: alphafold-era
  • Venue: venue—Trends-in-Cognitive-Sciences
  • Format: review-article (maps to paper— prefix)
  • Synthesizes: paper—prefrontal-cortex-as-a-meta-reinforcement-learning-system, paper—a-distributional-code-for-value-in-dopamine-based-reinforcement-learning, paper—human-level-control-through-deep-reinforcement-learning
  • Related: paper—what-learning-systems-do-intelligent-agents-need-complementary-learning-systems-theory-updated — both propose multi-system learning architectures

Honest Gaps

  • Metadata lists Kurth-Nelson, Blundell, Botvinick, and Dolan as co-authors, but the actual authors are Botvinick, Ritter, Wang, Kurth-Nelson, Blundell, and Hassabis: Raymond Dolan is not an author, while Ritter, Wang, and Hassabis are missing from the metadata.
  • As a review article, it makes no original empirical claims.
  • The fast/slow dichotomy, while heuristically useful, oversimplifies the diversity of learning mechanisms in the brain.
  • The analogy between Kahneman’s Systems 1/2 and model-free/model-based RL is imperfect — the mapping is not one-to-one.
  • Published in 2019, it predates major advances in model-based RL (e.g., Dreamer, MuZero) that partially address the “slow RL” gap.