Early DeepMind (2010–2015)
Type: period Slug: period—early-deepmind Sources: human-level-control-through-deep-reinforcement-learning—hassabis Last updated: 2026-05-13
Summary
The period from DeepMind’s founding (2010) through its first landmark publication (2015) is represented in the corpus by a single paper — the DQN paper in Nature — but this paper’s impact vastly exceeds its solitary status. Human-level control through deep reinforcement learning (paper—human-level-control-through-deep-reinforcement-learning) demonstrated that a single neural network architecture, trained end-to-end with RL, could achieve human-level performance across a suite of 49 Atari games. This result validated DeepMind’s foundational bet that deep learning and reinforcement learning could be unified into a general-purpose learning system.
Core content
The DQN result (2015): A convolutional neural network trained with Q-learning and experience replay learned to play 49 Atari 2600 games from raw pixel input, achieving human-level or superhuman performance on 29 of them (paper—human-level-control-through-deep-reinforcement-learning). Two key technical innovations made this possible: experience replay (storing and sampling past transitions to break temporal correlations) and a target network (a slowly-updated copy of the Q-network that supplies stable bootstrapping targets during training).
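The mechanics of these two innovations can be sketched in a few lines of Python. This is an illustrative sketch, not DeepMind's implementation (the original DQN used Lua/Torch with frame stacking, reward clipping, and a deep convolutional Q-network); the class and function names here are hypothetical:

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay: store past transitions and sample them
    uniformly at random, breaking the temporal correlations present
    in sequential gameplay."""

    def __init__(self, capacity=100_000):
        # deque with maxlen silently discards the oldest transitions
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling over stored transitions
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


def q_learning_target(reward, done, max_next_q, gamma=0.99):
    """One-step Q-learning target. In DQN, max_next_q comes from the
    slowly-updated *target* network, not the online network, which
    keeps the regression target stable between synchronisations."""
    if done:
        return reward
    return reward + gamma * max_next_q
```

The target network itself is simply a periodic copy: every C gradient steps the online network's weights are copied into the target network, so the targets change in infrequent jumps rather than at every update.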
Intellectual context: While the corpus has no publications from 2010–2014, the DQN paper’s approach reflects ideas traceable to the PhD period — the construction system’s emphasis on recombining stored elements (experience replay as a form of constructive memory) and the neuroscience literature on model-free reinforcement learning that Hassabis would have encountered in the Maguire/Dolan labs.
Impact profile: This paper is both field-defining and top-cited. It established DeepMind as a serious research lab, and an earlier workshop version of the result (circulated as a 2013 preprint) contributed to Google's acquisition of the company (2014). It set the template for the “deep RL” paradigm that dominated the next five years of the lab’s output.
Connections
- Theme: theme—deep-RL, theme—reinforcement-learning
- Project: project—DQN
- Collaborators: Volodymyr Mnih (first author), Koray Kavukcuoglu, David Silver
- Venue: venue—Nature
- Succeeds: period—postdoc-period (5-year publication gap)
- Precedes: period—deepmind-ascent — the DQN approach is extended and generalised across the next era’s papers
- Precedes: paper—overcoming-catastrophic-forgetting — directly addresses DQN’s stability limitations
Honest gaps
- The 2010–2014 publication gap is the largest in the corpus. No papers, essays, or public statements from DeepMind’s founding years are present.
- No sources document the intellectual transition from hippocampal construction to deep RL — how (or whether) the neuroscience ideas explicitly informed DQN’s design.
- The DQN paper has ~6 authors in metadata but the published Nature paper lists 19 — co-author undercount in the corpus metadata is likely.
- No blog posts, interviews, or technical reports from this period are in the corpus, despite DeepMind being publicly active.
- This is the only period where the corpus is essentially a single data point — any synthesis is necessarily speculative.