Early DeepMind (2010–2015)

Type: period Slug: period—early-deepmind Sources: human-level-control-through-deep-reinforcement-learning—hassabis Last updated: 2026-05-13


Summary

The period from DeepMind’s founding (2010) through its first landmark publication (2015) is represented in the corpus by a single paper, the DQN paper in Nature, but that paper’s impact far exceeds its solitary status. Human-level control through deep reinforcement learning (paper—human-level-control-through-deep-reinforcement-learning) demonstrated that a single neural network architecture, trained end-to-end with reinforcement learning, could reach or exceed the level of a professional human games tester on 29 of 49 Atari games. This result validated DeepMind’s foundational bet that deep learning and reinforcement learning could be unified into a general-purpose learning system.

Core content

The DQN result (2015): A convolutional neural network trained with Q-learning and experience replay learned to play 49 Atari 2600 games from raw pixel input, achieving human-level or superhuman performance on 29 of them (paper—human-level-control-through-deep-reinforcement-learning). Two key technical innovations made this possible: experience replay (storing past transitions and sampling them at random to break temporal correlations) and a target network (a slowly-updated copy of the Q-network that keeps the bootstrap target stable during training). A minimal sketch of how the two mechanisms fit together follows below.
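The sketch below is illustrative only: it assumes a toy fully-connected network, placeholder buffer and layer sizes, and a Huber loss as a stand-in for the paper's error clipping. None of these are the paper's actual hyperparameters (the Nature DQN uses a convnet over stacked 84×84 frames and a much larger replay memory); the point is just to show where replay sampling and the frozen target network enter a training step.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Toy sizes for illustration only; the Nature DQN uses a convnet over
# stacked 84x84 frames, not a small fully-connected network.
STATE_DIM, N_ACTIONS, GAMMA = 4, 2, 0.99

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())  # the slowly-updated copy

optimizer = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)

# Experience replay: a bounded store of past transitions (s, a, r, s', done).
replay = deque(maxlen=100_000)

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    # Sampling uniformly at random breaks the temporal correlations
    # present in consecutive frames of gameplay.
    batch = random.sample(replay, batch_size)
    s, a, r, s2, done = zip(*batch)
    s = torch.tensor(s, dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64)
    r = torch.tensor(r, dtype=torch.float32)
    s2 = torch.tensor(s2, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)

    # Q(s, a) for the actions actually taken.
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

    # Bootstrap target y = r + gamma * max_a' Q_target(s', a'), computed with
    # the frozen target network so the regression target does not chase itself.
    with torch.no_grad():
        y = r + GAMMA * (1.0 - done) * target_net(s2).max(dim=1).values

    # Huber loss as a stand-in for the paper's error clipping.
    loss = nn.functional.smooth_l1_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target():
    # Called every fixed number of updates; between syncs the target
    # network's parameters stay frozen, which stabilises training.
    target_net.load_state_dict(q_net.state_dict())
```

Without the target network, the same parameters would appear on both sides of the regression, so each gradient step would move the target it is chasing; freezing a copy between periodic syncs removes that feedback loop.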

Intellectual context: While the corpus has no publications from 2010–2014, the DQN paper’s approach reflects ideas traceable to the PhD period — the construction system’s emphasis on recombining stored elements (experience replay as a form of constructive memory) and the neuroscience literature on model-free reinforcement learning that Hassabis would have encountered in the Maguire/Dolan labs.

Impact profile: This paper is both field-defining and top-cited. It established DeepMind as a serious research lab; its 2013 workshop precursor (Playing Atari with Deep Reinforcement Learning) reportedly helped motivate Google’s acquisition of DeepMind (2014), and the Nature publication set the template for the “deep RL” paradigm that dominated the next five years of the lab’s output.

Connections

  • Theme: theme—deep-RL, theme—reinforcement-learning
  • Project: project—DQN
  • Collaborators: Volodymyr Mnih (first author), Koray Kavukcuoglu, David Silver
  • Venue: venue—Nature
  • Succeeds: period—postdoc-period (5-year publication gap)
  • Precedes: period—deepmind-ascent — the DQN approach is extended and generalised across the next era’s papers
  • Precedes: paper—overcoming-catastrophic-forgetting — directly addresses catastrophic forgetting, DQN’s tendency to overwrite earlier learning when trained on new tasks sequentially

Honest Gaps

  • The 2010–2014 publication gap is the largest in the corpus. No papers, essays, or public statements from DeepMind’s founding years are present.
  • No sources document the intellectual transition from hippocampal construction to deep RL — how (or whether) the neuroscience ideas explicitly informed DQN’s design.
  • The DQN paper has ~6 authors in metadata but the actual Nature paper lists 19 — co-author undercount likely.
  • No blog posts, interviews, or technical reports from this period are in the corpus, despite DeepMind being publicly active.
  • This is the only period where the corpus is essentially a single data point — any synthesis is necessarily speculative.