Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning

Type: paper Slug: grandmaster-level-in-starcraft-ii-using-multi-agent-rl—hassabis Sources: grandmaster-level-in-starcraft-ii-using-multi-agent-rl—hassabis Last updated: 2026-05-13


Summary

Vinyals, Babuschkin, Czarnecki, and Mathieu, together with roughly 38 co-authors including Apps, Lillicrap, Kavukcuoglu, Hassabis, and Silver (2019), developed AlphaStar, an AI system that achieved grandmaster level in StarCraft II, a real-time strategy game considered one of the most challenging AI benchmarks. AlphaStar combined a multi-agent league, in which diverse agents trained against one another, with a deep neural network that processed the game's raw interface of per-unit attributes and spatial feature maps.

Core content

Why StarCraft II is hard: Imperfect information (fog of war), enormous action space (~10^26 possible actions per step), real-time decision-making (not turn-based), and the need for long-term strategic planning alongside tactical micro-management (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Introduction).
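
These difficulties are visible even in the public research interface. A minimal interaction sketch, assuming DeepMind's pysc2 environment (the published SC2 research wrapper, not necessarily the paper's exact raw interface); the map name, feature resolutions, and step_mul are illustrative choices, and keyword details vary across pysc2 versions:

```python
# Hedged sketch: stepping StarCraft II through the pysc2 wrapper.
from pysc2.env import sc2_env
from pysc2.lib import actions, features

env = sc2_env.SC2Env(
    map_name="Simple64",                      # illustrative map choice
    players=[sc2_env.Agent(sc2_env.Race.protoss),
             sc2_env.Bot(sc2_env.Race.terran, sc2_env.Difficulty.easy)],
    agent_interface_format=features.AgentInterfaceFormat(
        feature_dimensions=features.Dimensions(screen=84, minimap=64)),
    step_mul=8)                               # act every 8 game frames, not every frame

timesteps = env.reset()
while not timesteps[0].last():
    # a real agent must pick one of hundreds of parameterized function-actions
    # under fog of war; this loop only issues no-ops to show the interface
    timesteps = env.step([actions.FUNCTIONS.no_op()])
env.close()
```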

Architecture: A deep neural network with a transformer torso processing per-unit game features (unit types, positions, visibility), an LSTM core for temporal reasoning over long games, and separate heads that select an action type and then autoregressively generate its arguments, such as which units to command and where (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Methods).
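
A minimal PyTorch sketch of that skeleton, with made-up dimensions; the single pointer-style unit head stands in for the paper's full set of argument heads, and all names here (PolicyNet, arg_query, etc.) are illustrative:

```python
# Sketch: transformer torso over per-unit features -> LSTM core ->
# action-type head -> autoregressive, pointer-style unit-argument head.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, entity_dim=64, d_model=128, n_action_types=100):
        super().__init__()
        self.embed = nn.Linear(entity_dim, d_model)   # per-unit features -> tokens
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.torso = nn.TransformerEncoder(layer, num_layers=2)
        self.core = nn.LSTM(d_model, d_model, batch_first=True)
        self.action_type = nn.Linear(d_model, n_action_types)
        self.arg_query = nn.Linear(d_model + n_action_types, d_model)

    def forward(self, entities, hidden=None):
        # entities: (batch, n_units, entity_dim)
        tokens = self.torso(self.embed(entities))     # attend across all units
        core_out, hidden = self.core(tokens.mean(dim=1, keepdim=True), hidden)
        state = core_out[:, -1]                       # temporal summary
        type_logits = self.action_type(state)
        a = torch.distributions.Categorical(logits=type_logits).sample()
        a_onehot = nn.functional.one_hot(a, type_logits.shape[-1]).float()
        # condition the argument head on the sampled action type (autoregressive),
        # then score each unit token with a pointer-style dot product
        query = self.arg_query(torch.cat([state, a_onehot], dim=-1))
        unit_logits = torch.einsum("bd,bnd->bn", query, tokens)
        return type_logits, a, unit_logits, hidden

net = PolicyNet()
type_logits, a, unit_logits, _ = net(torch.randn(2, 50, 64))  # 2 games, 50 units
```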

Multi-agent league training:

  • Main agents: initialized by supervised learning on human replays, then trained with reinforcement learning against themselves and the league (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Methods).
  • League composition: a league of agents with distinct objectives (main agents, main exploiters that target the current main agents, and league exploiters that probe weaknesses of the entire league) prevents strategic cycles and ensures coverage of diverse strategies; a matchmaking sketch follows this list (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Methods).
  • This approach addressed the rock-paper-scissors dynamics that plagued prior StarCraft AI efforts (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Discussion).
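
A hedged sketch of the matchmaking idea, in the spirit of the paper's prioritized fictitious self-play (PFSP), where an agent preferentially faces opponents it currently loses to; the roles, the (1 - winrate)**p weighting, and all names are simplified assumptions rather than the paper's exact scheme:

```python
# Sketch: PFSP-style opponent sampling for a three-role league.
import random

class Player:
    def __init__(self, name, role):
        self.name, self.role = name, role  # "main" | "main_exploiter" | "league_exploiter"

def pfsp_weights(win_rates, p=2.0):
    # "hard" weighting: the lower our win rate against an opponent,
    # the more often we are matched against them
    return [(1.0 - w) ** p + 1e-6 for w in win_rates]  # epsilon avoids all-zero weights

def sample_opponent(agent, league, win_rate):
    if agent.role == "main_exploiter":
        pool = [p for p in league if p.role == "main"]  # target mains only
    else:
        pool = [p for p in league if p is not agent]    # face the whole league
    weights = pfsp_weights([win_rate(agent, opp) for opp in pool])
    return random.choices(pool, weights=weights, k=1)[0]

league = [Player("main_0", "main"), Player("main_ckpt_1", "main"),
          Player("main_exp_0", "main_exploiter"),
          Player("league_exp_0", "league_exploiter")]
fake_win_rate = lambda a, b: random.random()            # stand-in for match statistics
print(sample_opponent(league[0], league, fake_win_rate).name)
```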

Results:

  • Rated above 99.8% of ranked human players on the official Battle.net ladder, playing under anonymized accounts (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Results).
  • Achieved Grandmaster rank (top ~0.2% of ranked players) with all three StarCraft II races: Protoss, Terran, and Zerg (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Results).
  • Live matches against professional players showed strong but not superhuman performance (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Results).

Connections

  • Theme: theme—game-playing-ai, starcraft

  • Project: AlphaStar
  • Collaborators: Oriol Vinyals (co-first), Igor Babuschkin (co-first), Wojciech M. Czarnecki (co-first), Michaël Mathieu (co-first), Max Jaderberg, Aja Huang, Timothy Lillicrap, Koray Kavukcuoglu, Chris Apps, David Silver, and ~30 additional authors
  • Era: alphafold-era
  • Venue: venue—Nature
  • Related: paper—a-general-reinforcement-learning-algorithm-that-masters-chess-shogi-and-go — both use theme—self-play, but AlphaStar adds league training for strategic diversity

Honest Gaps

  • Metadata lists only 3 co-authors; the actual paper has ~44 authors.
  • APM (actions per minute) was only loosely limited in early versions, allowing bursts of superhuman micro-management that critics argued were unrealistic; the final version imposed stricter APM caps and a camera-view restriction (a sliding-window limiter sketch follows this list).
  • AlphaStar was trained on a single map and matchup at a time — generalization across maps was not demonstrated.
  • The live show matches against professional players (TLO, MaNa) revealed weaknesses, particularly against unusual strategies.
  • The league training system is computationally extremely expensive.
  • No code or model was released.
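
To make the APM-cap point concrete, a hypothetical sliding-window limiter; the constraints actually imposed on AlphaStar (agreed with professional players) were more involved, including burst limits over multiple windows:

```python
# Hypothetical sketch: cap the agent's actions per minute with a sliding window.
from collections import deque

class APMLimiter:
    def __init__(self, max_actions=300, window_s=60.0):  # ~300 APM, a strong-human rate
        self.max_actions, self.window_s = max_actions, window_s
        self.stamps = deque()

    def allow(self, t):
        # forget actions that left the window, then spend budget if any remains
        while self.stamps and t - self.stamps[0] > self.window_s:
            self.stamps.popleft()
        if len(self.stamps) < self.max_actions:
            self.stamps.append(t)
            return True
        return False  # caller substitutes a no-op for this step

limiter = APMLimiter()
print(limiter.allow(0.0))  # True until the per-window budget is exhausted
```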