Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning
Type: paper Slug: grandmaster-level-in-starcraft-ii-using-multi-agent-rl—hassabis Sources: grandmaster-level-in-starcraft-ii-using-multi-agent-rl—hassabis Last updated: 2026-05-13
Summary
Vinyals, Babuschkin, Czarnecki, Mathieu, and 36 colleagues, with Apps, Lillicrap, Kavukcuoglu, Hassabis, and Silver (2019) developed AlphaStar, an AI system that achieved grandmaster level in StarCraft II, a real-time strategy game considered one of the most challenging AI benchmarks. AlphaStar used a multi-agent league training system where diverse agents played against each other, combined with a deep neural network architecture that processed the game’s raw feature maps.
Core content
Why StarCraft II is hard: Imperfect information (fog of war), enormous action space (~10^26 possible actions per step), real-time decision-making (not turn-based), and the need for long-term strategic planning alongside tactical micro-management (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Introduction).
Architecture: A deep neural network with a transformer torso processing game features (unit types, positions, visibility), an LSTM core for temporal reasoning, and separate heads for action selection and autoregressive action argument generation (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Methods).
Multi-agent league training:
- Main agents: Trained via self-play with supervised learning initialization from human replays (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Methods).
- League exploitation: “League” of diverse agents with different objectives (main agents, league exploiters, main exploiters) that prevent strategic cycles and ensure coverage of diverse strategies (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Methods).
- This approach addressed the rock-paper-scissors dynamics that plagued prior StarCraft AI efforts (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Discussion).
Results:
- Defeated 99.8% of active human players on official Battle.net ladders (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Results).
- Achieved grandmaster rank (top 0.2% of human players) on all three StarCraft II races (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Results).
- Live matches against professional players showed strong but not superhuman performance (paper—grandmaster-level-in-starcraft-ii-using-multi-agent-rl §Results).
Connections- Theme: theme—game-playing-ai, starcraft
- Project: AlphaStar
- Collaborators: Oriol Vinyals (co-first), Igor Babuschkin (co-first), Wojciech M. Czarnecki (co-first), Michaël Mathieu (co-first), Chris Apps (co-first), David Silver (co-first), Max Jaderberg, Aja Huang, Timothy Lillicrap, Koray Kavukcuoglu, and ~30 additional authors
- Era: alphafold-era
- Venue: venue—Nature
- Related: paper—a-general-theme—deep-RL-algorithm-that-masters-chess-shogi-and-go — both use theme—self-play, but AlphaStar adds league training for strategic diversity
Honest Gaps
- Metadata lists only 3 co-authors; the actual paper has ~44 authors.
- APM (actions per minute) was uncapped in initial versions, leading to superhuman micro that human critics argued was unrealistic. This was later addressed with an APM cap.
- AlphaStar was trained on a single map and matchup at a time — generalization across maps was not demonstrated.
- The live matches against professional players (TLO, MaNa) showed some weaknesses, particularly in unusual strategies.
- The league training system is computationally extremely expensive.
- No code or model was released.