Generalists Outperform Specialists in Game-Theory Settings

MIT LIDS and EECS researchers, led by Sobhan Mohammadpour and Gabriele Farina, published a paper at ICLR 2026 showing that general-purpose policy gradient algorithms can outperform specialized game-theoretic algorithms in two-player zero-sum imperfect-information games such as Phantom Tic-Tac-Toe, Hex variants, and Liar's Dice. The team measured performance with exploitability, which captures how much a worst-case adversary could gain against a player's strategy. They evaluated games with up to 30 billion states, roughly 100,000 times larger than the games on which exploitability had previously been computed. A key contribution is an open benchmarking suite built on top of OpenSpiel that researchers can run on a standard laptop.
What happened
MIT LIDS and EECS researchers presented "Reevaluating policy gradient methods for imperfect-information games" at the International Conference on Learning Representations (ICLR 2026) in Rio de Janeiro. The paper, co-authored by Sobhan Mohammadpour and Gabriele Farina (MIT) with collaborators from UT Austin, UC Berkeley, Carnegie Mellon, and NYU, challenges a long-held assumption: that specialized game-theoretic algorithms are clearly the best approach for training agents in two-player zero-sum imperfect-information games. The study finds that general-purpose policy gradient methods, a class of reinforcement learning algorithms dating to the early 1990s, can match or beat these specialized algorithms when evaluated rigorously.
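For readers unfamiliar with the approach, the basic policy gradient update (REINFORCE) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's method: the game (rock-paper-scissors), the fixed opponent mix, the learning rate, and the function names are hypothetical choices made here. The policy is a softmax over logits, and each sampled action's logit is pushed up or down in proportion to the reward it earned.

```python
import math
import random

# Hypothetical toy game: row player's payoff in rock-paper-scissors
# (rows and columns ordered rock, paper, scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_vs_fixed_opponent(opponent_mix, steps=5000, lr=0.1, seed=0):
    """Hypothetical sketch: vanilla REINFORCE (no baseline) vs. a fixed opponent.

    Each step samples one action per player, observes the payoff r, and applies
    the score-function update theta += lr * r * grad log pi(a). For a softmax
    policy, d log pi(a) / d theta_k = 1[k == a] - pi(k).
    """
    rng = random.Random(seed)
    n = len(PAYOFF)
    theta = [0.0] * n
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choices(range(n), weights=pi)[0]
        b = rng.choices(range(n), weights=opponent_mix)[0]
        r = PAYOFF[a][b]
        for k in range(n):
            grad_log = (1.0 if k == a else 0.0) - pi[k]
            theta[k] += lr * r * grad_log
    return softmax(theta)


if __name__ == "__main__":
    # The opponent over-plays rock, so the best response is paper.
    policy = reinforce_vs_fixed_opponent([0.6, 0.2, 0.2])
    print([round(p, 3) for p in policy])
```

Against a fixed opponent this simply learns a best response. In a two-player game both policies change at once, which this toy deliberately ignores.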
Key findings
The team ran experiments across five imperfect-information games: two versions of Phantom Tic-Tac-Toe (in which players cannot see each other's moves), two Hex variants, and Liar's Dice. Performance was measured using exploitability, a game-theoretic metric that scores how much a worst-case (best-responding) opponent could gain against a policy. A Nash equilibrium policy has zero exploitability, so lower is better. Neural networks trained with policy gradient methods achieved lower exploitability than those trained with game-theoretic algorithms, and they also won head-to-head matches.
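To make the metric concrete, here is a minimal sketch for the simplest case, a two-player zero-sum matrix game, where a best response can be found by scanning pure strategies. It is an illustration only: the paper evaluates large sequential games, where computing a best response requires traversing the game tree, and normalization conventions differ between libraries. Here exploitability is assumed to be NashConv divided by 2 (the average amount each player could gain by switching to a best response); the function name and the rock-paper-scissors example are hypothetical.

```python
from fractions import Fraction


def exploitability(payoff, x, y):
    """Exploitability of the profile (x, y) in a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives the
    negation. Assumed convention: NashConv / 2.
    """
    rows, cols = len(payoff), len(payoff[0])
    # Row player's expected payoff for each pure row strategy against y.
    row_values = [sum(payoff[i][j] * y[j] for j in range(cols)) for i in range(rows)]
    # Row player's expected payoff for each pure column strategy against x.
    col_values = [sum(x[i] * payoff[i][j] for i in range(rows)) for j in range(cols)]
    # NashConv = (max row value - v) + (v - min column value); the profile
    # value v = x^T A y cancels, leaving max minus min.
    nash_conv = max(row_values) - min(col_values)
    return nash_conv / 2


# Rock-paper-scissors (rows and columns ordered rock, paper, scissors).
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

if __name__ == "__main__":
    uniform = [Fraction(1, 3)] * 3
    always_rock = [1, 0, 0]
    print(exploitability(RPS, uniform, uniform))      # prints 0
    print(exploitability(RPS, always_rock, uniform))  # prints 1/2
```

Uniform play is the Nash equilibrium of rock-paper-scissors, so its exploitability is zero. When the row player always plays rock, the column player could switch to paper and gain 1 while the row player could gain nothing, giving 1/2 under this convention.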
Benchmarking contribution
A major output is a freely available benchmarking suite integrated with OpenSpiel that runs on an ordinary laptop. Senior author Farina noted that the field had not done the engineering work needed to compare algorithms rigorously at scale, which made it hard to tell which approaches actually worked. The benchmark supports games with up to 30 billion states, roughly 100,000 times larger than the games on which exploitability had previously been computed.
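The two scale figures above together imply the approximate size of earlier exploitability computations. Taking "roughly 100,000 times" at face value (it is an approximation, so the result is order-of-magnitude only):

```latex
\frac{3 \times 10^{10}\ \text{states}}{10^{5}} \approx 3 \times 10^{5}\ \text{states}
```

That is, exploitability had previously been computed on games of roughly a few hundred thousand states.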
Broader implications
Per Farina, the term "game" applies to any multi-agent strategic interaction, including military operations, trading, and negotiation, all of which involve hidden information. Eugene Vinitsky (NYU), a co-author, reinforced this point: "The idea that we can improve on these games suggests that we can also do better in these other settings as well." Ian Gemp, a game theory researcher at Google DeepMind who was not involved in the work, described the results as "a compelling reminder that modernizing classical tools remains a highly productive path for solving complex strategic problems."
Scoring Rationale
ICLR 2026 paper from MIT LIDS that challenges a long-held assumption about game-theoretic algorithms versus policy gradient methods in multi-agent games and releases a public benchmark built on OpenSpiel. Notable for multi-agent RL practitioners; important, but not paradigm-shifting at the industry level.