
Generalists Outperform Specialists in Game-Theory Settings

June 17, 2026 | By LDS Team

Relevance Score: 6.3


MIT LIDS and EECS researchers, led by Sobhan Mohammadpour and Gabriele Farina, published a paper at ICLR 2026 showing that general-purpose policy gradient algorithms can outperform specialized game-theoretic algorithms in two-player zero-sum imperfect-information games such as Phantom Tic-Tac-Toe, Hex variants, and Liar's Dice. The team evaluated performance using an "exploitability" metric -- how much a player stands to lose against a worst-case adversary, so lower is better -- on games with up to 30 billion states, roughly 100,000 times larger than the games on which exploitability had previously been computed. A key contribution is an open benchmarking suite built on top of OpenSpiel that researchers can run on a standard laptop.

What happened

MIT LIDS and EECS researchers presented "Reevaluating policy gradient methods for imperfect-information games" at the International Conference on Learning Representations (ICLR 2026) in Rio de Janeiro. The paper, co-authored by Sobhan Mohammadpour and Gabriele Farina (MIT) with collaborators from UT Austin, UC Berkeley, Carnegie Mellon, and NYU, challenges a long-held assumption: that specialized game-theoretic algorithms are clearly the best approach for training agents in two-player zero-sum imperfect-information games. The study finds that general-purpose policy gradient methods -- a class of reinforcement learning algorithms originating in the early 1990s -- can match or beat these specialized algorithms when evaluated rigorously.
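For readers unfamiliar with policy gradient methods, the core idea can be sketched on a toy game. The example below is an illustration only, not the paper's method or code: it assumes rock-paper-scissors as the game, a softmax policy over three logits, a fixed opponent rather than self-play, and an exact gradient instead of sampled episodes. The names `PAYOFF`, `softmax`, and `train_policy` are hypothetical.

```python
import math

# Row player's payoff in rock-paper-scissors (assumed toy game, not from the paper).
# Rows: our action; columns: opponent's action; order is rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def softmax(logits):
    """Turn unnormalized logits into action probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(opponent, steps=500, lr=1.0):
    """Gradient ascent on expected payoff against a fixed opponent strategy."""
    logits = [0.0, 0.0, 0.0]  # start from the uniform policy
    # Expected payoff of each pure action against the opponent.
    action_values = [
        sum(PAYOFF[i][j] * opponent[j] for j in range(3)) for i in range(3)
    ]
    for _ in range(steps):
        probs = softmax(logits)
        baseline = sum(p * q for p, q in zip(probs, action_values))
        # Exact softmax policy gradient:
        # d(expected payoff) / d(logit_k) = pi_k * (q_k - baseline)
        for k in range(3):
            logits[k] += lr * probs[k] * (action_values[k] - baseline)
    return softmax(logits)


# Against an opponent that always plays rock, the policy shifts toward paper.
learned = train_policy([1.0, 0.0, 0.0])
print([round(p, 3) for p in learned])
```

Practical policy gradient methods estimate this gradient from sampled play and typically train against a changing opponent, which is where imperfect-information games become hard; the sketch only shows the update rule itself.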

Key findings

The team ran experiments across five imperfect-information games: two versions of Phantom Tic-Tac-Toe (where players cannot see their opponent's moves), two Hex variants, and Liar's Dice. Performance was measured using "exploitability" -- a metric borrowed from game theory that scores how far a policy deviates from Nash equilibrium play when facing a worst-case opponent. Neural networks trained with policy gradient methods achieved better (lower) exploitability scores than those trained with game-theoretic algorithms, and won head-to-head matches as well.
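To make the exploitability metric concrete, here is a minimal sketch on a toy zero-sum matrix game. It is an assumption-laden illustration, not the benchmark's implementation: the games in the study are large sequential games, whereas this uses rock-paper-scissors, and the function name `exploitability` and the convention of averaging the two players' best-response gains are choices made for this example.

```python
# Row player's payoff in rock-paper-scissors (assumed toy game, not from the paper).
# The column player receives the negation. Order: rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def exploitability(row_strategy, col_strategy, payoff=PAYOFF):
    """Average amount each player could gain by best-responding to the other.

    Zero means neither player can improve, i.e. the pair is a Nash equilibrium.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Row player's expected payoff for each pure action against col_strategy.
    row_values = [
        sum(payoff[i][j] * col_strategy[j] for j in range(n_cols))
        for i in range(n_rows)
    ]
    # Row player's expected payoff under row_strategy for each pure column action.
    col_values = [
        sum(row_strategy[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]
    best_row = max(row_values)   # payoff of a best-responding row player
    best_col = -min(col_values)  # payoff of a best-responding column player
    # In a zero-sum game the two best-response payoffs sum to zero at equilibrium.
    return (best_row + best_col) / 2


uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1, 0, 0]
print(exploitability(uniform, uniform))          # equilibrium play: 0.0
print(exploitability(always_rock, always_rock))  # fully predictable play: 1.0
```

The worst-case opponent in this sketch is found by checking three pure actions. In a game with billions of states the same best-response calculation has to traverse the whole game, which is why computing exploitability at that scale takes substantial engineering.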

Benchmarking contribution

A major output is a freely available benchmarking suite integrated with OpenSpiel that runs on an ordinary laptop. Senior author Farina noted that the field had not done the engineering work needed to rigorously compare algorithms at scale, making it hard to see what worked. The benchmark supports games with up to 30 billion states -- roughly 100,000 times larger than what exploitability had previously been computed on.

Broader implications

Per Farina, the term "game" applies to any multi-agent strategic interaction, including military operations, trading, and negotiation -- all settings with hidden information. Eugene Vinitsky (NYU), a co-author, reinforced this: "The idea that we can improve on these games suggests that we can also do better in these other settings as well." Ian Gemp, a game theory researcher at Google DeepMind who was not involved in the work, described the results as "a compelling reminder that modernizing classical tools remains a highly productive path for solving complex strategic problems."

Key Points

1. MIT LIDS/EECS researchers show at ICLR 2026 that general-purpose policy gradient algorithms outperform specialized game-theoretic algorithms in imperfect-information zero-sum games.
2. Performance was measured by exploitability -- how far a strategy deviates from Nash equilibrium -- on games up to 30 billion states, far exceeding prior benchmarks.
3. The team released an open benchmarking suite via OpenSpiel, with implications beyond games to trading, negotiation, and any multi-agent hidden-information setting.

Scoring Rationale

An ICLR 2026 paper from MIT LIDS challenges a field assumption that specialized game-theoretic algorithms beat policy gradient methods in multi-agent games, and it releases a public benchmark built on OpenSpiel. The work is notable for multi-agent RL practitioners: important, but not paradigm-shifting at the industry level.


Sources

Public references used for this report.

news.mit.edu: In game theory, generalists sometimes win out over specialists

openreview.net: Reevaluating policy gradient methods for imperfect-information games (ICLR 2026 paper)
