Humans Expect Rationality From LLM Opponents in Games

A controlled, monetarily incentivized lab experiment finds that human players choose significantly lower numbers when playing against Large Language Models (LLMs) in a multi-player p-beauty contest. The shift manifests as an increased prevalence of the zero Nash-equilibrium choice and is concentrated among participants with high strategic reasoning ability. Participants who pick zero cite LLMs' perceived reasoning strength and, unexpectedly, a belief that LLMs will cooperate. The results highlight heterogeneous human beliefs about AI agents and carry direct implications for mechanism design in mixed human-AI systems, multi-agent deployments, and marketplaces where algorithmic opponents interact with humans.
What happened
A monetarily incentivized laboratory experiment compared individual human play in a multi-player p-beauty contest against other humans and against Large Language Models (LLMs). In a p-beauty contest, each player picks a number and the winner is the player closest to p times the group average (with p < 1), so iterated best-response reasoning drives the unique Nash equilibrium to zero. The authors find humans choose significantly lower numbers when facing LLMs, driven by a higher rate of the zero Nash-equilibrium choice. The effect is strongest among subjects with high strategic reasoning ability, who explicitly cite LLMs' perceived reasoning strength and an unexpected perceived propensity to cooperate.
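To make the game concrete, here is a minimal sketch of one p-beauty round. The parameterization (p = 2/3, choices in [0, 100]) is an assumption: it is the most common setup in the literature, and the summary does not state the paper's exact values.

```python
# Minimal sketch of a p-beauty contest round. The values p = 2/3 and
# the [0, 100] choice range are assumed (common defaults); the paper's
# exact parameterization is not stated in this summary.
from statistics import mean

def beauty_contest_winners(choices: dict[str, float], p: float = 2 / 3) -> list[str]:
    """Return the player(s) whose choice is closest to p times the mean choice."""
    target = p * mean(choices.values())
    best = min(abs(c - target) for c in choices.values())
    return [name for name, c in choices.items() if abs(c - target) == best]

# Zero only wins when everyone else also plays near zero: against a
# naive opponent, the Nash choice of 0 loses to an intermediate guess.
print(beauty_contest_winners({"naive": 50, "one_step": 33, "nash": 0}))  # ['one_step']
```

This is why playing zero is a bet on the opponent's rationality, which is exactly the belief shift the experiment measures.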
Technical details
The study uses a within-subject design, comparing each participant's choices across human-opponent and LLM-opponent conditions (a sketch of such a paired comparison follows the list below). Key empirical findings include:
- A statistically significant downward shift in chosen numbers when opponents are LLMs rather than humans.
- A higher incidence of the zero Nash-equilibrium choice in the LLM condition (the level-k sketch below shows why zero is the equilibrium).
- Heterogeneity: subjects with high strategic reasoning ability account for most of the shift and explain their choices by beliefs about LLMs' reasoning and cooperativeness.
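The level-k sketch referenced above: under iterated best response, a level-k player plays p times the choice of a level-(k-1) player, and the sequence contracts to zero for any p < 1. The value p = 2/3 and the level-0 midpoint of 50 are illustrative assumptions.

```python
# Illustrative level-k reasoning in a p-beauty contest: level 0 plays
# the midpoint, and each higher level best-responds with p times the
# previous level's choice. For p < 1 the sequence contracts to the
# zero Nash equilibrium; p = 2/3 and start = 50 are assumed values.
def level_k_choices(p: float = 2 / 3, start: float = 50.0, levels: int = 10) -> list[float]:
    """Return [start, p*start, p^2*start, ...], which converges to 0."""
    choices = [start]
    for _ in range(levels):
        choices.append(p * choices[-1])
    return choices

print([round(c, 1) for c in level_k_choices()])
# [50.0, 33.3, 22.2, 14.8, ...] -> choosing 0 amounts to assuming an
# opponent who iterates this reasoning all the way down.
```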
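And the paired-comparison sketch referenced before the list: in a within-subject design, each participant supplies one observation per condition, so the downward shift can be tested with a paired procedure. The data below are hypothetical and the choice of a Wilcoxon signed-rank test is an assumption for illustration; the summary does not state the paper's actual statistics.

```python
# Hedged sketch of a within-subject comparison. A Wilcoxon signed-rank
# test is one standard choice for paired, non-normal data such as
# bounded number choices; the paper's actual analysis is not specified.
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-participant choices (same subject in each column).
vs_humans = np.array([33, 25, 40, 22, 18, 30])
vs_llms = np.array([20, 0, 33, 0, 15, 22])

# One-sided test: are choices against humans systematically higher?
stat, pvalue = wilcoxon(vs_humans, vs_llms, alternative="greater")
print(f"median shift: {np.median(vs_humans - vs_llms):.1f}, p = {pvalue:.3f}")
```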
Why it matters
This is one of the first controlled, incentivized experiments probing strategic human behavior against modern language models. The results speak directly to deployment settings where algorithmic agents act as opponents, partners, or contract bidders. Expectation shifts and belief heterogeneity can alter equilibrium outcomes, efficiency, and welfare in markets and games that mix humans and AI.
Practical implications for practitioners
- Mechanism design must account for altered human best responses to algorithmic agents; traditional equilibrium predictions may not hold in mixed human-AI systems.
- Evaluation of multi-agent systems should include behavioral lab tests, not only synthetic self-play or ML benchmarks.
- User modeling and interface signaling matter: perceived reasoning and cooperativeness drive behavior and can be modulated by agent transparency.
What to watch
Validate these findings across other strategic games, different LLM prompting regimes, and real-world field settings. Designers should experiment with agent transparency and incentives to avoid unintended strategic distortions.
Scoring Rationale
This controlled, incentivized experiment offers actionable insights for mechanism design, lab validation, and multi-agent deployments, but it is a single-domain behavioral study rather than a frontier technical breakthrough.