Researchers simulate 100,000 World Cup tournaments

Writing in The Conversation, statistician Achim Zeileis and an international research team describe simulating the 2026 FIFA World Cup 100,000 times to produce probabilistic forecasts for every match and for the tournament as a whole. Their hybrid model blends four strength signals: team abilities from historic results (a bivariate Poisson model with exponential time-weighting), a bookmaker-consensus rating from 24 bookmakers, plus-minus player ratings, and Transfermarkt market values. A random forest, trained on World Cups and Euros from 2006 to 2024, learns how to combine these with covariates such as FIFA rank, Elo rating, and GDP per capita, then predicts expected goals for every possible match. Poisson goal distributions convert those predictions into win, draw, and loss probabilities, which drive the 100,000 simulated tournaments. The headline result makes Spain the favourite at 14.5%, ahead of England and France (12.4% each) and Germany (11.2%), with the expanded 48-team field leaving the odds unusually open.
What happened
In The Conversation, statistician Achim Zeileis (University of Innsbruck) and collaborators report probabilistic forecasts for the 2026 FIFA World Cup, the first 48-team edition, running 11 June to 19 July across Canada, Mexico, and the United States. By simulating the whole tournament 100,000 times, the team estimates each side's chances. Per their write-up and Zeileis's accompanying technical post, Spain is the favourite to win at 14.5%, closely followed by England and France at 12.4% each and Germany at 11.2%.
How the model works
The forecast is a two-step hybrid. First, four separate signals estimate team and player strength: a bivariate Poisson model fit to historic national-team matches (with exponential weighting toward recent results); a bookmaker-consensus rating distilled from the odds of 24 bookmakers; plus-minus player ratings aggregated to team level; and average squad market values from Transfermarkt's wisdom-of-the-crowd valuations. Second, a random forest trained on men's World Cups and Euros from 2006 to 2024 learns how to weight these signals alongside covariates such as FIFA rank, Elo rating, number of Champions League players, and GDP per capita. The model outputs expected goals for every possible matchup. These feed independent Poisson distributions to yield win, draw, and loss probabilities, which in turn drive the full-tournament simulation.
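The last step, turning expected goals into win, draw, and loss probabilities through independent Poisson distributions, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the truncation at 15 goals, and the expected-goal values of 1.6 and 1.1 are all assumptions chosen for the example.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_probabilities(lam_a, lam_b, max_goals=15):
    """Win/draw/loss probabilities for team A, assuming independent Poisson goals.

    max_goals truncates the score grid (an assumption of this sketch);
    the result is renormalised so the three probabilities sum to 1.
    """
    pa = [poisson_pmf(k, lam_a) for k in range(max_goals + 1)]
    pb = [poisson_pmf(k, lam_b) for k in range(max_goals + 1)]
    win = draw = loss = 0.0
    for i, p_i in enumerate(pa):
        for j, p_j in enumerate(pb):
            joint = p_i * p_j  # independence: joint probability is the product
            if i > j:
                win += joint
            elif i == j:
                draw += joint
            else:
                loss += joint
    total = win + draw + loss
    return win / total, draw / total, loss / total


# Illustrative expected goals only; these are not figures from the forecast.
win, draw, loss = match_probabilities(1.6, 1.1)
print(f"win={win:.3f} draw={draw:.3f} loss={loss:.3f}")
```

In a knockout match a draw would still need resolving (extra time, penalties); how the authors handle that is not described here, so the sketch stops at the three regular-time outcomes.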
Why it matters
Analytical note: the pipeline is a clean illustration of techniques that generalise well beyond football: combining market signals (here, bookmaker odds) with statistical estimates, ensembling heterogeneous predictors with a random forest, and using large Monte Carlo runs to turn per-match probabilities into probability distributions over complex outcomes such as winning the tournament. The authors note that the expanded 48-team format and a more variable draw make this edition unusually open, with several contenders but no dominant favourite.
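The Monte Carlo idea can be shown on a toy four-team knockout. The sketch below is an assumption-laden stand-in, not the published model: the teams, their strength values, and the strength-ratio win probability (a Bradley-Terry style rule) are hypothetical, and penalties, groups, and draws are ignored.

```python
import random
from collections import Counter


def win_prob(s_a, s_b):
    """Hypothetical win probability for A from two positive strength values."""
    return s_a / (s_a + s_b)


def play(team_a, team_b, strengths, rng):
    """Simulate one knockout match and return the winner."""
    p = win_prob(strengths[team_a], strengths[team_b])
    return team_a if rng.random() < p else team_b


def simulate_knockout(bracket, strengths, rng):
    """Play rounds until one team remains; bracket length must be a power of two."""
    teams = list(bracket)
    while len(teams) > 1:
        teams = [
            play(teams[i], teams[i + 1], strengths, rng)
            for i in range(0, len(teams), 2)
        ]
    return teams[0]


def title_probabilities(bracket, strengths, n_sims=20000, seed=0):
    """Estimate each team's title chance as its share of simulated wins."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = Counter(
        simulate_knockout(bracket, strengths, rng) for _ in range(n_sims)
    )
    return {team: counts[team] / n_sims for team in bracket}


# Made-up teams and strengths, for illustration only.
strengths = {"A": 2.0, "B": 1.5, "C": 1.0, "D": 0.5}
bracket = ["A", "D", "B", "C"]  # semi-finals: A vs D, B vs C
for team, p in title_probabilities(bracket, strengths).items():
    print(f"{team}: {p:.3f}")
```

The per-match rule is the only place the forecasting model enters. Swapping `win_prob` for probabilities derived from predicted goals would leave the simulation loop unchanged.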
What to watch
The team highlights that their model ranks Germany 4th versus roughly 7th for bookmakers, while rating Brazil and Argentina below the bookmaker consensus, divergences that offer a natural out-of-sample test as the tournament unfolds. For forecasters, the open questions are calibration and live updating: how the predictions hold up against results, and how injuries, lineups, and in-tournament information could be folded into updated odds.
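One common way to check how forecasts hold up against results is a proper scoring rule such as the Brier score, where lower is better. The sketch below is illustrative only: the source does not say which metric the authors use, and the two forecast sets and the outcomes are invented numbers.

```python
def brier_score(forecast, outcome):
    """Multi-class Brier score for one match.

    forecast maps each outcome label to its predicted probability;
    outcome is the label that actually occurred. 0.0 is a perfect score.
    """
    return sum(
        (p - (1.0 if label == outcome else 0.0)) ** 2
        for label, p in forecast.items()
    )


def mean_brier(forecasts, outcomes):
    """Average Brier score over a list of matches."""
    scores = [brier_score(f, o) for f, o in zip(forecasts, outcomes)]
    return sum(scores) / len(scores)


# Hypothetical forecasts from two sources for the same three matches.
model = [
    {"win": 0.55, "draw": 0.25, "loss": 0.20},
    {"win": 0.30, "draw": 0.30, "loss": 0.40},
    {"win": 0.45, "draw": 0.28, "loss": 0.27},
]
bookmaker = [
    {"win": 0.50, "draw": 0.27, "loss": 0.23},
    {"win": 0.40, "draw": 0.30, "loss": 0.30},
    {"win": 0.48, "draw": 0.27, "loss": 0.25},
]
results = ["win", "loss", "draw"]  # invented outcomes

print("model:", round(mean_brier(model, results), 4))
print("bookmaker:", round(mean_brier(bookmaker, results), 4))
```

A handful of matches says little on its own. Comparisons like this only become informative across many games.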
Scoring Rationale
A well-documented applied-ML forecast (a hybrid random forest over bivariate-Poisson, bookmaker-consensus, player-rating, and market-value signals) that is genuinely instructive for practitioners. It is a recurring pre-tournament explainer rather than a novel method or breakthrough, so it sits solidly mid-range.


