Can I predict the World Cup results within an hour?
My friends started a small competition to see who could best predict the results of the FIFA World Cup. The rules are simple: predict the score of every match in every round. We began with the group stage, which consists of 72 matches. My football knowledge isn’t nonexistent, but it is fairly limited, so I figured: why not try to predict the results with a small algorithm? Time was also tight, since I only had one hour left before the deadline.
The challenge
The main challenge was writing a piece of code to predict the results of 72 matches within an hour. I do not have much experience in sports prediction, so I first needed to do some quick research to figure out how to even approach the problem.
After some initial exploration, I realised that collecting all the detailed data I wanted would not be feasible within the limited timeframe, so I decided to keep it simple and use FIFA ratings as a baseline.
A lot of resources pointed towards using Monte Carlo simulation and the Poisson distribution as a solid foundation for basic match predictions. Similar approaches are used in sports analytics and betting models, although real implementations include separate attack/defense strength, home advantage, and calibration on historical match data.
Challenge accepted: I decided to use FIFA ratings combined with a Monte Carlo simulation based on the Poisson distribution to predict the results.
Action plan
I wanted to write the code in Python because I know it has a lot of good libraries for data processing. I was working on a Windows computer, so I decided to keep things simple and use Docker and Jupyter Notebook to quickly run my Python code.
As explained above, I wanted to use a Monte Carlo simulation, but what is it exactly? The concept is not that hard to understand: you take a probability model for an outcome (the Poisson distribution in this case) and simulate that outcome a large number of times. In this very simple case, the simulation repeatedly samples goal counts for both teams from their Poisson distributions, which lets us estimate the win, draw, and loss probabilities and the most likely scoreline.
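To make the idea concrete, here is a minimal sketch of a Monte Carlo simulation for a single match. It is not the code from my repository: the function name simulate_match, the expected goals of 1.6 and 0.7, the number of simulations, and the seed are all assumptions chosen for illustration.

```python
from collections import Counter

import numpy as np


def simulate_match(lam_a, lam_b, n_sims=10_000, seed=42):
    """Sample n_sims scorelines from two Poisson distributions and summarise them.

    lam_a and lam_b are the expected goals of team A and team B (assumed inputs).
    """
    rng = np.random.default_rng(seed)
    goals_a = rng.poisson(lam_a, size=n_sims)
    goals_b = rng.poisson(lam_b, size=n_sims)

    # Count how often each (goals_a, goals_b) pair occurred.
    scorelines = Counter(zip(goals_a.tolist(), goals_b.tolist()))
    most_likely, _count = scorelines.most_common(1)[0]

    return {
        "p_win_a": float(np.mean(goals_a > goals_b)),
        "p_draw": float(np.mean(goals_a == goals_b)),
        "p_win_b": float(np.mean(goals_a < goals_b)),
        "most_likely_score": most_likely,
    }


# Illustrative expected goals, not values taken from real data.
result = simulate_match(lam_a=1.6, lam_b=0.7)
print(result)
```

Each simulated match is just two random draws, so running thousands of them takes a fraction of a second. The share of simulations that end in a win, draw, or loss is the estimated probability of that outcome.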
The simulation needs a probability model, and here that is the Poisson distribution. It models the number of events in a fixed interval given only their average rate: if a team scores 0.7 goals per match on average, the distribution gives the probability of that team scoring 0, 1, 2, 3, … goals. In an ideal world, this average would be calculated for each match based on previous matches.
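The 0.7 example can be checked by hand with the Poisson formula, P(k) = λ^k · e^(−λ) / k!, where λ is the average number of goals. A small sketch using only the standard library (the helper name poisson_pmf is my own, for illustration):

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the average is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


avg_goals = 0.7  # the example average from the text
probs = [poisson_pmf(k, avg_goals) for k in range(5)]

for k, p in enumerate(probs):
    print(f"P({k} goals) = {p:.3f}")
```

With an average of 0.7, the team fails to score in roughly half of the matches, scores exactly once in about 35% of them, and scores twice in about 12%.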
But again, due to time limitations (and a lack of data), I wrote a simplified calculation based on the rating difference between the two teams. To amplify the gap between stronger and weaker teams, I applied an exponential function to the rating difference, so that a bigger rating difference produces a bigger difference in expected goals.
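One way such a mapping could look is sketched below. This is an illustration of the idea, not the exact formula from my repository: the function name expected_goals and the constants BASE_GOALS and RATING_SCALE are assumptions picked to give plausible numbers.

```python
import math

BASE_GOALS = 1.3      # assumed expected goals when both teams are rated equally
RATING_SCALE = 400.0  # assumed: how many rating points count as "one unit" of gap


def expected_goals(rating_a: float, rating_b: float) -> tuple[float, float]:
    """Map a rating difference to expected goals for both teams.

    The exponential makes the gap in expected goals grow faster
    than the gap in ratings.
    """
    diff = (rating_a - rating_b) / RATING_SCALE
    lam_a = BASE_GOALS * math.exp(diff)
    lam_b = BASE_GOALS * math.exp(-diff)
    return lam_a, lam_b


# Illustrative ratings, not real FIFA values.
for rating_a, rating_b in [(1800, 1800), (1800, 1600), (1800, 1400)]:
    lam_a, lam_b = expected_goals(rating_a, rating_b)
    print(f"{rating_a} vs {rating_b}: {lam_a:.2f} - {lam_b:.2f}")
```

The two values returned here are exactly what the Poisson distribution needs as its averages, so they can be fed straight into the simulation.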
Show me the code
Now enough jibber-jabber, let’s talk about code.
As mentioned, I went with Python because of its easy-to-use libraries for these kinds of calculations. I used NumPy to sample from the Poisson distribution and to calculate the means, so nothing too fancy there.
To display the data in a somewhat clean way, I used Pandas to present the results in a grid format and to export them as a CSV file. This made it easy to share the predictions with my friends for the competition.
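As a rough sketch of that last step, the snippet below puts a few predictions into a Pandas DataFrame and turns them into CSV. The team names, scores, and column names are placeholders I made up for illustration, not actual predictions.

```python
import pandas as pd

# Placeholder rows; the team names and scores are made up for illustration.
predictions = [
    {"home": "Team A", "away": "Team B", "home_goals": 1, "away_goals": 0},
    {"home": "Team C", "away": "Team D", "home_goals": 2, "away_goals": 1},
]

df = pd.DataFrame(predictions)
print(df)  # grid view in a notebook or terminal

# With no path argument, to_csv returns the CSV as a string;
# df.to_csv("predictions.csv", index=False) would write a file instead.
csv_text = df.to_csv(index=False)
print(csv_text)
```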
Instead of sharing a bunch of screenshots, you can just check out the code on my GitHub: worldcup-predictor
The results
If you want the exact results, just run the code. The data is already hard-coded, so simply pressing run should reproduce everything.
Within the limited timeframe, I was able to generate some realistic results, and I am happy with that. One thing to note is that many predicted scores have a low total number of goals. This is due to the very simple approach used to calculate the expected goals in each match. With more time, this would be the first area for improvement.
I expect the model to perform reasonably well in predicting the match winners, but less well when it comes to exact scorelines. Only time will tell—so for now, let’s enjoy the upcoming tournament.