Random in Python with Expected Goals

Creating random numbers is a central part of programming. Whether it is for simulations, games or models, there are a multitude of uses for random numbers. Python’s random module makes creating pseudo-random numbers incredibly simple. This article runs through the basic commands to create a random number, then applies them to finding how likely you are to win based on a match’s expected goals (we’ll come on to what these are later).

Let’s import the random module and get started.

In [1]:
import random

The easiest way to get a random number is through the ‘.random()’ function. This will give a random decimal number from 0 up to, but not including, 1. Check it out:

In [2]:
random.random()
Out[2]:
0.10769078951918487
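Because these numbers are pseudo-random, the sequence can be made repeatable by seeding the generator first. The short sketch below is an illustration rather than part of the original walkthrough; the seed value 42 and the variable names are arbitrary choices.

```python
import random

# Seeding the generator makes the "random" sequence repeatable,
# which helps when you want to re-run a simulation and compare results.
random.seed(42)  # 42 is an arbitrary choice
first_run = [random.random() for _ in range(3)]

random.seed(42)  # same seed again
second_run = [random.random() for _ in range(3)]

# Both lists hold the same three numbers
print(first_run == second_run)
```

Leaving the seed out, as the rest of this article does, gives a different sequence on each run.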

A number between 0 and 1 is very useful. Essentially, it gives us a percentage that we can use to calculate chance, or to assign to a variable for calculation.

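Additionally, we can use it to create a random whole number by multiplying it by the number of possible values, then converting with ‘int()’. Note that ‘int()’ cuts off the decimal part rather than rounding, and ‘.random()’ never returns exactly 1, so the example here gives us a random number between 0 and 99: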

In [3]:
int(random.random()*100)
Out[3]:
88
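To see the range this produces, a quick sketch is to draw many values and look at the smallest and largest. Since ‘int()’ drops the decimal part and ‘.random()’ stays below 1, the result should never reach 100. The variable name `values` and the sample size are just illustrative choices.

```python
import random

# Draw a large sample and record the extremes we actually see.
values = [int(random.random() * 100) for _ in range(100_000)]
print(min(values), max(values))

# random.randrange(100) draws a whole number from the same 0 to 99 range.
print(random.randrange(100))
```

With this many draws, the two printed extremes will almost certainly be 0 and 99.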

Alternatively, we could use another feature of the random module to create a random whole number for us: ‘.randint()’. We simply pass the lowest and highest possible values, in that order, and both are included in the range. Let’s simulate a dice roll:

In [4]:
random.randint(1,6)
Out[4]:
2
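One roll does not tell us much, so as a rough check that every face comes up about equally often, we can roll many times and tally the results. This sketch uses `collections.Counter` from the standard library; the figure of 6,000 rolls is an arbitrary choice.

```python
import random
from collections import Counter

# Roll the dice 6,000 times and count how often each face appears.
rolls = [random.randint(1, 6) for _ in range(6000)]
counts = Counter(rolls)

# A fair dice should give each face roughly 1,000 times.
for face in range(1, 7):
    print(face, counts[face])
```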

Applying Random to Expected Goals

Great job on getting to know Random. The rest of the article applies it to expected goals, and will allow us to calculate how ‘lucky’ a team was based on the quality of their shots.

Firstly, expected goals is a measurement of how many goals a team could have expected to score based on the shots that they took. Different models base this on different things, but most commonly the location of the shot, the type of build-up and the foot used are compared with similar chances historically. We can then see the percentage chance of the shot becoming a goal.

Knowing the expected goals, we can then use ‘.random()’ to test how likely that score was. Let’s set up our lists of shots with their expected goal values – these are all percentages represented as decimals.

In [5]:
HomexG = [0.21,0.66,0.1,0.14,0.01]
AwayxG = [0.04,0.06,0.01,0.04,0.06,0.12,0.01,0.06]

The first shot for the home team had a 21% chance of being scored. Let’s create a random percentage to simulate if it goes in or not. If it is less than or equal to 21%, we can say that it is scored in our simulation:

In [6]:
if random.random()<=0.21:
    print("GOAL!")
else:
    print("Missed!")
Missed!

As happens roughly 4 out of 5 times, this time the shot was missed. Let’s run the shot 10,000 times:

In [7]:
Goals = 0

for i in range(0,10000):
    if random.random()<=0.21:
        Goals += 1

print(Goals)
2071

So according to the xG score and our random test, if we take that shot 10,000 times, we can expect around 2,100 goals. This run gave 2071, pretty much in line with the 0.21 score, and your own number will differ slightly each time. In a nutshell, this is how we simulate with random numbers.
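To judge how close "pretty much in line" is, we can work out the expected count and its typical spread directly. This sketch assumes each of the 10,000 shots is independent with the same 0.21 chance, which makes the goal count binomial; the variable names are illustrative.

```python
import math

shots = 10000
xg = 0.21

# Expected number of goals: shots multiplied by the chance of scoring.
expected_goals = shots * xg

# Standard deviation of a binomial count: sqrt(n * p * (1 - p)).
std_dev = math.sqrt(shots * xg * (1 - xg))

print(round(expected_goals), round(std_dev, 1))  # 2100 and about 40.7
```

The 2071 goals from the run above sit within one standard deviation of 2100, so the result is entirely ordinary.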

Going Further: Simulating a Match with Expected Goals

Rather than simulate with one number, let’s apply this same test to every shot by the home and away teams. Take a look through the function below and try to understand how it applies the test above to every shot in the HomexG and AwayxG lists.

In [8]:
def calculateWinner(home, away):
    #Our match starts at 0-0
    HomeGoals = 0
    AwayGoals = 0
    
    #We have a function within our function
    #This one runs the '.random()' test above for a list
    def testShots(shots):
        
        #Start goal count at 0
        Goals = 0
        
        #For each shot, if it goes in, add a goal
        for shot in shots:
            if random.random() <= shot:
                Goals += 1
                
        #Finally, return the number of goals
        return Goals
    
    #Run the above formula for home and away lists
    HomeGoals = testShots(home)
    AwayGoals = testShots(away)
    
    #Return the score
    if HomeGoals > AwayGoals:
        print("Home Wins! {} - {}".format(HomeGoals, AwayGoals))
    elif AwayGoals > HomeGoals:
        print("Away Wins! {} - {}".format(HomeGoals, AwayGoals))
    else:
        print("Share of the points! {} - {}".format(HomeGoals, AwayGoals))

In [9]:
calculateWinner(HomexG, AwayxG)
Home Wins! 1 - 0

We are now simulating a whole game based on expected goals, pretty cool!
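The shot-by-shot loop can also be written more compactly. The sketch below is one possible alternative, not a replacement for the function above: the name `simulate_goals` is made up for this example, and it compares with `<` rather than `<=`, which makes no practical difference but guarantees that a shot with an xG of 0 never scores.

```python
import random

def simulate_goals(shots):
    """Count how many shots in one simulated match turn into goals."""
    # Each comparison is True (counted as 1) or False (counted as 0).
    return sum(random.random() < xg for xg in shots)

HomexG = [0.21, 0.66, 0.1, 0.14, 0.01]
AwayxG = [0.04, 0.06, 0.01, 0.04, 0.06, 0.12, 0.01, 0.06]

# One simulated scoreline, home goals then away goals
print(simulate_goals(HomexG), simulate_goals(AwayxG))
```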

However, we are only simulating once. In order to get a proper estimate as to how likely it is that one team wins, we need to do this lots of times.

Let’s change our last function to simply return the result, not a user-friendly print out. We can then use this function over and over to calculate an accurate percentage chance of winning for each team.

In [10]:
def calculateWinner(home, away):
    HomeGoals = 0
    AwayGoals = 0
    
    def testShots(shots):
        Goals = 0
        
        for shot in shots:
            if random.random() <= shot:
                Goals += 1
        return Goals
    
    HomeGoals = testShots(home)
    AwayGoals = testShots(away)
    
    #This is all that changes from above
    #We now return a simple string, rather than ask for a print out.
    if HomeGoals > AwayGoals:
        return "home"
    elif AwayGoals > HomeGoals:
        return "away"
    else:
        return "draw"

Now, let’s run this function 10000 times, and work out the percentage of each result:

In [11]:
#Run xG calculator 10000 times to test winner %
def calculateChance(team1, team2):
    home = 0
    away = 0
    draw = 0
    
    for i in range(0,10000):
        matchWinner = calculateWinner(team1,team2)
        if matchWinner == "home":
            home +=1
        elif matchWinner == "away":
            away +=1
        else:
            draw +=1
    
    #With 10000 games, dividing each count by 100 gives a percentage
    home = home/100
    away = away/100
    draw = draw/100
    
    print("Over 10000 games, home wins {}%, away wins {}% and there is a draw in {}% of games.".format(home, away, draw))

In [12]:
calculateChance(HomexG, AwayxG)
Over 10000 games, home wins 60.7%, away wins 10.16% and there is a draw in 29.14% of games.

There we go! We now have a better understanding as to what result we could normally expect from these chances!
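As a cross-check on the simulation, the same percentages can be calculated exactly rather than estimated. The sketch below builds the probability of each possible goal tally one shot at a time, then compares every home tally against every away tally. It makes the same assumption as the simulation, namely that shots are independent of each other, and the function names `goal_distribution` and `exact_chances` are invented for this example.

```python
def goal_distribution(shots):
    """Return a list where index k is the chance of scoring exactly k goals."""
    dist = [1.0]  # before any shot is taken, 0 goals is certain
    for xg in shots:
        new = [0.0] * (len(dist) + 1)
        for goals, p in enumerate(dist):
            new[goals] += p * (1 - xg)   # this shot is missed
            new[goals + 1] += p * xg     # this shot is scored
        dist = new
    return dist

def exact_chances(home, away):
    """Return the (home win, draw, away win) probabilities."""
    home_win = draw = away_win = 0.0
    for h, ph in enumerate(goal_distribution(home)):
        for a, pa in enumerate(goal_distribution(away)):
            if h > a:
                home_win += ph * pa
            elif a > h:
                away_win += ph * pa
            else:
                draw += ph * pa
    return home_win, draw, away_win

# The shot lists from the first simulated match
first_home = [0.21, 0.66, 0.1, 0.14, 0.01]
first_away = [0.04, 0.06, 0.01, 0.04, 0.06, 0.12, 0.01, 0.06]

for label, p in zip(("home", "draw", "away"), exact_chances(first_home, first_away)):
    print(label, round(p * 100, 2))
```

The simulated percentages should land close to these exact values, and they get closer as the number of simulated games grows.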

Let’s try a new run of expected goals – one great chance (50%) against 10 poor chances (5%). Who wins most often here?

In [13]:
HomexG=[0.5]
AwayxG=[0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05]
calculateChance(HomexG, AwayxG)
Over 10000 games, home wins 30.84%, away wins 23.14% and there is a draw in 46.02% of games.

Interestingly, the big chance team wins about 8 percentage points more often (30.84% against 23.14%) than the team that shoots loads from low-chance opportunities. Makes you think!
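What makes this result striking is that both teams have the same total xG of 0.5. The sketch below confirms that, then works out the exact chances for this particular match with the binomial formula, on the assumption that the ten away shots are independent. The helper name `away_scores` and the variable names are made up for this example.

```python
from math import comb

home_xg = [0.5]
away_xg = [0.05] * 10

# Both teams have the same total expected goals.
print(round(sum(home_xg), 2), round(sum(away_xg), 2))  # 0.5 0.5

def away_scores(k, n=10, p=0.05):
    """Chance the away team scores exactly k of its n shots (binomial)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_home_goal = 0.5

# Home can only win 1-0: they score and the away team does not.
home_win = p_home_goal * away_scores(0)
# Draws are 0-0 or 1-1.
draw = (1 - p_home_goal) * away_scores(0) + p_home_goal * away_scores(1)
# Everything else is an away win.
away_win = 1 - home_win - draw

print(round(home_win * 100, 1), round(draw * 100, 1), round(away_win * 100, 1))
# about 29.9 45.7 24.4
```

The home side can score at most once, so it can never win by outscoring a goal from the away team, which is why so many of these games end level.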

Summary

Creating random numbers is easy. Whether we want a random decimal between 0 and 1 to use as a percentage (‘.random()’) or a random whole number (‘.randint()’), the random module is a big help.

In this article, we saw how we can apply random numbers to a simulation. If anything around the function creation or for loops was confusing here, you might want to read up on those. Alternatively, why not push forward with more complex data sets?