
Random in Python with Expected Goals

Creating random numbers is a central part of programming – whether it is for simulations, games or models, there are a multitude of uses for random numbers. Python’s Random module makes creating pseudo-random numbers incredibly simple. This article runs through the basic commands to create a random number, then looks to apply this to finding how likely you are to win based on a match’s expected goals (we’ll come on to what these are later).

Let’s import the random module and get started.

In [1]:
import random

The easiest way to get a random number is with ‘.random()’. This returns a random decimal between 0 and 1; strictly, 0.0 can come up but 1.0 never does. Check it out:

In [2]:
random.random()
Out[2]:
0.10769078951918487
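Because ‘random()’ draws from the half-open range [0.0, 1.0), we can check its bounds over many draws. This is a small sketch. It uses ‘random.seed()’, which the original does not use, to fix the sequence so the run can be repeated:

```python
import random

random.seed(42)  # fix the pseudo-random sequence so the run is repeatable

draws = [random.random() for _ in range(100_000)]

# Every draw should sit in [0.0, 1.0): 0.0 is allowed, 1.0 is not
print(min(draws) >= 0.0, max(draws) < 1.0)

# The average of many uniform draws should be close to 0.5
print(round(sum(draws) / len(draws), 2))
```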

A number between 0 and 1 is very useful. We can treat it as a percentage (0.25 is 25%) to decide whether something with a given chance happens, or scale it up for other calculations.

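We can also use it to create a random whole number. To do this, multiply it by a maximum value and cut off the decimal part with ‘int()’. Note that ‘int()’ rounds down for positive numbers rather than rounding to the nearest whole number. The example here gives us a random whole number from 0 to 99. 100 itself never appears, because ‘.random()’ never returns 1: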

In [3]:
int(random.random()*100)
Out[3]:
88
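A quick sketch confirming that range over many draws. It also shows ‘random.randrange(100)’, another function in the random module, which produces the same 0 to 99 range directly:

```python
import random

random.seed(1)

# Scale-and-truncate approach from above: int() drops the decimal part
scaled = [int(random.random() * 100) for _ in range(100_000)]

# randrange(100) picks from 0, 1, ..., 99 directly
ranged = [random.randrange(100) for _ in range(100_000)]

print(min(scaled), max(scaled))
print(min(ranged), max(ranged))
```

If you want 100 itself to be possible, ‘random.randint(0, 100)’ includes both ends.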

Alternatively, we can let the Random module create a random whole number for us with ‘.randint()’. We simply pass the lowest and highest values that we will allow, and both ends are included. Let’s simulate a dice roll:

In [4]:
random.randint(1,6)
Out[4]:
2
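Because ‘randint(1, 6)’ includes both 1 and 6, every face of the dice can come up. Here is a short sketch that rolls many times and tallies the faces with ‘collections.Counter’. Each face should appear roughly a sixth of the time:

```python
import random
from collections import Counter

random.seed(7)

rolls = [random.randint(1, 6) for _ in range(60_000)]
tally = Counter(rolls)  # maps each face to how often it was rolled

for face in range(1, 7):
    # Each face should land close to 60,000 / 6 = 10,000 times
    print(face, tally[face])
```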

Applying Random to Expected Goals

Great job on getting to know Random. The rest of the article applies it to expected goals, and will allow us to calculate how ‘lucky’ a team was based on the quality of their shots.

Firstly, expected goals (xG) measures how many goals a team could have expected to score based on the shots that they took. Different models use different inputs. Most commonly, though, they compare the location of the shot, the type of build-up and the foot used with similar chances from past matches. This gives the percentage chance of the shot becoming a goal.

Once we know each shot’s expected goals value, we can use ‘.random()’ to simulate whether it goes in, and so test how likely a given scoreline was. Let’s set up our lists of shots with their expected goal values; these are all percentages represented as decimals.

In [5]:
HomexG = [0.21,0.66,0.1,0.14,0.01]
AwayxG = [0.04,0.06,0.01,0.04,0.06,0.12,0.01,0.06]
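Each value is the probability of one shot being scored. Adding up a team’s values therefore gives its total xG, which is the number of goals it would score on average from those chances. Here is a quick sketch using the built-in ‘sum()’. It redefines the two lists so it runs on its own:

```python
HomexG = [0.21, 0.66, 0.1, 0.14, 0.01]
AwayxG = [0.04, 0.06, 0.01, 0.04, 0.06, 0.12, 0.01, 0.06]

# round() tidies up the tiny floating-point errors from adding decimals
home_total = round(sum(HomexG), 2)
away_total = round(sum(AwayxG), 2)

print(home_total, away_total)  # 1.12 0.4
```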

The first shot for the home team had a 21% chance of being scored. Let’s create a random percentage to simulate whether it goes in. If the number is less than or equal to 0.21 (21%), we can say that the shot is scored in our simulation:

In [6]:
if random.random()<=0.21:
    print("GOAL!")
else:
    print("Missed!")
Missed!

As happens roughly 4 out of 5 times, this time the shot was missed. Let’s run the shot 10,000 times:

In [7]:
Goals = 0

for i in range(0,10000):
    if random.random()<=0.21:
        Goals += 1

print(Goals)
2071

So, according to the xG value and our random test, taking that shot 10,000 times produces 2,071 goals. That is very close to the 2,100 that a 0.21 chance implies (10,000 × 0.21). In a nutshell, this is how we simulate with random numbers.
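How close should we expect the result to be? Each attempt is a yes/no trial with p = 0.21. Over n = 10,000 attempts, the goal count therefore follows a binomial distribution, with mean n × p = 2,100 and standard deviation √(n × p × (1 − p)) ≈ 40.7. A result like 2,071 sits comfortably within one standard deviation. Here is a sketch of that calculation alongside a fresh simulation:

```python
import math
import random

n = 10_000   # number of times the shot is taken
p = 0.21     # xG of the shot

expected = n * p                     # mean of a binomial distribution
spread = math.sqrt(n * p * (1 - p))  # its standard deviation

random.seed(3)
goals = sum(1 for _ in range(n) if random.random() <= p)

print(round(expected), round(spread, 1), goals)
```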

Going Further: Simulating a Match with Expected Goals

Rather than simulate with one number, let’s apply this same test to every shot by the home and away teams. Take a look through the function below and try to understand how it applies the test above to every shot in the HomexG and AwayxG lists.

In [8]:
def calculateWinner(home, away):
    #Our match starts at 0-0
    HomeGoals = 0
    AwayGoals = 0
    
    #We have a function within our function
    #This one runs the '.random()' test above for a list
    def testShots(shots):
        
        #Start goal count at 0
        Goals = 0
        
        #For each shot, if it goes in, add a goal
        for shot in shots:
            if random.random() <= shot:
                Goals += 1
                
        #Finally, return the number of goals
        return Goals
    
    #Run the nested function for the home and away lists
    HomeGoals = testShots(home)
    AwayGoals = testShots(away)
    
    #Print the result and the score
    if HomeGoals > AwayGoals:
        print("Home Wins! {} - {}".format(HomeGoals, AwayGoals))
    elif AwayGoals > HomeGoals:
        print("Away Wins! {} - {}".format(HomeGoals, AwayGoals))
    else:
        print("Share of the points! {} - {}".format(HomeGoals, AwayGoals))
In [9]:
calculateWinner(HomexG, AwayxG)
Home Wins! 1 - 0
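The nested ‘testShots’ loop can also be written more compactly. In Python, True counts as 1 when summed, so summing the comparison for each shot counts the goals. The sketch below uses a hypothetical ‘simulate_score’ helper that returns the scoreline as a tuple instead of printing it:

```python
import random

def simulate_score(home, away):
    """Hypothetical helper: simulate one match, return (home_goals, away_goals)."""
    # Each comparison is True (goal) or False (miss); sum() counts the Trues
    home_goals = sum(random.random() <= shot for shot in home)
    away_goals = sum(random.random() <= shot for shot in away)
    return home_goals, away_goals

HomexG = [0.21, 0.66, 0.1, 0.14, 0.01]
AwayxG = [0.04, 0.06, 0.01, 0.04, 0.06, 0.12, 0.01, 0.06]

print(simulate_score(HomexG, AwayxG))
```

Returning the numbers rather than printing them makes the result reusable, which is the same idea the next step applies to ‘calculateWinner’.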

We are now simulating a whole game based on expected goals, pretty cool!

However, we are only simulating once. In order to get a proper estimate as to how likely it is that one team wins, we need to do this lots of times.

Let’s change our last function to simply return the result rather than a user-friendly printout. We can then use this function over and over to estimate each team’s percentage chance of winning.

In [10]:
def calculateWinner(home, away):
    HomeGoals = 0
    AwayGoals = 0
    
    def testShots(shots):
        Goals = 0
        
        for shot in shots:
            if random.random() <= shot:
                Goals += 1
        return Goals
    
    HomeGoals = testShots(home)
    AwayGoals = testShots(away)
    
    #This is all that changes from above
    #We now return a simple string, rather than printing the result
    if HomeGoals > AwayGoals:
        return "home"
    elif AwayGoals > HomeGoals:
        return "away"
    else:
        return "draw"

Now, let’s run this function 10000 times, and work out the percentage of each result:

In [11]:
#Run xG calculator 10000 times to test winner %
def calculateChance(team1, team2):
    home = 0
    away = 0
    draw = 0
    
    for i in range(0,10000):
        matchWinner = calculateWinner(team1,team2)
        if matchWinner == "home":
            home +=1
        elif matchWinner == "away":
            away +=1
        else:
            draw +=1
    
    #10,000 games, so dividing by 100 turns each count into a percentage
    home = home/100
    away = away/100
    draw = draw/100
    
    print("Over 10000 games, home wins {}%, away wins {}% and there is a draw in {}% of games.".format(home, away, draw))
In [12]:
calculateChance(HomexG, AwayxG)
Over 10000 games, home wins 60.7%, away wins 10.16% and there is a draw in 29.14% of games.

There we go! We now have a better understanding as to what result we could normally expect from these chances!
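‘calculateChance’ divides by 100 because it always plays exactly 10,000 games. If the loop count changes, the percentages go wrong. The sketch below uses a hypothetical ‘simulate_chances’ function that takes the number of games as a parameter and divides by that instead. It tallies results with ‘collections.Counter’ and carries its own copy of the shot test, so it runs on its own:

```python
import random
from collections import Counter

def match_result(home, away):
    """Return 'home', 'away' or 'draw' for one simulated match."""
    home_goals = sum(random.random() <= shot for shot in home)
    away_goals = sum(random.random() <= shot for shot in away)
    if home_goals > away_goals:
        return "home"
    if away_goals > home_goals:
        return "away"
    return "draw"

def simulate_chances(home, away, games=10_000):
    """Hypothetical helper: percentage of home wins, away wins and draws."""
    results = Counter(match_result(home, away) for _ in range(games))
    # Divide by the number of games actually played, not a fixed 100
    return {outcome: 100 * results[outcome] / games
            for outcome in ("home", "away", "draw")}

HomexG = [0.21, 0.66, 0.1, 0.14, 0.01]
AwayxG = [0.04, 0.06, 0.01, 0.04, 0.06, 0.12, 0.01, 0.06]

print(simulate_chances(HomexG, AwayxG, games=50_000))
```

More games give a steadier estimate at the cost of a longer run. The parameter lets you make that trade-off without touching the percentage maths.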

Let’s try a new run of expected goals – one great chance (50%) against 10 poor chances (5%). Who wins most often here?

In [13]:
HomexG=[0.5]
AwayxG=[0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05]
calculateChance(HomexG, AwayxG)
Over 10000 games, home wins 30.84%, away wins 23.14% and there is a draw in 46.02% of games.

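Interestingly, both teams create the same total xG (0.5). Even so, the team with the one big chance wins 30.84% of these simulated games, against 23.14% for the team that shoots loads from low-chance opportunities. That is an advantage of almost 8 percentage points. Makes you think!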
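Simulation estimates these percentages. When the shots are independent, they can also be calculated exactly. Each shot either goes in or misses, so a team’s goal count can be built up one shot at a time; the result is known as a Poisson binomial distribution. The sketch below uses hypothetical helper names. For one 0.5 chance against ten 0.05 chances, it gives home wins about 29.94%, away wins about 24.37% and draws about 45.69%. The 10,000-game simulation above landed within about 1.3 percentage points of each figure.

```python
def goal_distribution(shots):
    """Hypothetical helper: dist[k] is the probability of scoring exactly k goals."""
    dist = [1.0]  # before any shots, 0 goals is certain
    for p in shots:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)   # this shot misses
            new[goals + 1] += prob * p     # this shot scores
        dist = new
    return dist

def exact_chances(home, away):
    """Hypothetical helper: exact (home win, away win, draw) probabilities."""
    home_dist = goal_distribution(home)
    away_dist = goal_distribution(away)
    home_win = away_win = draw = 0.0
    for h, ph in enumerate(home_dist):
        for a, pa in enumerate(away_dist):
            if h > a:
                home_win += ph * pa
            elif a > h:
                away_win += ph * pa
            else:
                draw += ph * pa
    return home_win, away_win, draw

home_win, away_win, draw = exact_chances([0.5], [0.05] * 10)
print(f"home {home_win:.2%}, away {away_win:.2%}, draw {draw:.2%}")
# home 29.94%, away 24.37%, draw 45.69%
```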

Summary

Creating random numbers is easy. Whether we want a random decimal between 0 and 1 (‘.random()’) or a random whole number (‘.randint()’), the random module is a big help.

In this article, we saw how we can apply random numbers to a simulation. If anything around the function creation or for loops was confusing here, you might want to take a read up on those. Alternatively, why not push forward with more complex data sets?

Posted by FCPythonADMIN in Python Basics, 0 comments

Python Modules

Python is a versatile piece of kit straight out of the box. By itself it can do just about anything, from simple calculations, to automating a Twitter account, through to being a robot’s ‘brain’. However, we can make most jobs that we will do in Python a lot easier by using modules – sort of like Python add-on kits. These modules create groups of functions that make tasks quicker to write and perform. Without them, we’d have to write horribly lengthy code once we got beyond simple tasks.

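Modules in Python’s standard library, such as ‘math’ and ‘random’, come with Python and only need importing. Other modules must first be installed onto your machine and then imported into any code that will use them. This article will take you through both steps: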

Installing Modules

If you are using an Anaconda install of Python, the interface in Anaconda Navigator should have most modules that you are looking for available. Simply ensure that it is ticked and installed on your environment page.

If you are using another type of Python install, open up the terminal (your machine’s terminal, not Python) and run ‘pip install [MODULE NAME]’. Most problems you run into at this point are well documented on Stack Overflow, so a quick search is usually the fastest fix.
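If you are not sure whether a module is already available, ‘importlib.util.find_spec’ from the standard library can check without importing it. It returns None when no module of that name can be found. Here is a small sketch; the names in the list are just examples:

```python
from importlib.util import find_spec

for name in ["math", "random", "pandas"]:
    # find_spec returns a ModuleSpec if Python can locate the module, else None
    status = "installed" if find_spec(name) is not None else "not installed"
    print(name, status)
```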

Importing Modules

With our module installed and ready to go, we just need to import it. For this example, we’ll import the ‘math’ module, which gives us access to the value of pi. Let’s see what happens without importing the module first:

In [1]:
math.pi
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-9493078a23d9> in <module>()
----> 1 math.pi

NameError: name 'math' is not defined
In [2]:
import math

print(math.pi)
3.141592653589793

Without importing the math module, Python obviously has no idea what it is. Once we import it, away we go!
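‘math.pi’ is a constant (a float), not a function, so it is used without brackets. Calling ‘math.pi()’ raises a TypeError even after the import. Here is a short sketch that uses it to work out the area of a circle; the 9.15 m radius is just an example value:

```python
import math

radius = 9.15  # example value: the centre circle's radius in metres
area = math.pi * radius ** 2  # area of a circle = pi * r squared

print(type(math.pi))
print(round(area, 1))
```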

Some modules are conventionally imported under a shorter name, known as an alias. This is done with the ‘as’ keyword after our import:

In [5]:
import pandas as pd
import numpy as np

np.arange(0,10,2)
Out[5]:
array([0, 2, 4, 6, 8])
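An alias is just a second name for the same module object, so everything in the module is available through it. The sketch below uses the standard-library ‘math’ module, so it runs without pandas or NumPy installed. The alias ‘m’ is purely illustrative:

```python
import math
import math as m  # m is an illustrative alias, not a common convention

# Both names refer to the very same module object
print(m is math)
print(m.sqrt(16))
```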

Summary

Harvard’s Introduction to Computer Science course opens with the idea that we are ‘standing on the shoulders of giants’. It highlights the work of the programmers before us who built the languages, modules and tools that allow us to be more productive. The Python community is a perfect example of this, with thousands of modules available to us that make complex tasks a bit more manageable.

See some of these modules in action across data analysis, visualisation and web scraping.
