Random

How much does it cost to fill the Panini World Cup album? Simulations in Python

With the World Cup just 3 months away, the best bit of the tournament build-up is upon us – the Panini sticker album.

For those looking to invest in a completed album to pass on to grandchildren, just how much will you have to spend to complete it on your own? Assuming that each sticker has an equal chance of being found, this is a simple random number problem that we can recreate in Python.

This article will show you how to create a function that allows you to estimate how much you will need to spend, before you throw wads of cash at sticker boxes to end with a half-finished album. Load up pandas and numpy and let’s kick on.

In [1]:
import pandas as pd
import numpy as np

To solve this, we are going to recreate our sticker album. It will be an empty list that will take on the new stickers that we find in each pack.

We will also need a few variables to act as counters alongside this list:

  • Stickers needed
  • How many packets have we bought?
  • How many swaps do we have?

Let’s define these:

In [1]:
stickersNeeded = 682
packetsBought = 0
stickersGot = []
swapStickers = 0

Now, we need to run a simulation that will open packs, check each sticker and either add it to our album or to our swaps pile.

We will do this by running a while loop that completes once the album is full.

This loop will open a pack of 5 stickers and check whether each one is already in the album. To simulate a sticker, we will simply assign it a random number from the album’s range. If this number is already present, we add it to the swap pile. If it is a new sticker, we append it to our album list.
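
One detail worth checking when picking the random number: NumPy's `randint(low, high)` leaves out `high`, unlike the standard library's `random.randint`, which includes it. Here is a minimal sketch, assuming a 682-sticker album numbered 0 to 681 (the names `album_size` and `draws` are made up for illustration):

```python
import numpy as np

album_size = 682  # assumed album size, stickers numbered 0..681

# NumPy excludes the upper bound, so we pass the album size itself
draws = np.random.randint(0, album_size, size=100_000)

print(draws.min() >= 0)               # True
print(draws.max() <= album_size - 1)  # True: 682 itself is never drawn
```

Passing 681 as the upper bound would leave sticker 681 impossible to find, so an album of 682 could never be completed.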

We will also need to update our counters for packets bought, stickers needed and swaps throughout.

Pretty simple process overall! Let’s take a look at how we implement this loop:

In [2]:
while stickersNeeded > 0:
    
        #Buy a new packet
        packetsBought += 1

        #For each sticker, do some things 
        for i in range(0,5):
            
            #Assign the sticker a random number from 0 to 681
            #(NumPy's randint excludes the upper bound, so we pass 682)
            stickerNumber = np.random.randint(0,682)
    
            #Check if we have the sticker
            if stickerNumber not in stickersGot:
                
                #Add it to the album, then reduce our stickers needed count
                stickersGot.append(stickerNumber)
                stickersNeeded -= 1

            #Throw it into the swaps pile
            else:
                swapStickers += 1

Each time you run that, you are simulating the entire album completion process! Let’s check out the results:

In [3]:
{"Packets":packetsBought,"Swaps":swapStickers}
Out[3]:
{'Packets': 939, 'Swaps': 4013}

939 packets?! 4013 swaps?! Surely these must be outliers… let’s add all of this into one function and run it loads of times over.

As the number of stickers in a pack and the sticker total may change, let’s define these as arguments that we can change with future uses of the function:

In [4]:
def calculateAlbum(stickersInPack = 5, costOfPackp = 80, stickerTotal=682):
    stickersNeeded = stickerTotal
    packetsBought = 0
    stickersGot = []
    swapStickers = 0


    while stickersNeeded > 0:
        packetsBought += 1

        for i in range(0,stickersInPack):
            stickerNumber = np.random.randint(0,stickerTotal)

            if stickerNumber not in stickersGot:
                stickersGot.append(stickerNumber)
                stickersNeeded -= 1

            else:
                swapStickers += 1

    return{"Packets":packetsBought,"Swaps":swapStickers,
           "Total Cost":(packetsBought*costOfPackp)/100}
In [5]:
calculateAlbum()
Out[5]:
{'Packets': 1017, 'Swaps': 4403, 'Total Cost': 813.6}

So our calculateAlbum function does exactly the same as our loop before; we have just added a total cost. The pack price (costOfPackp) is given in pence, so dividing by 100 returns the total in pounds.
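
Checking `not in` against a list gets slower as the album fills up, which adds up over hundreds of runs. Below is a rough sketch of the same idea using a set instead. The name `calculate_album_fast` and its snake_case arguments are hypothetical, not part of the original function:

```python
import numpy as np

def calculate_album_fast(stickers_in_pack=5, cost_of_pack_p=80, sticker_total=682):
    """Hypothetical variant of calculateAlbum that stores stickers in a set."""
    stickers_got = set()
    packets_bought = 0
    swap_stickers = 0

    while len(stickers_got) < sticker_total:
        packets_bought += 1
        # Draw a whole pack at once; the upper bound is exclusive
        pack = np.random.randint(0, sticker_total, size=stickers_in_pack)
        for sticker in pack.tolist():
            if sticker in stickers_got:
                swap_stickers += 1
            else:
                stickers_got.add(sticker)

    return {"Packets": packets_bought, "Swaps": swap_stickers,
            "Total Cost": packets_bought * cost_of_pack_p / 100}

result = calculate_album_fast()
# Every sticker opened is either new or a swap, so this always holds
print(result["Packets"] * 5 == 682 + result["Swaps"])  # True
```

The final check is a handy sanity test for any version of the simulation: stickers opened must equal album size plus swaps.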

Let’s run this 1000 times over and see what we can truly expect if we want to complete the album:

In [6]:
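#Running totals for packets, swaps and cost
a=0
b=0
c=0

#Each calculateAlbum() call runs a fresh simulation,
#so the three totals come from separate sets of albums
for i in range(0, 1000):
    a += calculateAlbum()["Packets"]
    b += calculateAlbum()["Swaps"]
    c += calculateAlbum()["Total Cost"]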

{"Packets":a/1000,"Swaps":b/1000,"Total Cost":c/1000}
Out[6]:
{'Packets': 969.582, 'Swaps': 4197.515, 'Total Cost': 773.4824}

970 packets, over 4000 swaps and the best part of £800 on the album. I think we’re going to need some people to swap with!
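
As a sanity check, this is the classic "coupon collector" problem, where the expected number of stickers needed for an album of n is n × (1 + 1/2 + ... + 1/n). The sketch below assumes stickers are drawn one at a time and ignores rounding up to whole packs, so treat it as an approximation. The function name `expected_packets` is made up for illustration:

```python
def expected_packets(sticker_total=682, stickers_in_pack=5):
    """Approximate expected packets via the coupon collector formula n * H_n."""
    harmonic = sum(1 / k for k in range(1, sticker_total + 1))
    expected_stickers = sticker_total * harmonic
    return expected_stickers / stickers_in_pack

packets = expected_packets()
print(round(packets))             # roughly 969 packets
print(round(packets * 80 / 100))  # roughly 775 pounds, at 80p a pack
```

That lands very close to the simulated averages, which suggests the simulation is behaving sensibly.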

Of course, as you run these simulations, you will get different answers each time. Over 1000 runs, however, the averages should be quite close together.
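
An average on its own hides how much individual albums vary. Here is a small sketch that keeps the packet count from every run and looks at the spread. The helper `packets_to_complete` is hypothetical, and 200 runs is an arbitrary choice to keep it quick:

```python
import numpy as np

def packets_to_complete(sticker_total=682, stickers_in_pack=5):
    """Hypothetical helper: packets needed for one simulated album."""
    stickers_got = set()
    packets = 0
    while len(stickers_got) < sticker_total:
        packets += 1
        pack = np.random.randint(0, sticker_total, size=stickers_in_pack)
        stickers_got.update(pack.tolist())
    return packets

# One simulation per run, with every result stored
runs = np.array([packets_to_complete() for _ in range(200)])

print("Mean packets:", runs.mean())
print("Standard deviation:", runs.std())
print("5th and 95th percentiles:", np.percentile(runs, [5, 95]))
```

The percentiles give a feel for a lucky and an unlucky collector, rather than just the typical one.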

Summary

In this article, we have seen a basic example of running simulations with random numbers to answer a question.

We followed the process of replicating the album experience and running it once, then 1000 times to get an average expectation. As with any process involving random numbers, you will get different answers each time, so through running it loads of times over, we get an average that should remove the effect of any outliers.

We also designed our simulations to take on different parameters such as number of stickers needed, stickers in a pack, etc. This allows us to use the same functions when World Cup 2022 has twice the number of stickers!

For more examples of random numbers and simulations, check out our expected goals tutorial.

Posted by FCPythonADMIN in Blog

Random in Python with Expected Goals

Creating random numbers is a central part of programming – whether it is for simulations, games or models, there are a multitude of uses for random numbers. Python’s Random module makes creating pseudo-random numbers incredibly simple. This article runs through the basic commands to create a random number, then looks to apply this to finding how likely you are to win based on a match’s expected goals (we’ll come on to what these are later).

Let’s import the random module and get started.

In [1]:
import random

The easiest way to get a random number is through the ‘.random()’ function. This will give a random decimal from 0 up to, but not including, 1. Check it out:

In [2]:
random.random()
Out[2]:
0.10769078951918487

A number between 0 and 1 is very useful. Essentially, it gives us a percentage that we can use to calculate chance, or to assign to a variable for calculation.

Additionally, we can use it to create a random whole number by multiplying it by the number of possible values, then cutting off the decimal part with ‘int()’ (which truncates rather than rounds). The example here gives us a random whole number from 0 to 99:

In [3]:
int(random.random()*100)
Out[3]:
88
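
Since `random.random()` never returns exactly 1 and `int()` drops the decimal part, this trick can produce 0 but never 100. A quick sketch to confirm the range (the `samples` list is just for illustration), alongside `random.randrange`, which draws from the same range directly:

```python
import random

samples = [int(random.random() * 100) for _ in range(100_000)]
print(min(samples) >= 0, max(samples) <= 99)  # True True

# random.randrange(100) picks a whole number from 0 to 99 in one step
print(random.randrange(100) in range(100))    # True
```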

Alternatively, we could use another feature of the random module to create a random whole number for us – ‘.randint()’. We simply pass the lowest and highest possible values (both inclusive) that we will allow. Let’s simulate a dice roll:

In [4]:
random.randint(1,6)
Out[4]:
2
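
If you want the same "random" numbers every time you run your code, for example to share results with someone else, you can seed the generator first. A minimal sketch, with 2018 as an arbitrary seed picked for illustration:

```python
import random

random.seed(2018)  # arbitrary seed chosen for illustration
first_rolls = [random.randint(1, 6) for _ in range(5)]

random.seed(2018)  # resetting the seed replays the same sequence
second_rolls = [random.randint(1, 6) for _ in range(5)]

print(first_rolls == second_rolls)  # True
```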

Applying Random to Expected Goals

Great job on getting to know Random. The rest of the article applies it to expected goals, and will allow us to calculate how ‘lucky’ a team was based on the quality of their shots.

Firstly, expected goals is a measurement of how many goals a team could have expected to score based on the shots that they took. Different models base this on different things, but most commonly, the location of the shot, the type of build-up and the foot used are compared with similar chances historically. We can then see the percentage chance of the shot becoming a goal.

Knowing the expected goals, we can then use ‘.random()’ to test how likely that score was. Let’s set up our lists of shots with their expected goal values – these are all percentages represented as decimals.

In [5]:
HomexG = [0.21,0.66,0.1,0.14,0.01]
AwayxG = [0.04,0.06,0.01,0.04,0.06,0.12,0.01,0.06]
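
Adding up the values in each list gives the team's total expected goals for the match. A quick sketch using the same two lists:

```python
HomexG = [0.21, 0.66, 0.1, 0.14, 0.01]
AwayxG = [0.04, 0.06, 0.01, 0.04, 0.06, 0.12, 0.01, 0.06]

# Total xG is the sum of each shot's scoring probability
print(round(sum(HomexG), 2))  # 1.12
print(round(sum(AwayxG), 2))  # 0.4
```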

The first shot for the home team had a 21% chance of being scored. Let’s create a random percentage to simulate if it goes in or not. If it is equal to or less than 21%, we can say that it is scored in our simulation:

In [6]:
if random.random()<=0.21:
    print("GOAL!")
else:
    print("Missed!")
Missed!

As happens roughly 4 out of 5 times, this time the shot was missed. Let’s run the shot 10,000 times:

In [7]:
Goals = 0

for i in range(0,10000):
    if random.random()<=0.21:
        Goals += 1

print(Goals)
2071

So according to the xG score and our random test, taking that shot 10,000 times gave us 2071 goals (pretty much in line with the 0.21 score, which would predict around 2100). In a nutshell, this is how we simulate with random numbers.
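
How close should we expect to get? Treating each shot as an independent yes/no event with a 21% chance, we can work out the expected number of goals and the typical run-to-run wobble (the standard deviation of a binomial count). A small sketch, with variable names chosen for illustration:

```python
import math

shots = 10000
chance = 0.21

expected_goals = shots * chance
std_dev = math.sqrt(shots * chance * (1 - chance))

print(round(expected_goals))  # 2100
print(round(std_dev, 1))      # 40.7
```

So a result a few dozen goals either side of 2100 is entirely normal for 10,000 shots.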

Going Further: Simulating a Match with Expected Goals

Rather than simulate with one number, let’s apply this same test to every shot by the home and away teams. Take a look through the function below and try to understand how it applies the test above to every shot in the HomexG and AwayxG lists.

In [8]:
def calculateWinner(home, away):
    #Our match starts at 0-0
    HomeGoals = 0
    AwayGoals = 0
    
    #We have a function within our function
    #This one runs the '.random()' test above for a list
    def testShots(shots):
        
        #Start goal count at 0
        Goals = 0
        
        #For each shot, if it goes in, add a goal
        for shot in shots:
            if random.random() <= shot:
                Goals += 1
                
        #Finally, return the number of goals
        return Goals
    
    #Run the above function for home and away lists
    HomeGoals = testShots(home)
    AwayGoals = testShots(away)
    
    #Return the score
    if HomeGoals > AwayGoals:
        print("Home Wins! {} - {}".format(HomeGoals, AwayGoals))
    elif AwayGoals > HomeGoals:
        print("Away Wins! {} - {}".format(HomeGoals, AwayGoals))
    else:
        print("Share of the points! {} - {}".format(HomeGoals, AwayGoals))
In [9]:
calculateWinner(HomexG, AwayxG)
Home Wins! 1 - 0

We are now simulating a whole game based on expected goals, pretty cool!

However, we are only simulating once. In order to get a proper estimate as to how likely it is that one team wins, we need to do this lots of times.

Let’s change our last function to simply return the result, not a user-friendly print out. We can then use this function over and over to calculate an accurate percentage chance of winning for each team.

In [10]:
def calculateWinner(home, away):
    HomeGoals = 0
    AwayGoals = 0
    
    def testShots(shots):
        Goals = 0
        
        for shot in shots:
            if random.random() <= shot:
                Goals += 1
        return Goals
    
    HomeGoals = testShots(home)
    AwayGoals = testShots(away)
    
    #This is all that changes from above
    #We now pass a simple string, rather than ask for a print out.
    if HomeGoals > AwayGoals:
        return("home")
    elif AwayGoals > HomeGoals:
        return("away")
    else:
        return("draw")

Now, let’s run this function 10000 times, and work out the percentage of each result:

In [11]:
#Run xG calculator 10000 times to test winner %
def calculateChance(team1, team2):
    home = 0
    away = 0
    draw = 0
    
    for i in range(0,10000):
        matchWinner = calculateWinner(team1,team2)
        if matchWinner == "home":
            home +=1
        elif matchWinner == "away":
            away +=1
        else:
            draw +=1
    
    home = home/100
    away = away/100
    draw = draw/100
    
    print("Over 10000 games, home wins {}%, away wins {}% and there is a draw in {}% of games.".format(home, away, draw))
In [12]:
calculateChance(HomexG, AwayxG)
Over 10000 games, home wins 60.7%, away wins 10.16% and there is a draw in 29.14% of games.

There we go! We now have a better understanding as to what result we could normally expect from these chances!
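
If you would rather get the numbers back than print them, for example to compare lots of matches, one option is to return the shares in a dictionary. This is only a sketch: `simulate_outcomes` and its `games` argument are hypothetical names, and it uses `collections.Counter` to keep the tallies:

```python
import random
from collections import Counter

def simulate_outcomes(home, away, games=10000):
    """Hypothetical helper: return each result's share of simulated games."""
    def goals(shots):
        # Same test as before: a shot is scored if the random number is <= its xG
        return sum(random.random() <= shot for shot in shots)

    results = Counter()
    for _ in range(games):
        home_goals, away_goals = goals(home), goals(away)
        if home_goals > away_goals:
            results["home"] += 1
        elif away_goals > home_goals:
            results["away"] += 1
        else:
            results["draw"] += 1

    return {outcome: results[outcome] / games for outcome in ("home", "away", "draw")}

shares = simulate_outcomes([0.21, 0.66, 0.1, 0.14, 0.01],
                           [0.04, 0.06, 0.01, 0.04, 0.06, 0.12, 0.01, 0.06])
print(shares)
```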

Let’s try a new run of expected goals – one great chance (50%) against 10 poor chances (5%). Who wins most often here?

In [13]:
HomexG=[0.5]
AwayxG=[0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05]
calculateChance(HomexG, AwayxG)
Over 10000 games, home wins 30.84%, away wins 23.14% and there is a draw in 46.02% of games.

Interestingly, in this run the big-chance team wins just under 8 percentage points more often than the team that shoots loads from low-chance opportunities. Makes you think!
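
Because our simulation treats every shot as independent, we can also cross-check it without any random numbers at all, by building up the probability of each possible goal tally shot by shot. A rough sketch, where `goal_distribution` and `exact_outcomes` are made-up names for illustration:

```python
def goal_distribution(shots):
    """Probability of scoring 0, 1, 2, ... goals from independent shots."""
    dist = [1.0]  # before any shot, 0 goals is certain
    for p in shots:
        new_dist = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new_dist[goals] += prob * (1 - p)  # shot missed
            new_dist[goals + 1] += prob * p    # shot scored
        dist = new_dist
    return dist

def exact_outcomes(home, away):
    """Combine both teams' goal distributions into win/draw/loss chances."""
    home_dist, away_dist = goal_distribution(home), goal_distribution(away)
    result = {"home": 0.0, "away": 0.0, "draw": 0.0}
    for h, ph in enumerate(home_dist):
        for a, pa in enumerate(away_dist):
            key = "home" if h > a else "away" if a > h else "draw"
            result[key] += ph * pa
    return result

exact = exact_outcomes([0.5], [0.05] * 10)
print({k: round(v, 3) for k, v in exact.items()})
# approximately {'home': 0.299, 'away': 0.244, 'draw': 0.457}
```

The simulated percentages will bounce around these values from run to run, which is a useful reminder that 10,000 games still leaves some noise.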

Summary

Creating random numbers is easy. Whether we want a random decimal between 0 and 1 (.random()) or a random whole number (.randint()), the random module is a big help.

In this article, we saw how we can apply random numbers to a simulation. If anything around the function creation or for loops was confusing here, you might want to take a read up on those. Alternatively, why not push forward with more complex data sets?

Posted by FCPythonADMIN in Python Basics