When we compare players, it is important to standardise their data so that each player has the same ‘opportunity’ to show their worth. The simplest way to do this is to put every player on the same amount of playing time. One popular way of doing this in football is to create ‘per 90’ values: we convert total goals, shots, etc. into how many a player produces for every 90 minutes of football played. This article runs through creating per 90 figures in Python, applying them to fantasy football points and data.

Follow along with the examples below and feel free to use them in your own work. Let’s get started by importing our modules and taking a look at our data set.

In [1]:

import numpy as np
import pandas as pd

data = pd.read_csv("../Data/Fantasy_Football.csv")
data.head()

Out[1]:

	web_name	team_code	first_name	second_name	squad_number	now_cost	dreamteam_count	selected_by_percent	total_points	points_per_game	…	yellow_cards	saves	bonus	bps	ict_index	element_type	team
0	Ospina	3	David	Ospina	13	48	0	0.2	0	0.0	…	0	0	0	0	0.0	1	1
1	Cech	3	Petr	Cech	33	54	0	4.9	84	3.7	…	1	53	4	419	42.7	1	1
2	Martinez	3	Damian Emiliano	Martinez	26	40	0	0.6	0	0.0	…	0	0	0	0	0.0	1	1
3	Koscielny	3	Laurent	Koscielny	6	60	2	1.6	76	4.2	…	3	0	14	421	62.5	2	1
4	Mertesacker	3	Per	Mertesacker	4	48	1	0.5	15	3.0	…	0	0	2	77	15.7	2	1

5 rows × 26 columns

Our dataset holds a host of information on our players’ fantasy football performance. We have their names, of course, and also their points and contributing factors (goals, clean sheets, etc.). Crucially, we have the players’ minutes played, which allows us to calculate per 90 figures for the other variables.

Calculating our per 90 numbers is reasonably simple: we find out how many 90-minute periods our player has played, then divide the variable by this value. The function below shows this step by step, using Kane’s goals p90 in the Premier League at the time of writing (goals = 20, minutes = 1868):

In [2]:

def p90_Calculator(variable_value, minutes_played):
    
    ninety_minute_periods = minutes_played/90
    
    p90_value = variable_value/ninety_minute_periods
    
    return p90_value

p90_Calculator(20, 1868)

Out[2]:

0.9635974304068522

There we go, Kane scores 0.96 goals per 90 in the Premier League! Our code, while explanatory, is three lines long when it could all be on one line. Let’s try again and check that we get the same value:

In [3]:

def p90_Calculator(value, minutes):
    return value/(minutes/90)

p90_Calculator(20, 1868)

Out[3]:

0.9635974304068522

Great job! The code gives the same result in a third of the lines, and it is still fairly easy to understand.
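One thing the one-liner does not handle is a player with no minutes at all, where plain Python raises a ZeroDivisionError. The sketch below is one possible guard, not part of the original function: the name p90_safe and the choice to return 0.0 for zero minutes are both assumptions made for illustration.

```python
def p90_safe(value, minutes):
    """Per 90 value for a single player; returns 0.0 when no minutes were played."""
    # p90_safe is a hypothetical helper name; returning 0.0 is an assumed convention
    if minutes <= 0:
        return 0.0
    return value / (minutes / 90)


print(p90_safe(20, 1868))  # Kane's figures from above
print(p90_safe(0, 0))      # no minutes played, so the division is skipped
```

Returning 0.0 keeps unused players at the bottom of any ranking; returning None or NaN instead would be just as defensible if you prefer to mark the value as missing.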

Next up, we need to apply this to our dataset. Pandas makes this easy, as we can simply create a new column and run our function with existing columns as arguments:

In [4]:

data["total_points_p90"] = p90_Calculator(data.total_points, data.minutes)
# Players with no minutes produce 0/0 = NaN, so set those to 0
data["total_points_p90"] = data["total_points_p90"].fillna(0)
data.head()

Out[4]:

	web_name	team_code	first_name	second_name	squad_number	now_cost	dreamteam_count	selected_by_percent	total_points	points_per_game	…	yellow_cards	saves	bonus	bps	ict_index	element_type	team	total_points_p90
0	Ospina	3	David	Ospina	13	48	0	0.2	0	0.0	…	0	0	0	0	0.0	1	1	0.000000
1	Cech	3	Petr	Cech	33	54	0	4.9	84	3.7	…	1	53	4	419	42.7	1	1	3.652174
2	Martinez	3	Damian Emiliano	Martinez	26	40	0	0.6	0	0.0	…	0	0	0	0	0.0	1	1	0.000000
3	Koscielny	3	Laurent	Koscielny	6	60	2	1.6	76	4.2	…	3	0	14	421	62.5	2	1	4.288401
4	Mertesacker	3	Per	Mertesacker	4	48	1	0.5	15	3.0	…	0	0	2	77	15.7	2	1	3.846154

5 rows × 27 columns
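A side note on the missing-value step: in pandas, 0 divided by 0 gives NaN, but a non-zero number divided by 0 gives inf, which fillna does not touch. The sketch below uses made-up rows (not taken from the fantasy football file) to show both cases and one way of cleaning them; whether such rows exist in the real data is not something the article tells us.

```python
import numpy as np
import pandas as pd

# Made-up rows purely for illustration
toy = pd.DataFrame({
    "total_points": [0, 2, 84],
    "minutes": [0, 0, 2070],
})

# 0/0 becomes NaN, 2/0 becomes inf, and the last row is a normal per 90 value
toy["total_points_p90"] = toy["total_points"] / (toy["minutes"] / 90)

# Treat both NaN and +/-inf as "no per 90 value" and set them to 0
toy["total_points_p90"] = (
    toy["total_points_p90"].replace([np.inf, -np.inf], np.nan).fillna(0)
)
print(toy)
```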

Our data now has a total points per 90 column, which will hopefully offer more insight than a simple points total. Let’s sort our values and view the top 5 players:

In [5]:

data.sort_values(by='total_points_p90', ascending=False).head()

Out[5]:

	web_name	team_code	first_name	second_name	squad_number	now_cost	selected_by_percent	total_points	points_per_game	…	bps	ict_index	element_type	team	total_points_p90
271	Tuanzebe	1	Axel	Tuanzebe	38	39	1.7	1	1.0	…	3	0.0	2	12	90.0
322	Sims	20	Joshua	Sims	39	43	0.1	1	1.0	…	3	0.0	3	14	90.0
394	Janssen	6	Vincent	Janssen	9	74	0.1	1	1.0	…	2	0.0	4	17	90.0
166	Hefele	38	Michael	Hefele	44	42	0.1	1	1.0	…	4	0.4	2	8	90.0
585	Silva	13	Adrien Sebastian	Perruchet Silva	14	60	0.0	1	1.0	…	5	0.3	3	9	22.5

5 rows × 27 columns

Huh, probably not what we expected here… players with 1 point, and some surprising names too. On closer inspection, these players suffer from a small sample size: they have played very few minutes, so their numbers are heavily inflated. There is obviously no way a player sustains that many points per 90!
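To see where values like 90.0 and 22.5 come from, we can feed the calculator a single point earned in a handful of minutes. The minutes used below do not appear in the table; they are back-calculated from the per 90 values, so treat them as an assumption.

```python
def p90_Calculator(value, minutes):
    return value / (minutes / 90)


# Minutes are inferred from the per 90 values in the table, not read from the data
print(p90_Calculator(1, 1))  # 1 point in 1 minute scales to roughly 90 points per 90
print(p90_Calculator(1, 4))  # 1 point in 4 minutes scales to roughly 22.5 points per 90
```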

Let’s apply a minimum number of minutes played to eliminate players without a big enough sample:

In [6]:

# Filter on minutes first, then sort, so the boolean mask lines up with the rows
data[data.minutes > 400].sort_values(by='total_points_p90', ascending=False).head(10)[["web_name", "total_points_p90"]]

Out[6]:

	web_name	total_points_p90
233	Salah	9.629408
279	Martial	8.927126
246	Sterling	8.378721
225	Coutinho	8.358882
325	Austin	8.003356
278	Lingard	7.951807
544	Niasse	7.460317
256	Agüero	7.346939
389	Son	7.288503
255	Bernardo Silva	7.119403

That seems a bit more like it! We’ve got some of the highest-scoring players here, like Salah and Sterling, but if Austin, Lingard and Bernardo Silva can nail down long-term starting spots, we should certainly keep an eye on adding them to our teams!

Let’s repeat the process by creating a new column for goals per 90 and finding the top 10:

In [7]:

data["goals_p90"] = p90_Calculator(data.goals_scored, data.minutes)
data["goals_p90"] = data["goals_p90"].fillna(0)
data[data.minutes > 400].sort_values(by='goals_p90', ascending=False).head(10)[["web_name", "goals_p90"]]

Out[7]:

	web_name	goals_p90
233	Salah	0.968320
393	Kane	0.967222
325	Austin	0.906040
256	Agüero	0.823364
246	Sterling	0.797973
544	Niasse	0.793651
279	Martial	0.728745
258	Jesus	0.714995
278	Lingard	0.632530
160	Rooney	0.630252

Great job! Hopefully you can see that this is a much fairer way to rate our player data – whether for performance, fantasy football or media reporting purposes.
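Since the same filter, calculate and sort steps repeat for every variable, they can be wrapped in one function. The sketch below is one way to do that; the name top_p90 and its parameters are hypothetical, the column names web_name and minutes follow the dataset used above, and the rows are made up for illustration.

```python
import pandas as pd


def top_p90(df, column, min_minutes=400, n=10):
    """Top n players by per 90 value of `column`, among players above min_minutes."""
    # top_p90 is a hypothetical helper, not part of the original article
    p90_column = column + "_p90"
    # Filtering first also removes zero-minute players, so no division by zero occurs
    eligible = df[df["minutes"] > min_minutes].copy()
    eligible[p90_column] = eligible[column] / (eligible["minutes"] / 90)
    ranked = eligible.sort_values(by=p90_column, ascending=False)
    return ranked.head(n)[["web_name", p90_column]]


# Made-up rows for illustration only
toy = pd.DataFrame({
    "web_name": ["A", "B", "C"],
    "minutes": [1800, 30, 900],
    "goals_scored": [10, 1, 8],
})
print(top_p90(toy, "goals_scored"))
```

Player B would have the highest raw per 90 figure, but falls below the minutes threshold and is dropped before ranking, which is the same small-sample protection used above.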

Summary

p90 data is a fundamental concept in football analytics. It is one of the first steps in cleaning our data and making it fit for comparisons. This article has shown how we can apply the concept quickly and easily to our data. For next steps, you might want to take a look at visualising this data, or at further analysis techniques.