Fantasy Football

Creating Interactive Charts with Plotly and Python

Python allows you to go beyond static visualisations with interactive graphics that present more information and draw more engagement from your audience. Modules such as plotly and bokeh are among the most accessible ways to create these, and this article introduces plotly scatter plots.

Specifically, this article runs through creating plotly scatter plots when working with Python in Jupyter Notebooks. Check out the docs if you are looking to apply these elsewhere. However, the demo in this article will be more than enough to get you up and running with an interactive scatter plot that will get end-users engaged with your data!

Before we import plotly, let’s open up our dataset and see what we’re playing with.

In [1]:
import numpy as np
import pandas as pd

data = pd.read_csv("FantasyFootball.csv")
data.head()
Out[1]:
web_name team_code first_name second_name squad_number now_cost selected_by_percent total_points points_per_game minutes element_type total_points_p90 goals_p90 assists_p90 ga_p90
0 Ospina 3 David Ospina 13 4.8 0.2 0 0.0 0 1 0.000000 0.00000 0.0 0.00000
1 Cech 3 Petr Cech 33 5.4 4.9 84 3.7 2070 1 3.652174 0.00000 0.0 0.00000
2 Martinez 3 Damian Emiliano Martinez 26 4.0 0.6 0 0.0 0 1 0.000000 0.00000 0.0 0.00000
3 Koscielny 3 Laurent Koscielny 6 6.0 1.6 76 4.2 1595 2 4.288401 0.00000 0.0 0.00000
4 Mertesacker 3 Per Mertesacker 4 4.8 0.5 15 3.0 351 2 3.846154 0.25641 0.0 0.25641

Our dataset is a fantasy football dataset, with each player represented by a row. The row contains some biographical data, and per 90 performance data for goals, assists and points. Learn more about per 90 data in football here.

Now that we have met our data, let’s import the libraries needed to make our interactive visualisation. Check out the imports below, with comments explaining what each does.

In [2]:
#Imports the tools needed to run plotly offline - usually plotly interacts with an online resource.
#These imports allow us to host everything from our computer
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import cufflinks as cf

#Allows us to plot our interactive charts in our Jupyter Notebook
init_notebook_mode(connected=True)
cf.go_offline()

Together, these modules make plotting interactive charts very easy. To start with, we simply run the ‘.iplot()’ method from our dataframe. We then pass it a range of arguments to show exactly what we want:

  • Kind: The type of chart that we want. This time, we’ll use scatter, but there are many to choose from, such as histograms or line charts.
  • X & Y: The axes around which we will plot our data. Below, we’ll use points per 90 and cost.
  • Mode: Our type of scatter plot – we’ll use markers here to plot our data.
  • Text: What text should we show when we hover over a point?
  • Size: How big are our markers?
  • xTitle/yTitle/title: Axis and chart titles.

There are many other things that we could do, such as change fonts and colours, but for a first attempt, let’s see how this looks:

In [6]:
data.iplot(kind='scatter',x='total_points_p90',y='now_cost',
           mode='markers',text='web_name',size=10,
          xTitle='Points per 90',yTitle='Cost',title='Cost vs Points p90')
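Since the text argument takes a column name, a richer hover label can be built as its own column before plotting. The snippet below is a minimal sketch on three hand-typed rows from the table above; 'hover_text' is a hypothetical column name, not part of the original dataset.

```python
import pandas as pd

# Three rows copied by hand from the data.head() output above
sample = pd.DataFrame({
    "web_name": ["Cech", "Koscielny", "Mertesacker"],
    "now_cost": [5.4, 6.0, 4.8],
    "total_points_p90": [3.652174, 4.288401, 3.846154],
})

# 'hover_text' is a hypothetical column name used only for illustration
sample["hover_text"] = (
    sample["web_name"]
    + " | cost: " + sample["now_cost"].astype(str)
    + " | pts p90: " + sample["total_points_p90"].round(2).astype(str)
)

print(sample["hover_text"].tolist())
```

Assuming the column is added to the full dataframe in the same way, it should then be usable as text='hover_text' in the call above.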

Summary

I think that is pretty cool! We are showing the relationship and distribution: cost tends to go up as points per 90 go up. We are also showing some potential value in players who overperform their price. Most impressively, however, users can hover over the chart to get the names of players and more information. This should attract buy-in to our graphics and inform better (and more neatly) than we could with static labels. Great job!
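One rough way to put a number on "overperforming their price" is to divide points per 90 by cost and rank the result. This is only a sketch on three hand-typed rows; 'points_p90_per_cost' is a hypothetical metric introduced here for illustration, not something the chart itself computes.

```python
import pandas as pd

# Three rows copied by hand from the data.head() output earlier in the article
sample = pd.DataFrame({
    "web_name": ["Cech", "Koscielny", "Mertesacker"],
    "now_cost": [5.4, 6.0, 4.8],
    "total_points_p90": [3.652174, 4.288401, 3.846154],
})

# Hypothetical value metric: points per 90 earned per unit of cost
sample["points_p90_per_cost"] = sample["total_points_p90"] / sample["now_cost"]

ranked = sample.sort_values(by="points_p90_per_cost", ascending=False)
print(ranked[["web_name", "points_p90_per_cost"]])
```

On a full dataset, players with very few minutes would need filtering out first, since their per 90 figures are unreliable.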

The offline version of plotly is much simpler than the full version, so take this as an introduction. If you are looking to learn more about interactive visualisation, take a read of the examples and docs at plotly and bokeh.

Posted by FCPythonADMIN in Visualisation

Calculating ‘per 90’ with Python and Fantasy Football

When we are comparing data between players, it is very important that we standardise their data to ensure that each player has the same ‘opportunity’ to show their worth. The simplest way for us to do this is to ensure that all players have the same amount of time within which to play. One popular way of doing this in football is to create ‘per 90’ values. This means that we convert our totals of goals, shots, etc. into how many a player produces for every 90 minutes of football that they play. This article will run through creating per 90 figures in Python by applying them to fantasy football points and data.

Follow along with the examples below and feel free to reuse them in your own work. Let’s get started by importing our modules and taking a look at our data set.

In [1]:
import numpy as np
import pandas as pd

data = pd.read_csv("../Data/Fantasy_Football.csv")
data.head()
Out[1]:
web_name team_code first_name second_name squad_number now_cost dreamteam_count selected_by_percent total_points points_per_game penalties_saved penalties_missed yellow_cards red_cards saves bonus bps ict_index element_type team
0 Ospina 3 David Ospina 13 48 0 0.2 0 0.0 0 0 0 0 0 0 0 0.0 1 1
1 Cech 3 Petr Cech 33 54 0 4.9 84 3.7 0 0 1 0 53 4 419 42.7 1 1
2 Martinez 3 Damian Emiliano Martinez 26 40 0 0.6 0 0.0 0 0 0 0 0 0 0 0.0 1 1
3 Koscielny 3 Laurent Koscielny 6 60 2 1.6 76 4.2 0 0 3 0 0 14 421 62.5 2 1
4 Mertesacker 3 Per Mertesacker 4 48 1 0.5 15 3.0 0 0 0 0 0 2 77 15.7 2 1

5 rows × 26 columns

Our dataset holds a host of information on our players’ fantasy football performance. We have their names, of course, and also their points and contributing factors (goals, clean sheets, etc.). Crucially, we have the players’ minutes played, allowing us to calculate their per 90 figures for the other variables.

Calculating our per 90 numbers is reasonably simple: we just need to find out how many 90 minute periods our player has played, then divide the variable by this value. The function below will show this step-by-step and show Kane’s goals p90 in the Premier League at the time of writing (goals = 20, minutes = 1868):

In [2]:
def p90_Calculator(variable_value, minutes_played):
    
    ninety_minute_periods = minutes_played/90
    
    p90_value = variable_value/ninety_minute_periods
    
    return p90_value

p90_Calculator(20, 1868)
Out[2]:
0.9635974304068522

There we go, Kane scores 0.96 goals per 90 in the Premier League! Our function body, while explanatory, is three lines long when it could all fit in one. Let’s try again, and check that we get the same value:

In [3]:
def p90_Calculator(value, minutes):
    return value/(minutes/90)

p90_Calculator(20, 1868)
Out[3]:
0.9635974304068522

Great job! The code has the same result, in a third of the lines, and I still think it is fairly easy to understand.

Next up, we need to apply this to our dataset. Pandas makes this easy, as we can simply create a new column and run our function with existing columns as arguments:

In [4]:
# Assign the filled result back; fillna(inplace=True) on a selected column may not update the DataFrame in newer pandas
data["total_points_p90"] = p90_Calculator(data.total_points, data.minutes).fillna(0)
data.head()
Out[4]:
web_name team_code first_name second_name squad_number now_cost dreamteam_count selected_by_percent total_points points_per_game penalties_missed yellow_cards red_cards saves bonus bps ict_index element_type team total_points_p90
0 Ospina 3 David Ospina 13 48 0 0.2 0 0.0 0 0 0 0 0 0 0.0 1 1 0.000000
1 Cech 3 Petr Cech 33 54 0 4.9 84 3.7 0 1 0 53 4 419 42.7 1 1 3.652174
2 Martinez 3 Damian Emiliano Martinez 26 40 0 0.6 0 0.0 0 0 0 0 0 0 0.0 1 1 0.000000
3 Koscielny 3 Laurent Koscielny 6 60 2 1.6 76 4.2 0 3 0 0 14 421 62.5 2 1 4.288401
4 Mertesacker 3 Per Mertesacker 4 48 1 0.5 15 3.0 0 0 0 0 2 77 15.7 2 1 3.846154

5 rows × 27 columns
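The fillna(0) step matters because players with zero minutes produce a division by zero. The sketch below uses made-up toy rows (not the real file) to show what pandas returns in that case: 0/0 gives NaN, while a non-zero total over zero minutes gives inf, which fillna alone does not catch. Whether such rows should become 0 is a judgment call; here they are zeroed for simplicity.

```python
import numpy as np
import pandas as pd

def p90_Calculator(value, minutes):
    return value / (minutes / 90)

# Toy data: an unused player, a regular starter, and a hypothetical
# row with a point recorded against zero minutes
toy = pd.DataFrame({
    "total_points": [0, 84, 1],
    "minutes": [0, 2070, 0],
})

raw = p90_Calculator(toy.total_points, toy.minutes)
print(raw.tolist())  # NaN for 0/0, inf for 1/0

# Convert infinities to NaN first, then fill every gap with 0
toy["total_points_p90"] = raw.replace([np.inf, -np.inf], np.nan).fillna(0)
print(toy)
```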

And there we have a total points per 90 column, which will hopefully offer some more insight than a simple points total. Let’s sort our values and view the top 5 players:

In [5]:
data.sort_values(by='total_points_p90', ascending=False).head()
Out[5]:
web_name team_code first_name second_name squad_number now_cost dreamteam_count selected_by_percent total_points points_per_game penalties_missed yellow_cards red_cards saves bonus bps ict_index element_type team total_points_p90
271 Tuanzebe 1 Axel Tuanzebe 38 39 0 1.7 1 1.0 0 0 0 0 0 3 0.0 2 12 90.0
322 Sims 20 Joshua Sims 39 43 0 0.1 1 1.0 0 0 0 0 0 3 0.0 3 14 90.0
394 Janssen 6 Vincent Janssen 9 74 0 0.1 1 1.0 0 0 0 0 0 2 0.0 4 17 90.0
166 Hefele 38 Michael Hefele 44 42 0 0.1 1 1.0 0 0 0 0 0 4 0.4 2 8 90.0
585 Silva 13 Adrien Sebastian Perruchet Silva 14 60 0 0.0 1 1.0 0 0 0 0 0 5 0.3 3 9 22.5

5 rows × 27 columns

Huh, probably not what we expected here… players with 1 point, and some surprising names too. Upon further examination, these players suffer from their sample size. They’ve played very few minutes, so their numbers get overly inflated… there’s obviously no way a player sustains that many points per 90!
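The arithmetic behind those inflated figures is easy to reproduce. The minutes below are inferred from the output rather than read from the file: a single point worth 90.0 per 90 implies one minute played, and a single point worth 22.5 implies four minutes.

```python
def p90_Calculator(value, minutes):
    return value / (minutes / 90)

# Inferred minutes: 1 point in 1 minute, and 1 point in 4 minutes
one_minute = p90_Calculator(1, 1)
four_minutes = p90_Calculator(1, 4)

# For comparison, Cech's 84 points over 2070 minutes from the earlier output
regular_starter = p90_Calculator(84, 2070)

print(round(one_minute, 2), round(four_minutes, 2), round(regular_starter, 2))
```

A single point over a handful of minutes scales up to a figure that no regular starter could maintain, which is why a minimum-minutes threshold is needed.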

Let’s set a minimum time played to our data to eliminate players without a big enough sample:

In [6]:
# Filter on minutes first, then sort, so the boolean mask lines up with the rows it is applied to
data[data.minutes > 400].sort_values(by='total_points_p90', ascending=False).head(10)[["web_name","total_points_p90"]]
Out[6]:
web_name total_points_p90
233 Salah 9.629408
279 Martial 8.927126
246 Sterling 8.378721
225 Coutinho 8.358882
325 Austin 8.003356
278 Lingard 7.951807
544 Niasse 7.460317
256 Agüero 7.346939
389 Son 7.288503
255 Bernardo Silva 7.119403

That seems a bit more like it! We’ve got some of the highest scoring players here, like Salah and Sterling, but if Austin, Lingard and Bernardo Silva can nail down long-term starting spots, we should certainly keep an eye on adding them in!
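The minimum-minutes filter and sort can be wrapped in a small helper so the threshold is easy to change. This is a sketch only: top_p90 and its parameters are hypothetical names, and the minutes in the toy rows are made up for illustration (only the names and per 90 values come from the outputs above).

```python
import pandas as pd

def top_p90(data, column, min_minutes=400, n=10):
    """Return the n highest values of `column` among players with more than min_minutes."""
    qualified = data[data["minutes"] > min_minutes]
    ranked = qualified.sort_values(by=column, ascending=False)
    return ranked.head(n)[["web_name", column]]

# Toy rows: minutes are invented, per 90 values are taken from the outputs above
toy = pd.DataFrame({
    "web_name": ["Tuanzebe", "Martial", "Salah"],
    "minutes": [1, 1200, 2000],
    "total_points_p90": [90.0, 8.927126, 9.629408],
})

result = top_p90(toy, "total_points_p90")
print(result)
```

Raising or lowering min_minutes is a trade-off: a higher threshold gives steadier figures but hides players who have only recently broken into a team.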

Let’s go back over this by creating a new column for goals per 90 and finding the top 10:

In [7]:
# Assign the filled result back to the new column, then filter on minutes before sorting
data["goals_p90"] = p90_Calculator(data.goals_scored, data.minutes).fillna(0)
data[data.minutes > 400].sort_values(by='goals_p90', ascending=False).head(10)[["web_name","goals_p90"]]
Out[7]:
web_name goals_p90
233 Salah 0.968320
393 Kane 0.967222
325 Austin 0.906040
256 Agüero 0.823364
246 Sterling 0.797973
544 Niasse 0.793651
279 Martial 0.728745
258 Jesus 0.714995
278 Lingard 0.632530
160 Rooney 0.630252

Great job! Hopefully you can see that this is a much fairer way to rate our player data – whether for performance, fantasy football or media reporting purposes.
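If several per 90 columns are needed, the same steps can be run in a loop. The sketch below uses a toy frame with made-up numbers; 'assists' is an assumed column name, and the generated names simply append '_p90' to the source column rather than following the article's 'goals_p90' naming.

```python
import pandas as pd

def p90_Calculator(value, minutes):
    return value / (minutes / 90)

# Toy frame with invented players and numbers
toy = pd.DataFrame({
    "web_name": ["A", "B", "C"],
    "minutes": [900, 450, 0],
    "goals_scored": [5, 1, 0],
    "assists": [2, 3, 0],
})

# Create one per 90 column for each counting stat
for column in ["goals_scored", "assists"]:
    toy[column + "_p90"] = p90_Calculator(toy[column], toy["minutes"]).fillna(0)

print(toy)
```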

Summary

p90 data is a fundamental concept of football analytics. It is one of the first steps of cleaning our data and making it fit for comparisons. This article has shown how we can apply the concept quickly and easily to our data. For next steps, you might want to take a look at visualising this data, or looking at further analysis techniques.

Posted by FCPythonADMIN in Blog

Exploratory Python Data Visualisation with Pairplot

Python’s data visualisation libraries are great for exploratory and descriptive data analysis. When you have a new dataset, you may want to look at relationships en masse and then drill down into something that you find particularly interesting. Python’s Seaborn module’s ‘.pairplot’ is one way to carry out your initial look at your data. This example takes a look at a few columns from a fantasy football dataset (edited from here).

Plug in our modules, fire up the dataset and see what we’re dealing with.

In [1]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import pandas as pd
import numpy as np
In [2]:
data = pd.read_csv("../../Data/Fantasy_Football.csv")
data.head()
Out[2]:
web_name team name first_name second_name squad_number now_cost selected_by_percent total_points points_per_game minutes bonus
0 Ospina Arsenal David Ospina 13 48 0.2 0 0.0 0 0
1 Cech Arsenal Petr Cech 33 54 5.2 63 3.9 1440 4
2 Martinez Arsenal Damian Emiliano Martinez 26 40 0.6 0 0.0 0 0
3 Koscielny Arsenal Laurent Koscielny 6 60 1.4 48 3.7 1164 6
4 Mertesacker Arsenal Per Mertesacker 4 48 0.5 14 3.5 333 2

So we have a row for each player, containing their names, team and some numerical data including squad number, cost, selection and points.

These numbers, while readable individually, are hard to make much use of beyond a line-by-line understanding. Seaborn’s ‘.pairplot()’ allows us to take in a huge amount of data and see any relationships and the spread of each variable. It will take each numerical column, put them on both the x and y axes and plot a scatter plot where they meet. Where the same variables meet, we get a histogram that shows the distribution of that variable. Let’s check out the default plot:

In [3]:
sns.pairplot(data)
plt.show()

So this is a lot of data to look at. While it is very useful, it can be quite overwhelming. Let’s use the ‘vars’ argument within pairplot to focus on a few variables.
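To get a feel for why the default is overwhelming, it helps to count the panels: every numerical column is paired with every other, so the grid grows with the square of the column count. The sketch below rebuilds the five rows shown in Out[2] by hand, leaving out first_name and second_name for brevity; 'team_name' is my spelling of the team column, an assumption rather than the file's actual header.

```python
import pandas as pd

# Hand-typed copy of the rows shown in Out[2]
sample = pd.DataFrame({
    "web_name": ["Ospina", "Cech", "Martinez", "Koscielny", "Mertesacker"],
    "team_name": ["Arsenal"] * 5,
    "squad_number": [13, 33, 26, 6, 4],
    "now_cost": [48, 54, 40, 60, 48],
    "selected_by_percent": [0.2, 5.2, 0.6, 1.4, 0.5],
    "total_points": [0, 63, 0, 48, 14],
    "points_per_game": [0.0, 3.9, 0.0, 3.7, 3.5],
    "minutes": [0, 1440, 0, 1164, 333],
    "bonus": [0, 4, 0, 6, 2],
})

# Only the numerical columns end up on the pairplot axes
numeric_columns = sample.select_dtypes(include="number").columns.tolist()
default_panels = len(numeric_columns) ** 2

# Restricting to three variables shrinks the grid considerably
chosen = ["now_cost", "selected_by_percent", "total_points"]
chosen_panels = len(chosen) ** 2

print(len(numeric_columns), "numeric columns give", default_panels, "panels")
print(len(chosen), "chosen columns give", chosen_panels, "panels")
```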

We’ll also change our scatterplot to a regression type with ‘kind’, so that we can see the regression line that Seaborn would fit if we were to use a regplot. Now we’ll be able to better see any relationships:

In [4]:
sns.pairplot(data, vars=["now_cost","selected_by_percent","total_points"],  kind="reg")
plt.show()

That looks much more manageable! See how easy it is to create a complicated plot that tells us a lot about our data very quickly? We can now see that most players are picked by nobody or by very few people, and that the clearest relationship is between popularity and points, as we’d probably expect. Perhaps less predictably, the relationship between points and cost is comparatively weak.
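If you want a number to go alongside the visual impression, a correlation matrix over the same three columns is one option. This is a swapped-in summary (Pearson correlation via pandas), not something pairplot calculates itself, and the five hand-typed rows below are far too few to draw conclusions from; the point is only to show the shape of the call.

```python
import pandas as pd

# Five rows copied by hand from Out[2]; run this on the full dataframe for meaningful values
sample = pd.DataFrame({
    "now_cost": [48, 54, 40, 60, 48],
    "selected_by_percent": [0.2, 5.2, 0.6, 1.4, 0.5],
    "total_points": [0, 63, 0, 48, 14],
})

# Pairwise Pearson correlations between the three columns
correlations = sample.corr()
print(correlations.round(2))
```

Values close to 1 or -1 indicate a strong linear relationship, while values near 0 indicate a weak one.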

Summary

This sets us up for a more comprehensive look at fantasy football, but hopefully this article goes to show how easy it can be to knock together an exploratory data visualisation with Seaborn’s pairplot. There are many more arguments that we could pass to improve this, from the colour (hue='position', for example) to other types of plots within our pairplot. Take a look at the docs to find out all of your options.

After your exploratory analysis, you might want to check out our describing datasets article to go further!

Posted by FCPythonADMIN in Visualisation