
Convex Hulls for Football in Python

Building on what you can do with event data from the Opta (or any other) event feed, we’re going to look at one way of visualising a team’s defensive actions. Popularised in the football analytics community by Thom Lawrence (please let us know if we should add anyone else!), convex hulls display the smallest area needed to cover a set of points:

In this tutorial, we’re going to go through selecting and preparing our data to create these, before plotting the hull. We’ll then apply this to a for loop to chart each player together to see where a team is being forced to defend.

For this article, we’ll be making use of the ConvexHull tool within SciPy’s spatial module. The wider SciPy library is a phenomenal resource for more complex maths needs in Python, so give it a look if you’re interested.

Outside of ConvexHull, we’ll need pandas and numpy for importing and manipulating data, while Matplotlib will plot our data. Let’s import them and get started:

In [1]:
from scipy.spatial import ConvexHull

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
from matplotlib.patches import Arc


%matplotlib inline

With the modules ready, we’re going to import our data. For this example, our data contains all defensive actions in one match, split by player and team.

Let’s take a look at how it is structured with .head():

In [2]:
defdata = pd.read_csv("def_table.csv")
defdata.head()
Out[2]:
player team minute x y outcome
0 50471 Team A 1 38.9 31.8 1
1 19197 Team A 6 52.6 68.4 1
2 42593 Team B 6 39.8 83.9 1
3 19188 Team A 7 3.5 37.9 1
4 82403 Team A 8 17.9 98.5 1

So each row is a defensive action, and we can see the x/y coordinates and who did it.

We just want one player’s actions, so we’ll create a new dataframe for the first player ID – 50471:

In [3]:
player50471 = defdata.loc[(defdata['player'] == 50471)]

player50471.head()
Out[3]:
player team minute x y outcome
0 50471 Team A 1 38.9 31.8 1
12 50471 Team A 22 30.0 33.2 1
13 50471 Team A 25 64.7 94.9 1
51 50471 Team A 65 31.2 32.2 1
56 50471 Team A 72 46.5 22.6 1

To create a convex hull, we need to build it from a list of coordinates. We have our coordinates in the dataframe already, but need them to look something close to the below:

(38.9, 31.8), (30.0, 33.2), (64.7, 94.9) and so on…

Thanks to the pandas module, this is made easy by adding .values to the end of the data that we want to see in arrays, rather than columns:

In [4]:
defpoints = player50471[['x', 'y']].values

defpoints
Out[4]:
array([[38.9, 31.8],
       [30. , 33.2],
       [64.7, 94.9],
       [31.2, 32.2],
       [46.5, 22.6],
       [30.3, 49.8],
       [22.9, 92.5]])

Our data is now ready to be used to create our convex hull. By itself, it is actually pretty boring – it simply creates an object that does nothing at all by itself. Let’s see how this is done below:

In [5]:
#Create a convex hull object and assign it to the variable hull
hull = ConvexHull(player50471[['x','y']])

#Display hull
hull
Out[5]:
<scipy.spatial.qhull.ConvexHull at 0x1faa0c96dd8>

See, that is pretty boring. But we can make it so much cooler when we plot the hull onto a chart.

Let’s start by plotting all 7 event locations as dots on a scatter chart:

In [6]:
#Plot the X & Y location with dots
plt.plot(player50471.x,player50471.y, 'o')
Out[6]:
[<matplotlib.lines.Line2D at 0x1faa2d10908>]

Basic Scatter Plot

Next up, we’re going to add lines around the most extreme parts of the plot. These most extreme parts are stored in a part of the hull object called simplices. We can just use a for loop to iterate through the simplices and draw lines between them:

In [7]:
#Plot the X & Y location with dots
plt.plot(player50471.x,player50471.y, 'o')

#Loop through each of the hull's simplices
for simplex in hull.simplices:
    #Draw a black line between each
    plt.plot(defpoints[simplex, 0], defpoints[simplex, 1], 'k-')

Convex Hull around Plots

Looks kind of abstract, but a lot more interesting than the hull object on its own!
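If you want a closer look at what the hull object holds, here is a minimal sketch that rebuilds the hull from the seven coordinates printed in Out[4]. The variable names are our own. One thing that is easy to mix up: for 2D data, SciPy's `volume` attribute is the enclosed area, while `area` is the length of the outline.

```python
import numpy as np
from scipy.spatial import ConvexHull

# The seven x/y pairs shown in Out[4]
defpoints = np.array([
    [38.9, 31.8], [30.0, 33.2], [64.7, 94.9], [31.2, 32.2],
    [46.5, 22.6], [30.3, 49.8], [22.9, 92.5],
])

hull = ConvexHull(defpoints)

# hull.vertices holds the row indices of the points on the outline
# (in counter-clockwise order for 2D data)
corner_points = defpoints[hull.vertices]

# Each simplex is a pair of row indices, i.e. one edge of the outline
edge_count = len(hull.simplices)

# In 2D, .volume is the enclosed area and .area is the perimeter
hull_area = hull.volume
hull_perimeter = hull.area

print(corner_points)
print(edge_count, round(hull_area, 1), round(hull_perimeter, 1))
```

The enclosed area could be a handy single number if you want to compare how widely different players are defending.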

Let’s just add in some shading to make our area even clearer. We’ll also make the fill see-through by setting the alpha argument to 0.3, which means 30% opaque:

In [8]:
#Plot the X & Y location with dots
plt.plot(player50471.x,player50471.y, 'o')

#Loop through each of the hull's simplices
for simplex in hull.simplices:
    #Draw a black line between each
    plt.plot(defpoints[simplex, 0], defpoints[simplex, 1], 'k-')
    
#Fill the area within the lines that we have drawn
plt.fill(defpoints[hull.vertices,0], defpoints[hull.vertices,1], 'k', alpha=0.3)
Out[8]:
[<matplotlib.patches.Polygon at 0x1faa2f1bb70>]

Shaded Convex Hull

Perfect, we have one player’s zone of defensive actions plotted. We don’t have a pitch or any other players on there yet, but this is great work!

Let’s work on a bigger project now – let’s do all of this over and over for a whole team. We’ll take a single team out of our dataset, then use for loops to create the plot for each player (exactly as above) before plotting them together.

First up, let’s extract Team B into one dataframe:

In [9]:
TeamB = defdata.loc[(defdata.team == "Team B")]
TeamB.head()
Out[9]:
player team minute x y outcome
2 42593 Team B 6 39.8 83.9 1
5 42593 Team B 8 44.7 91.5 1
6 17476 Team B 12 23.1 1.3 1
8 57112 Team B 17 4.4 57.7 1
9 42593 Team B 17 5.8 58.9 1

Perfect, just as before, but with different players on a single team.

We’ll now need to go through each player and do exactly what we did to plot just a single player. First up, we need to find out who we are dealing with. We can use .unique() to pool each individual into the variable ‘players’:

In [10]:
players = TeamB["player"].unique()
players
Out[10]:
array([42593, 17476, 57112, 27789, 14664, 61366, 37748, 57001, 28554,
       17740], dtype=int64)

Every player now just needs to go into a for loop, where we’ll do exactly what we did before to get a plot. We’ll create a temporary dataframe for each player, create a hull from the x/y coordinates, then plot the lines and fill in the shape with a transparent colour. Let’s take a look with the help of some comments:

In [11]:
#For each player in our players variable
for player in players:
    
    #Create a new dataframe for the player
    df = TeamB[(TeamB.player == player)]
    
    #Create an array of the x/y coordinate groups
    points = df[['x', 'y']].values

    #If there are enough points for a hull, create it
    #If not, skip this player so that the previous player's hull is not reused
    try:
        hull = ConvexHull(points)
    except Exception:
        continue
    
    #Draw a black line along each edge of the hull
    for simplex in hull.simplices:
        plt.plot(points[simplex, 0], points[simplex, 1], 'k-')

    #Fill the hull once with red at 5% opacity
    plt.fill(points[hull.vertices,0], points[hull.vertices,1], 'red', alpha=0.05)
    
#Once all of the individual hulls have been created, plot them together
plt.show()

Multiple Shaded Convex Hulls

Fantastic work! We now have all of the players with enough data points on the chart. The transparency is a nice touch, as we can see any hidden players and where any crossover happens.

Our plot leaves out any players with fewer than 3 defensive actions in the data, as a hull needs at least three points that are not all on one straight line. You may want to plot these players as lines or dots instead. If so, you should be able to figure out how to do this from the code already, or from our other visualisation tutorials.
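As a rough sketch of how you might decide whether a player has enough data for a hull before plotting, the helper below checks the number of points and whether they all sit on one line. The function name `safe_hull` is our own invention, not part of SciPy, and the rank check is just one simple way to spot points in a straight line.

```python
import numpy as np
from scipy.spatial import ConvexHull


def safe_hull(points):
    """Return a ConvexHull for an (n, 2) array, or None if no hull can be built."""
    points = np.asarray(points, dtype=float)

    # A 2D hull needs at least three points
    if len(points) < 3:
        return None

    # Rank below 2 means every point lies on one straight line (or is identical)
    if np.linalg.matrix_rank(points - points[0]) < 2:
        return None

    return ConvexHull(points)


two_actions = safe_hull([[0, 0], [1, 1]])
straight_line = safe_hull([[0, 0], [1, 1], [2, 2]])
triangle = safe_hull([[0, 0], [1, 0], [0, 1]])
```

When the helper returns None, you could fall back to `plt.plot` with `'o'` for dots or `'k-'` for a line, as we did earlier in the tutorial.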

As for next steps, you might want to plot this on a pitch (pitch drawing tutorial here):

Shaded Hulls on Pitch

So now we can see where our team are performing their defensive actions – although remember a few players are missing. In terms of analysis, does this suggest that this team defends better on the left? Or is it more likely that they faced a team that largely attacked on that side? Visualisation is just one small piece of any analysis!

Summary

In this tutorial, we have practiced filtering a dataframe by player or team, then using SciPy’s convex hull tool to create the data for plotting the smallest area that contains our datapoints.

Some nice extensions to this that you may want to play with include adding some annotations for player names, or changing colours for each player. Of course, these charts aren’t limited to defensive metrics – why not take a look at penalty area entry pass zones, or compare goalkeeper distributions? However you build on this work, show us what you’re achieving on Twitter @FC_Python!

Find further visualisation tutorials here!

Posted by FCPythonADMIN in Visualisation

Visualising Running Totals with Line Charts

Cumulative line charts feature in loads of great and popular visualisations across the football analytics community. Most commonly, they are seen in xG or shot counts throughout a game. In the example from Ben Mayhew below, we can see how a great visualisation gives us so much more detail than a total xG figure would. It gives us periods of dominance from a team and the spread for both teams over a game:

Further examples come from StatsBomb, again looking at cumulative xG, and from the FT’s John Burn-Murdoch, comparing prolific batsmen.

This tutorial will take us through creating a cumulative line chart for points throughout a season. We’re going to take the following steps to get to our visualisation:

1) Import our data
2) Transform it into a usable format
3) Put our data into a basic visualisation
4) Style our visualisation

Let’s get our libraries into place and get started.

In [1]:
# libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%matplotlib inline

1) Import our data

Data for this tutorial comes from http://football-data.co.uk/ – a great resource for match-by-match results data from a number of global leagues. Feel free to use any of the leagues provided to follow along with.

Download a csv of a season’s worth of matches (or part of an ongoing season) and put it in the same folder as your script.

After that, import your data with pandas and assign it to a dataframe:

In [2]:
#Import our data and assign it to 'data'
data = pd.read_csv("1718EPL.csv")

#Show the top of the dataframe
data.head()
Out[2]:
Div Date HomeTeam AwayTeam FTHG FTAG FTR HTHG HTAG HTR BbAv<2.5 BbAH BbAHh BbMxAHH BbAvAHH BbMxAHA BbAvAHA PSCH PSCD PSCA
0 E0 11/08/17 Arsenal Leicester 4 3 H 2 2 D 2.32 21 -1.00 1.91 1.85 2.10 2.02 1.49 4.73 7.25
1 E0 12/08/17 Brighton Man City 0 2 A 0 0 D 2.27 20 1.50 1.95 1.91 2.01 1.96 11.75 6.15 1.29
2 E0 12/08/17 Chelsea Burnley 2 3 A 0 3 A 2.23 20 -1.75 2.03 1.97 1.95 1.90 1.33 5.40 12.25
3 E0 12/08/17 Crystal Palace Huddersfield 0 3 A 0 2 A 1.72 18 -0.75 2.10 2.05 1.86 1.83 1.79 3.56 5.51
4 E0 12/08/17 Everton Stoke 1 0 H 1 0 H 1.76 19 -0.75 1.94 1.90 2.01 1.98 1.82 3.49 5.42

5 rows × 65 columns

2) Transform our data into a usable format

Our data is a match-by-match look at a season and this won’t help us much for a line chart. We need our data to be the data that we want to plot – a list of the cumulative totals for each team over the season.

Ideally, we will do this for every team altogether, rather than one team at a time. So let’s firstly create a list of the unique teams in our dataframe.

We’ll then use a dictionary comprehension to iterate over our teams and give each one a list that starts with a 0, as each team obviously starts with 0 points.

In [3]:
#Create a list of unique teams from the home team column
Teams = data.HomeTeam.unique()

#Create a dictionary called TeamLists. There will be an entry for each team with the list [0]
TeamLists = {Team : [0] for Team in Teams}

With a starter list ready for each team, we just need to run through each match, find out who won and add a new entry into the correct team’s list with their points.

Let’s do this by working through each line of our dataframe, learning who the home team and away team are, then running an if statement to learn the result. Once we know the result, we can add each team’s points with the append method:

In [4]:
#For each row in our dataframe, I want to do the following:
for row in data.itertuples():
    #Add the home and away team names to the correct variable
    Home = row.HomeTeam
    Away = row.AwayTeam
    
    #If the home team goals (FTHG column in the dataframe) are higher than the away team, give the correct points to each team
    if row.FTHG > row.FTAG:
        TeamLists[Home].append(3)
        TeamLists[Away].append(0)
    #If the home team goals are less than the away team, give the correct points
    elif row.FTHG < row.FTAG:
        TeamLists[Home].append(0)
        TeamLists[Away].append(3)
    #In any other case (a draw), give the correct points
    else:
        TeamLists[Home].append(1)
        TeamLists[Away].append(1)

We have stored the lists inside the TeamLists dictionary, so let’s check out the Arsenal entry in there.

In [5]:
TeamLists["Arsenal"]
Out[5]:
[0, 3, 0,...3, 0, 3, 0, 3]

Ah, we have just appended the points, but done nothing to run these as cumulative totals throughout the season.

To achieve this, we somehow need to access the previous game and just add our result to this. Our lists tutorial goes through accessing certain values, but we can navigate backwards through a list with a negative value in square brackets, e.g. myList[-1].
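Here is a small sketch of that idea on its own, using a made-up list of results. It also shows `itertools.accumulate` from the standard library, which builds the same running total in one call if you already have the per-match points.

```python
from itertools import accumulate

# Points earned in each match (3 = win, 1 = draw, 0 = loss); made-up values
match_points = [3, 0, 1, 3]

# Start at 0, then keep adding each result to the last value in the list
running = [0]
for pts in match_points:
    running.append(running[-1] + pts)

print(running)  # [0, 3, 3, 4, 7]

# The standard library can build the same list from the per-match points
same_running = [0] + list(accumulate(match_points))
```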

So let’s reset our Teams and TeamLists variables so that they do not contain our previous data. With that all cleaned up, we can repeat our for loop above, but instead of appending the points, we will append the previous value plus the points from the match.

In [6]:
Teams = data.HomeTeam.unique()
TeamLists = {Team : [0] for Team in Teams}

for row in data.itertuples():
    Home = row.HomeTeam
    Away = row.AwayTeam
    
    if row.FTHG > row.FTAG:
        TeamLists[Home].append(TeamLists[Home][-1]+3)
        TeamLists[Away].append(TeamLists[Away][-1]+0)
    elif row.FTHG < row.FTAG:
        TeamLists[Home].append(TeamLists[Home][-1]+0)
        TeamLists[Away].append(TeamLists[Away][-1]+3)
    else:
        TeamLists[Home].append(TeamLists[Home][-1]+1)
        TeamLists[Away].append(TeamLists[Away][-1]+1)

Let’s check out Arsenal again – hopefully this makes more sense as a running total.

In [7]:
TeamLists["Arsenal"]
Out[7]:
[0, 3, 3,...57, 57, 60, 60, 63]

Perfect! Let’s get onto putting this into an easy visualisation:

3) Put our data into a basic viz

Matplotlib makes it ridiculously simple to create a line chart. The .plot function ideally takes at least 2 arguments, the x and y location of each point on the line. Our running points totals provide the y coordinate of each point, so we just need to create a list containing the numbers 0-38 for our matchdays to use as the x coordinates (0 is the starting point).

We can do this by using the range function within the list function. For this, range needs two numbers, the starting number and the end number + 1:

In [8]:
Matchday = list(range(0,39))
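The hard-coded 39 assumes a full 38-game season. If you are following along with part of an ongoing season, `.plot` will complain when the two lists are different lengths, so one option (a sketch with made-up totals) is to build the matchday list from the length of the team's own list:

```python
# Hypothetical running totals for a team that has played 5 matches so far
team_totals = [0, 3, 3, 4, 7, 10]

# One matchday per entry, starting at 0 for the pre-season starting point
matchday = list(range(len(team_totals)))

print(matchday)  # [0, 1, 2, 3, 4, 5]
```

Mid-season, teams may not all have played the same number of games, so it is safest to do this per team.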

Now let’s take advantage of matplotlib’s beautifully easy plotting, by using .plot along with our matchday and team lists:

In [9]:
#Create a line plot with matchday and teamlist figures for two teams
plt.plot(Matchday, TeamLists["Southampton"])
plt.plot(Matchday, TeamLists["Swansea"])
Out[9]:
[<matplotlib.lines.Line2D at 0x1ad48ed3828>]

Python Line Chart

Plenty that needs to be done to improve this, but a really solid start!

4) Styling the visualisation

Just like any default visualisation, the style will have been seen a million times. Whether you’re following along here, or creating visualisations in Excel or elsewhere, it is a great idea to get a clean style that identifies as your own.

For this visualisation, we’re just going to set a few titles and change colours/weights of our lines. You can find a few more visualisation tricks and tips here (https://fcpython.com/blog/making-better-visualisations).

In [10]:
#Create the bare bones of what will be our visualisation
fig, ax = plt.subplots()

#Add our data as before, but setting colours and widths of lines
plt.plot(Matchday, TeamLists["Man City"], color = "#6CABDD", linewidth=2)
plt.plot(Matchday, TeamLists["Swansea"], color = "#231F20", linewidth=2)

#Give the axes and plot a title each
plt.xlabel('Gameweek')
plt.ylabel('Points')
plt.title('Man City v Swansea Running Points')

#Add a faint grey grid
plt.grid()
ax.xaxis.grid(color = "#F8F8F8")
ax.yaxis.grid(color = "#F9F9F9")

#Remove the margins between our lines and the axes
plt.margins(x=0,y=0)

#Remove the spines of the chart on the top and right sides
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

Line chart with styling

And there we go, a clean look at two teams’ running points over the season. There is still loads that we could change to improve it, such as choosing new fonts, adding all of the other teams as greyed-out lines or adding some data labels. Give it a go and take a look through the documentation/Google when you get stuck!
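As a starting point for the greyed-out idea, here is one possible sketch. The `team_lists` dictionary holds made-up totals standing in for TeamLists, and the `highlight` dictionary is our own way of picking which teams get colour.

```python
import matplotlib.pyplot as plt

# Made-up running totals standing in for the TeamLists dictionary
team_lists = {
    "Team A": [0, 3, 6, 7],
    "Team B": [0, 0, 1, 4],
    "Team C": [0, 1, 1, 2],
}

# Teams to pick out, with the colour to use for each
highlight = {"Team A": "#6CABDD"}

fig, ax = plt.subplots()

for team, totals in team_lists.items():
    matchday = range(len(totals))
    if team in highlight:
        # Highlighted teams sit on top, in colour, with a thicker line
        ax.plot(matchday, totals, color=highlight[team], linewidth=2, zorder=3, label=team)
    else:
        # Every other team becomes faint grey context
        ax.plot(matchday, totals, color="lightgrey", linewidth=1, zorder=1)

ax.legend()
```

The `zorder` argument keeps the coloured lines above the grey ones, whatever order the teams appear in.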

In this tutorial, we have seen how to take a match-by-match dataset and transform it into a format that allows for a line chart with the running total. You can apply the same logic to any metric throughout a match, a different variable through a season or even a single player’s running goals total throughout a career.

Find our other visualisation tutorials here, and show us what you come up with @FC_Python!

Posted by FCPythonADMIN in Visualisation

Joyplots in Python with Joypy

Joyplots are a way for us to show lots of density plots in one chart, while also adding a category that we can differentiate by. They are quite fashionable at present and have allowed for some beautiful graphics. Python’s joypy library, building on matplotlib, gives us the opportunity to create our very own joyplots in just a few lines of code. In this article, we’ll give a tutorial into creating the plots and customising them by plotting the top 50 transfer values of each year since 1991. Hopefully we’ll get a small insight into the trends of the biggest moves in the modern game.

Let’s get our modules in place and take a look at our dataset:

In [1]:
from __future__ import unicode_literals
import joypy
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib import cm
%matplotlib inline

df = pd.read_csv("top50.csv")

df.head()
Out[1]:
Player Value Year
0 David Platt 7 1991
1 Robert Prosinecki 7 1991
2 Thomas Häßler 6 1991
3 Jürgen Kohler 6 1991
4 Thomas Doll 6 1991

Our dataset has 1350 observations, each containing one of the most expensive 50 transfers for the years 1991 through 2017.

When we create a joyplot, we should specify what category we want to differentiate. In this case, we will categorise by the ‘Year’ column, and we use the ‘by’ argument in ‘joypy.joyplot()’ to do this.

We should also tell it which column we want the density plots to draw. Here, it will obviously be the ‘Value’ column, passed on through the ‘column’ argument.

Let’s take a look at the default:

In [2]:
fig, axes = joypy.joyplot(df, by="Year", column="Value",figsize=(5,8))
plt.show()

It’s a start! Certainly not the best-looking plot, but we’ve got something.

As you can see, each plot-line is a year, with the graphic showing the values. The plot appears needlessly wide due to Neymar’s transfer forcing us to draw beyond 200m€. Is there anything in football oil money won’t affect?!

What can we learn from the chart? Obviously, players have gotten more expensive as time goes on – you certainly don’t need a chart to tell you that. But we can also see that the variation between values in the top 50 of each year has become much more spread out – we even see some years where the trend hasn’t been growth.

I don’t doubt that there are better ways to plot this, but hopefully we can make something fairly good-looking to justify it.

Let’s throw in some customisation:

In [9]:
fig, axes = joypy.joyplot(df[df.Player != 'Neymar'], by="Year", column="Value",figsize=(5,8),
             linewidth=0.05,overlap=3,colormap=cm.summer_r,x_range=[0,110])

plt.text(40, 0.8, "Top 50 transfer values (€m) \n 1991-2017",fontsize=12)

plt.show()

In this plot, we have used a subset of our dataset – excluding Neymar, to try and get a better handle of the rest of the shapes in the chart.
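One thing to keep in mind is that filtering on the player's name removes every row with that name. If the dataset happened to list the same player in more than one year, a second condition on the year would drop only the outlier. Here is a sketch with toy rows in the same Player/Value/Year layout (the values are illustrative only, not taken from the dataset):

```python
import pandas as pd

# Toy rows in the same layout as our data; values are illustrative only
df = pd.DataFrame({
    "Player": ["Neymar", "Player X", "Neymar", "Player Y"],
    "Value": [88, 40, 222, 105],
    "Year": [2013, 2013, 2017, 2017],
})

# Filtering by name alone removes every row for that player
by_name = df[df.Player != "Neymar"]

# Combining two conditions removes only the 2017 row
outlier = (df.Player == "Neymar") & (df.Year == 2017)
without_outlier = df[~outlier]

print(len(by_name), len(without_outlier))  # 2 3
```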

Let’s run through the other changes we’ve made:

  • We’ve set a very thin line-width, allowing us to see the odd outlying transfer fee – notice Ronaldo in 2009, Zidane in 2001 or Shearer in 1996.
  • The plots overlap and are much more condensed. This makes for a smaller plot, and the overlapping plots are a bit more interesting to look at.
  • A colourmap is applied, changing the colour for each year. As we have overlapped the plots, we need to set a colour difference to tell our years apart.
  • We’ve set custom limits to the x axis, stopping the density curves from spilling over into negative numbers.

There are still some changes that we would like to make. We should really add axis titles, annotate some interesting transfers and so on – but this is a great start and illustrates how we can make joyplots quickly and easily.

It would be remiss of any Joyplot tutorial not to plot the Unknown Pleasures cover that inspires the plot name. Borrowing code from the docs, and using the data above, let’s give it a go:

In [4]:
fig, axes = joypy.joyplot(df[df.Player != 'Neymar'],by="Year", column="Value", ylabels=False, xlabels=False, 
                          grid=False, fill=False, background='k', linecolor="w", linewidth=1, x_range=[-60,110],
                          legend=False, overlap=0.5, figsize=(6,5),kind="counts", bins=80)

Interestingly, we can see outliers a lot more easily with this style! I quite like the aesthetics, but it is maybe not as eye-catching a data visualisation as the previous plot.

Summary

This article has taken you through the steps of creating and editing your first joyplots. These overlapping density plots can make really beautiful charts that show a lot of information in a novel way. These charts use time to differentiate each row on the y axis, but you may also want to find a way to plot time along the x axis to show changes over time, rather than changes in density.

Next up, take a look at plotting heatmaps for correlation, or our other visualisation articles.

Posted by FCPythonADMIN in Visualisation

Making Better Python Visualisations

FC Python recently received a tweet from @fitbawnumbers applying our lollipop chart code to Pep’s win percentage. It was great to see this application of the chart, and especially interesting because Philip then followed up with another chart showing the same data from Excel. To be blunt, the Excel chart was much cleaner/better than our lollipop charts – Philip had done a great job with it.

This has inspired us to put together a post exploring some of matplotlib’s customisation options and principles that underpin them. Hopefully this will give us a better looking and more engaging chart!

As a reminder, this is the chart that we are dealing with improving, and you can find the tutorial for lollipop charts here.

Step One – Remove everything that adds nothing

There is clearly lots that we can improve on here. Let’s start with the basics – if you can remove something without damaging your message, remove it. We have lots of ugly lines here, let’s remove the box needlessly around our data, along with those ticks. Likewise the axes labels, we know that the y axis shows teams – so let’s bin that too. We’ll do this with the following code:

In [ ]:
#For every side of the box, set to invisible

for side in ['right','left','top','bottom']:
    ax.spines[side].set_visible(False)
    
#Remove the tick marks on the x and y axes (the labels stay in place)
#Setting tick1On/tick2On on each tick no longer has any effect in recent matplotlib versions

ax.tick_params(axis='both', which='both', length=0)

Step Two – Where appropriate, change the defaults

Philip’s Excel chart looked great because it didn’t look like an Excel chart. He had changed all of the defaults: the colours, the font, the label location. Consequently, it doesn’t look like the charts that have bored us to death in presentations for decades. So let’s change our title locations and fonts to make it look like we’ve put some effort in beyond the defaults. Code below:

In [ ]:
#Change font
plt.rcParams["font.family"] = "DejaVu Sans"

#Instead of using plt.title, we'll use plt.text to fully customise it
#First two arguments are x/y location
plt.text(55, 19,"Premier League 16/17", size=18, fontweight="normal")

Step Three – Add labels if they are clean and give detail

While the lollipop chart makes it easy to understand the differences between teams, our original chart requires users to look all the way down to the axis if they want the value. Even then, the audience has to make a rough estimation. Why not add values to make everything a bit cleaner?

We can easily iterate through our values in the dataframe and plot them alongside the charts. The code below uses ‘enumerate()’ to count through each of the values in the points column of our table. For each value, it writes text at location v,i (nudged a bit with the sums below). Take a look at the for loop:

In [ ]:
for i, v in enumerate(table['Pts']):
    ax.text(v+2, i+0.8, str(v), color=teamColours[i], size = 13)
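To see exactly what the loop hands us on each pass, here is a tiny sketch with three illustrative points totals standing in for the table['Pts'] column. No plotting is needed; it just collects the label positions.

```python
# Illustrative points totals, in the same order as the rows of the chart
points = [93, 86, 78]

label_positions = []
for i, v in enumerate(points):
    # i counts up from 0 and v is the value, so the label lands 2 points
    # to the right of the end of the bar. The bars are drawn at y = 1, 2, 3...
    # so i + 0.8 sits just under the centre of row i + 1
    label_positions.append((v + 2, i + 0.8))
    print(i, v)

print(label_positions)
```

If your labels look too high or too low, the 0.8 is the number to nudge.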

Step Four – Improve aesthetics with strong colour against off-white background

Our lollipop sticks are very, very thin. We can improve the look of these by giving them a decent thickness and a block of bold colour. Underneath this colour, we should add an off-white colour. This differentiates the plot from the rest of the page, and makes it look a lot more professional. Next time you see a great plot, take note of the base colour and try to understand the effect that this has on the plot and article as a whole!

Our code for doing these two things is below:

In [ ]:
#Set a linewidth in our hlines argument
plt.hlines(y=np.arange(1,21),xmin=0,xmax=table['Pts'],color=teamColours,linewidths=10)

#Set a background colour to the data area background and the plot as a whole
ax.set_facecolor('#f7f4f4')
fig.patch.set_facecolor('#f7f4f4')

Fitting it all together

Putting all of these steps together, we get something like the following. Follow along with the comments and see what fits in where:

In [1]:
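#Libraries used below. 'table' is assumed to be the league table dataframe from the lollipop chart tutorial
import numpy as np
import matplotlib.pyplot as plt

#Set our plot and desired size
fig = plt.figure(figsize=(10,7))
ax = plt.subplot()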

#Change our font
plt.rcParams["font.family"] = "DejaVu Sans"

#Each value is the hex code for the team's colours, in order of our chart
teamColours = ['#034694','#001C58','#5CBFEB','#D00027',
              '#EF0107','#DA020E','#274488','#ED1A3B',
               '#000000','#091453','#60223B','#0053A0',
               '#E03A3E','#1B458F','#000000','#53162f',
               '#FBEE23','#EF6610','#C92520','#BA1F1A']

#Plot our thicker lines and team names
plt.hlines(y=np.arange(1,21),xmin=0,xmax=table['Pts'],color=teamColours,linewidths=10)
plt.yticks(np.arange(1,21), table['Team'])

#Label our axes as needed and title the plot
plt.xlabel("Points")
plt.text(55, 19,"Premier League 16/17", size=18, fontweight="normal")

#Add the background colour
ax.set_facecolor('#f7f4f4')
fig.patch.set_facecolor('#f7f4f4')

for side in ['right','left','top','bottom']:
    ax.spines[side].set_visible(False)

#Remove the tick marks on both axes, keeping the labels
ax.tick_params(axis='both', which='both', length=0)
    
for i, v in enumerate(table['Pts']):
    ax.text(v+2, i+0.8, str(v), color=teamColours[i], size = 13)

plt.show()

Without doubt, this is a much better looking chart than the lollipop. Not only does it look better, but it gives us more information and communicates better than our former effort. Thank you Philip for the inspiration!

Summary

This article has looked at a few ways to tidy our charts. The rules that we introduced throughout should be applied to any visualisation that you’re looking to communicate with. Ensure that your charts are as clean as possible, are clearly labelled and move away from the defaults. Follow these, and you’ll be well on your way to creating great plots!

Why not apply these rules to some of the other basic examples in our visualisation series and let us know how you improve on our articles!

Posted by FCPythonADMIN in Blog, Visualisation

Python Treemaps with Squarify & Matplotlib

Treemaps are visualisations that split the area of our chart to display the value of our datapoints. At their simplest, they display shapes in sizes appropriate to their value, so bigger rectangles represent higher values. Python allows us to create these charts quite easily, as it will calculate the size of each rectangle for us and plot it in a way that fits. In addition to this, we can combine our treemap with the matplotlib library’s ability to scale colours against variables to make good looking and easy to understand plots with Python.

Let’s fire up our libraries (make sure you install squarify!) and take a look at our data:

In [1]:
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import squarify
In [2]:
data = pd.read_csv("Data/ManCityATT.csv")
data.head()
Out[2]:
Player Pos GP GS MP G A SOG S YC RC
0 Agüero, Sergio F 19 17 1518 16 5 32 77 1 0
1 Sterling, Raheem M 22 18 1668 14 4 23 55 2 1
2 Gabriel Jesus F 18 12 1016 8 2 22 35 3 0
3 Sané, Leroy M 22 17 1548 7 10 14 36 4 0
4 De Bruyne, Kevin M 24 24 2060 6 10 28 61 1 0

Our dataframe has a record for each player in Manchester City’s squad, with their games/minutes played alongside goals, assists, shots and card data.

Facing Manchester City next week, we would like to visualise where their threat comes from and who they rely on for goals and assists. We’ll do this with our treemap!

We will create our treemap in a few key steps:

  1. Create a new dataframe that contains only players that have scored.
  2. Utilise matplotlib to create a colour map that assigns each player a colour according to how many goals they have scored.
  3. Set up a new, rectangular plot for our treemap
  4. Plot our data & title
  5. Show the plot, with no axes

The commented code below will show you exactly how you can do this:

In [3]:
# New dataframe, containing only players with more than 0 goals.
dataGoals = data[data["G"]>0]

#Utilise matplotlib to scale our goal numbers between the min and max, then assign this scale to our values.
norm = matplotlib.colors.Normalize(vmin=min(dataGoals.G), vmax=max(dataGoals.G))
colors = [matplotlib.cm.Blues(norm(value)) for value in dataGoals.G]

#Create our plot and resize it.
fig = plt.gcf()
ax = fig.add_subplot()
fig.set_size_inches(16, 4.5)

#Use squarify to plot our data, label it and add colours. We add an alpha layer to ensure black labels show through
squarify.plot(label=dataGoals.Player,sizes=dataGoals.G, color = colors, alpha=.6)
plt.title("Man City Goals",fontsize=23,fontweight="bold")

#Remove our axes and display the plot
plt.axis('off')
plt.show()
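If the colour scaling step feels a bit like magic, this short sketch shows what it produces for the goal tallies in the first five rows of Out[2]. Normalize rescales the numbers to between 0 and 1, and the colormap then turns each of those into a colour.

```python
from matplotlib import cm, colors

# Goal tallies from the first five rows shown in Out[2]
goals = [16, 14, 8, 7, 6]

# Normalize rescales linearly so the minimum maps to 0 and the maximum to 1
norm = colors.Normalize(vmin=min(goals), vmax=max(goals))
scaled = [float(norm(g)) for g in goals]

# The colormap turns each 0-1 value into an (R, G, B, A) tuple
rgba = [cm.Blues(s) for s in scaled]

print(scaled)
print(rgba[0])
```

Because the lowest scorer maps to 0, their rectangle gets the very palest blue. If that is too faint for your liking, setting vmin a little below the smallest value is one way to darken it.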

Let’s take another look, this time creating a treemap for assists. See if you can understand what we are doing without the comments above!

In [4]:
dataAssists = data[data["A"]>0]

norm = matplotlib.colors.Normalize(vmin=min(dataAssists.A), vmax=max(dataAssists.A))
colors = [matplotlib.cm.Blues(norm(value)) for value in dataAssists.A]

fig = plt.gcf()
fig.set_size_inches(16, 4.5)

squarify.plot(label=dataAssists.Player,sizes=dataAssists.A, color = colors, alpha=.6)
plt.title("Man City Assists",fontsize=23,fontweight="bold")

plt.axis('off')
plt.show()

Summary

Awesome, we now have a couple of simple charts that show the dangerous players in City’s lineups, and the code to reproduce these for other teams. This should make for a quick, easy and impactful addition for your pre-match reports!

Next up, why not learn more about Python visualisations, like violin plots or lollipop charts?

Posted by FCPythonADMIN in Visualisation

Creating Personal Football Heatmaps in Python

Tracking technology has been a part of football analysis for the past 20 years, giving access to data on physical performance and heat map visualisations that show how much of the pitch a player covers. As this technology becomes cheaper and more accessible, it has now become easy for anyone to get this data on their Sunday morning games. This article runs through how you can create your own heatmaps for a game, with nothing more than a GPS tracking device (running watch, phone, GPS unit) and Python.

To get your hands on your own data, you can extract your gpx file through Strava. While Strava is great for runs, it isn’t built for football or running in tight spaces. So let’s build our own!

Let’s import our necessary modules and data, then get started!

In [1]:
#GPXPY makes using .gpx files really easy
import gpxpy

#Visualisation libraries
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

#Opens our .gpx file, then parses it into a format that is easy for us to run through
#Using 'with' makes sure that the file is closed again once it has been parsed
with open('5aside.gpx', 'r') as gpx_file:
    gpx = gpxpy.parse(gpx_file)

The .gpx file type, put simply, is a markup file that records the time and your location on each line. With location and time, we can calculate distance between locations and, subsequently, speed. We can also visualise this data, as we’ll show here.

Let’s take a look at what one of these lines looks like:

In [2]:
gpx.tracks[0].segments[0].points[0]
Out[2]:
GPXTrackPoint(51.5505, -0.3048, elevation=44, time=datetime.datetime(2018, 1, 19, 12, 14, 26))

The first two values are our latitude and longitude, followed by elevation and time. This gives us a lot of freedom to calculate variables and plot our data, and is the foundation of a lot of the advanced metrics that you will find on Strava.
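
To give a flavour of those calculations, here is a rough sketch of how distance and speed could be worked out from two track points. It uses the haversine formula with an approximate Earth radius. The first point is the one shown above, while the second point, the helper name haversine_m and the five-second gap are assumptions made purely for illustration:

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (an approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# First reading from the output above, plus a hypothetical reading five seconds later
p1 = (51.5505, -0.3048, datetime(2018, 1, 19, 12, 14, 26))
p2 = (51.5506, -0.3048, datetime(2018, 1, 19, 12, 14, 31))

distance_m = haversine_m(p1[0], p1[1], p2[0], p2[1])
seconds = (p2[2] - p1[2]).total_seconds()
speed_ms = distance_m / seconds

print(round(distance_m, 1), "metres at", round(speed_ms, 2), "m/s")
```

Over the short distances of a five-a-side pitch, GPS noise can easily be as large as the movement itself, so treat speeds calculated this way as rough estimates.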

In our example, we want to plot our latitude and longitude, so let’s use a for loop to add these to a list:

In [3]:
lat = []
lon = []

for track in gpx.tracks:
    for segment in track.segments:
        for point in segment.points:
            lat.append(point.latitude)
            lon.append(point.longitude)

Our location is now extracted into a handy x and y format, so let’s plot it. We’ve borrowed Andy Kee’s Strava plotting aesthetic here; take a read of his article for more information on plotting your cycle/run data!

In [4]:
fig = plt.figure(facecolor = '0.1')
ax = plt.Axes(fig, [0., 0., 1., 1.], )
ax.set_aspect('equal')
ax.set_axis_off()
fig.add_axes(ax)
plt.plot(lon, lat, color = 'deepskyblue', lw = 0.3, alpha = 0.9)
plt.show()

The lines are great, and make for a beautiful plot, but let’s try and create a Prozone-esque heatmap on our pitch.

To do this, we can plot on the actual pitch that we played on, using the gmplot module. GM stands for Google Maps, and will import its functionality for our plot. Let’s take a look at how this works:

In [5]:
#Import the module first
import gmplot

#Start an instance of our map with three arguments: the lat and lon of the map's centre point
#(in this case, the first location in our data) and the default zoom level of the map
gmap = gmplot.GoogleMapPlotter(lat[0], lon[0], 20)

#Create our heatmap using our lat/lon lists for x and y coordinates
gmap.heatmap(lat, lon)

#Draw our map and save it to the html file named in the argument
gmap.draw("Player1.html")

This code will produce an HTML file that we can then open to see our heatmap plotted on a Google Maps background. Something like the below:

 Football heatmap created in Python
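
For some intuition about what a heatmap represents, it can help to think of it as a count of how many GPS readings fall into each small square of a grid. The sketch below shows that idea in plain Python. It is not how gmplot or Google Maps renders its layer, and the function name bin_points, the cell size and the coordinates are all assumptions for illustration:

```python
from collections import Counter
from math import floor

def bin_points(lats, lons, cell_size=0.0001):
    """Count how many points fall into each square cell of a lat/lon grid."""
    counts = Counter()
    for la, lo in zip(lats, lons):
        # Dividing by the cell size and flooring gives the index of the grid cell
        cell = (floor(la / cell_size), floor(lo / cell_size))
        counts[cell] += 1
    return counts

# Three hypothetical readings: two close together, one a little further north
sample_lat = [51.55052, 51.55053, 51.55068]
sample_lon = [-0.30481, -0.30483, -0.30482]

counts = bin_points(sample_lat, sample_lon)
busiest_cell, visits = counts.most_common(1)[0]
print(len(counts), "cells used, busiest cell has", visits, "readings")
```

Cells with higher counts are the areas where the player spent more time, which is what the warmer colours on the map indicate.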

Summary

Similar visualisations of professional football matches set clubs and leagues back a pretty penny, and you can do this with entirely free software and increasingly affordable kit. While this won’t improve FC Python’s exceedingly poor on-pitch performances, we definitely think it is pretty cool!

Simply export your gpx data from Strava and extract the lat/long data, before plotting it as a line or as a heatmap on a map background for some really engaging visualisation.

Next up, learn about plotting this on a pitchmap, rather than satellite imagery.

Posted by FCPythonADMIN in Blog

Drawing a Pass Map in Python

Pass maps are an established visualisation in football analysis, used to show the area of the pitch where a player made their passes. You’ll find examples across the Football Manager series, TV coverage, and pretty much all formats of football journalism. Similar plots are used to show shots or other events in a game, and multiple other sports make use of similar maps of what goes on during a game. This article runs through one way to create these in Python, making use of the Matplotlib library. Let’s fire up our modules, open our dataset and take a look at what we are working with:

In [20]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Arc

%matplotlib inline

data = pd.read_csv("EventData/passes.csv")

data.head()
Out[20]:
Half Time Event Player Team Xstart Ystart Xend Yend
0 First Half 67.0 Pass Zeedayne France 12 3 118 65
1 First Half 70.2 Pass Zeedayne France 82 30 72 26
2 First Half 78.5 Pass Zeedayne France 1 3 69 73
3 First Half 106.5 Pass Zeedayne France 41 46 117 60
4 First Half 115.7 Pass Zeedayne France 34 24 4 20

Plotting Lines

Our dataset contains Zeedayne’s passes from her match. We have when they happened, in addition to the starting and ending X and Y locations. With this information, matplotlib makes it easy to draw lines. We can use the ‘.plot()’ function to draw lines if we give it two lists:

  • List one must contain the start and end X locations
  • List two gives the start and end Y locations

For example, plt.plot([0,1],[2,3]) will plot a line from location (0,2) to (1,3).
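
It is easy to mix up which number goes in which list, so here is a small sketch of a helper that builds the two lists for one pass. The name pass_to_line is hypothetical and not part of matplotlib; the coordinates are the first pass from the table above:

```python
def pass_to_line(x_start, y_start, x_end, y_end):
    """Return the two lists that plt.plot() expects for a single line."""
    xs = [x_start, x_end]  # list one: start and end X locations
    ys = [y_start, y_end]  # list two: start and end Y locations
    return xs, ys

# First pass from the table above: from (12, 3) to (118, 65)
xs, ys = pass_to_line(12, 3, 118, 65)
print(xs, ys)
```

The two lists could then be handed straight to plt.plot(xs, ys) to draw that pass.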

We could write this line to plot each of Zeedayne’s passes, but we hate repeating ourselves and are a little bit lazy, so let’s use a for loop to do this. Take a look at our code below to see it in action:

In [25]:
fig, ax = plt.subplots()
fig.set_size_inches(7, 5)

for i in range(len(data)):
    plt.plot([int(data["Xstart"][i]),int(data["Xend"][i])],
             [int(data["Ystart"][i]),int(data["Yend"][i])], 
             color="blue")
    
plt.show()

Great job on plotting all of the passes! Unfortunately, we do not know where they happened on the pitch, or the direction, or much else, but we will get there!

Let’s start with adding a circle at the starting point of each pass to understand the direction. This is as easy as before, we just plot the start data, like below:

In [29]:
fig, ax = plt.subplots()
fig.set_size_inches(7, 5)

for i in range(len(data)):
    plt.plot([int(data["Xstart"][i]),int(data["Xend"][i])],
             [int(data["Ystart"][i]),int(data["Yend"][i])], 
             color="blue")
    
    plt.plot(int(data["Xstart"][i]),int(data["Ystart"][i]),"o", color="green")
    
plt.show()

Another massive and easy improvement would be to add a pitch map – as our article here explains. Let’s steal the code and add the pitch here – obviously feel free to steal the pitch too!

In [27]:
#Create figure
fig=plt.figure()
fig.set_size_inches(7, 5)
ax=fig.add_subplot(1,1,1)

#Pitch Outline & Centre Line
plt.plot([0,0],[0,90], color="black")
plt.plot([0,130],[90,90], color="black")
plt.plot([130,130],[90,0], color="black")
plt.plot([130,0],[0,0], color="black")
plt.plot([65,65],[0,90], color="black")

#Left Penalty Area
plt.plot([16.5,16.5],[65,25],color="black")
plt.plot([0,16.5],[65,65],color="black")
plt.plot([16.5,0],[25,25],color="black")

#Right Penalty Area
plt.plot([130,113.5],[65,65],color="black")
plt.plot([113.5,113.5],[65,25],color="black")
plt.plot([113.5,130],[25,25],color="black")

#Left 6-yard Box
plt.plot([0,5.5],[54,54],color="black")
plt.plot([5.5,5.5],[54,36],color="black")
plt.plot([5.5,0],[36,36],color="black")

#Right 6-yard Box
plt.plot([130,124.5],[54,54],color="black")
plt.plot([124.5,124.5],[54,36],color="black")
plt.plot([124.5,130],[36,36],color="black")

#Prepare Circles
centreCircle = plt.Circle((65,45),9.15,color="black",fill=False)
centreSpot = plt.Circle((65,45),0.8,color="black")
leftPenSpot = plt.Circle((11,45),0.8,color="black")
rightPenSpot = plt.Circle((119,45),0.8,color="black")

#Draw Circles
ax.add_patch(centreCircle)
ax.add_patch(centreSpot)
ax.add_patch(leftPenSpot)
ax.add_patch(rightPenSpot)

#Prepare Arcs
leftArc = Arc((11,45),height=18.3,width=18.3,angle=0,theta1=310,theta2=50,color="black")
rightArc = Arc((119,45),height=18.3,width=18.3,angle=0,theta1=130,theta2=230,color="black")

#Draw Arcs
ax.add_patch(leftArc)
ax.add_patch(rightArc)

#Tidy Axes
plt.axis('off')

for i in range(len(data)):
    plt.plot([int(data["Xstart"][i]),int(data["Xend"][i])],[int(data["Ystart"][i]),int(data["Yend"][i])], color="blue")
    plt.plot(int(data["Xstart"][i]),int(data["Ystart"][i]),"o", color="green")

#Display Pitch
plt.show()

Awesome, now we can see Zeedayne’s pass locations – seems to cover just about everywhere!

Summary

Plotting simple pass maps is pretty easy: we just need to use matplotlib’s ‘.plot’ functionality to draw our lines, and a for loop to run through the X/Y origin and destination data to plot each line.

On their own, they do not offer much information, but once we add start location and a pitch map, we start to see where a player played their passes, where they ended up and the range that they employed in the match.
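
If we want to put a number on that range, one option is to calculate the straight-line length of each pass from its start and end coordinates. This is a minimal sketch using only the five rows shown in the table at the top of the article, so the figures describe those rows rather than the whole match:

```python
from math import hypot

# The five passes shown in the table above: (Xstart, Ystart, Xend, Yend)
passes = [
    (12, 3, 118, 65),
    (82, 30, 72, 26),
    (1, 3, 69, 73),
    (41, 46, 117, 60),
    (34, 24, 4, 20),
]

# Straight-line length of each pass, in the same units as the coordinates
lengths = [hypot(x_end - x_start, y_end - y_start)
           for x_start, y_start, x_end, y_end in passes]

print([round(length, 1) for length in lengths])
print("Longest:", round(max(lengths), 1), "Shortest:", round(min(lengths), 1))
```

The same calculation could be applied to the full dataframe to compare players or halves.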

To develop on this, we can look to colour code our lines for success, or another variable. We could even look to plot a heatmap to show where a player was active. Watch out for a further article on these!

Posted by FCPythonADMIN in Visualisation

Drawing a Pitchmap – Adding Lines & Circles in Matplotlib

There are lots of reasons why we might want to draw a line or circle on our charts. We could look to add an average line, highlight a key data point or even draw a picture. This article will show how to add lines, circles and arcs with the example of a football pitch map that could then be used to show heatmaps, passes or anything else that happens during a match.

This example works with FIFA’s official pitch sizes, but you might want to change them according to your data/sport/needs. Let’s import matplotlib as normal, in addition to its Arc functionality.

In [1]:
import matplotlib.pyplot as plt
from matplotlib.patches import Arc

Drawing Lines

It is easiest for us to start with our lines around the outside of the pitch. Once we create our plot with the first two lines of our code, drawing a line is pretty easy with ‘.plot’. You have probably already seen ‘.plot’ used to display scatter points, but to draw a line, we just need to provide two lists as arguments and matplotlib will do the thinking for us:

  • List one: starting and ending X locations
  • List two: starting and ending Y locations
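
Since a pitch is mostly rectangles, the start and end lists for each side can also be generated rather than typed out by hand. The sketch below is one way of doing that; rectangle_edges is a hypothetical helper and not something matplotlib provides:

```python
def rectangle_edges(x_min, y_min, x_max, y_max):
    """Return four (xs, ys) pairs, one for each side of a rectangle."""
    return [
        ([x_min, x_min], [y_min, y_max]),  # left side
        ([x_min, x_max], [y_max, y_max]),  # top side
        ([x_max, x_max], [y_max, y_min]),  # right side
        ([x_max, x_min], [y_min, y_min]),  # bottom side
    ]

# The 130 x 90 pitch outline used in this article
outline = rectangle_edges(0, 0, 130, 90)
for xs, ys in outline:
    print(xs, ys)
```

Each pair can be passed to plt.plot(xs, ys), and the same helper would work for the penalty areas and six-yard boxes.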

Take a look at the code and plot below to understand our outlines. Use the colour guides to see how they are plotted with start and end point lists.

In [2]:
fig=plt.figure()
ax=fig.add_subplot(1,1,1)

plt.plot([0,0],[0,90], color="blue")
plt.plot([0,130],[90,90], color="orange")
plt.plot([130,130],[90,0], color="green")
plt.plot([130,0],[0,0], color="red")
plt.plot([65,65],[0,90], color="pink")

plt.show()

Great job! Matplotlib makes drawing lines very easy, it just takes some clear thinking with start and end locations to get them plotted.

Drawing Circles

Next up, we’re going to draw some circles on the pitch. Primarily, we need a centre circle, but we also need markers for the centre and penalty spots.

Adding circles is slightly different to lines. Firstly, we need to assign our circles to a variable. We use ‘plt.Circle’ to do this, passing it two essential arguments:

  • X/Y coordinates of the middle of the circle
  • Radius of the circle

For our circles, we’ll also assign colour and fill, but these are optional.

With these circles assigned, we then use ‘ax.add_patch’ to draw each circle to our plot.

Take a look at our code below:

In [3]:
#Create figure
fig=plt.figure()
ax=fig.add_subplot(1,1,1)

#Pitch Outline & Centre Line
plt.plot([0,0],[0,90], color="black")
plt.plot([0,130],[90,90], color="black")
plt.plot([130,130],[90,0], color="black")
plt.plot([130,0],[0,0], color="black")
plt.plot([65,65],[0,90], color="black")

#Assign circles to variables - do not fill the centre circle!
centreCircle = plt.Circle((65,45),9.15,color="red",fill=False)
centreSpot = plt.Circle((65,45),0.8,color="blue")

#Draw the circles to our plot
ax.add_patch(centreCircle)
ax.add_patch(centreSpot)


plt.show()

Drawing Arcs

Now that you can create circles, arcs will be just as easy – we’ll need them for the lines outside the penalty area. While they take a few more arguments, they follow the same pattern as before. Let’s go through the arguments:

  • X/Y coordinates of the centrepoint of the arc, assuming the arc was a complete shape.
  • Width – we must pass width and height as the arc might not be a circle, it might instead be from an oval shape
  • Height – as above
  • Angle – degree rotation of the shape (anti-clockwise)
  • Theta1 – start location of the arc, in degrees
  • Theta2 – end location of the arc, in degrees

That’s a few more arguments than for the circle and lines, but don’t let that make you think that this is too much more complicated. Our code will look like this for one arc:

leftArc = Arc((11,45),height=18.3,width=18.3,angle=0,theta1=310,theta2=50)

All that we need to do after this is draw the arc to our plot, just like with the circles:

ax.add_patch(leftArc)

You can see this in action below:

In [4]:
#Demo Arcs
 
#Create figure
fig=plt.figure()
ax=fig.add_subplot(1,1,1)

#Pitch Outline & Centre Line
plt.plot([0,0],[0,90], color="black")
plt.plot([0,130],[90,90], color="black")
plt.plot([130,130],[90,0], color="black")
plt.plot([130,0],[0,0], color="black")
plt.plot([65,65],[0,90], color="black")

#Left Penalty Area
plt.plot([16.5,16.5],[65,25],color="black")
plt.plot([0,16.5],[65,65],color="black")
plt.plot([16.5,0],[25,25],color="black")

#Centre Circle/Spot
centreCircle = plt.Circle((65,45),9.15,fill=False)
centreSpot = plt.Circle((65,45),0.8)
ax.add_patch(centreCircle)
ax.add_patch(centreSpot)

#Create Arc and add it to our plot
leftArc = Arc((11,45),height=18.3,width=18.3,angle=0,theta1=310,theta2=50,color="red")

ax.add_patch(leftArc)

plt.show()
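
If you are wondering where theta values like 310 and 50 come from, they can be checked with a little trigonometry. The sketch below assumes the arc is centred on the penalty spot at x=11, has the same 9.15 radius as the centre circle, and should meet the front of the penalty area at x=16.5. The variable names are ours, chosen for illustration:

```python
from math import acos, cos, degrees, radians

spot_x = 11        # penalty spot X position used in this article
box_edge_x = 16.5  # front edge of the penalty area
radius = 9.15      # same radius as the centre circle

# Angle from the positive X axis at which the circle crosses the box edge
half_angle = degrees(acos((box_edge_x - spot_x) / radius))

theta2 = half_angle        # the arc ends this far above the horizontal
theta1 = 360 - half_angle  # and starts the same amount below it

# X position at which the rounded value of 50 degrees puts the ends of the arc
x_at_50 = spot_x + radius * cos(radians(50))

print(round(theta1, 1), round(theta2, 1), round(x_at_50, 2))
```

The exact angles come out a few degrees away from 310 and 50, which means the rounded values leave the arc ending a fraction beyond the box line. At the scale of these plots the difference is barely visible, but the calculated values can be used if you want the lines to meet precisely.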

Bringing everything together

The code below applies the above lines, circles and arcs to a function for quick and easy use. The only new line removes our axes:

plt.axis('off')

Take a look through our function below and follow what we are doing. Feel free to take this and use it as the base for your own plots!

In [5]:
def createPitch():
    
    #Create figure
    fig=plt.figure()
    ax=fig.add_subplot(1,1,1)

    #Pitch Outline & Centre Line
    plt.plot([0,0],[0,90], color="black")
    plt.plot([0,130],[90,90], color="black")
    plt.plot([130,130],[90,0], color="black")
    plt.plot([130,0],[0,0], color="black")
    plt.plot([65,65],[0,90], color="black")
    
    #Left Penalty Area
    plt.plot([16.5,16.5],[65,25],color="black")
    plt.plot([0,16.5],[65,65],color="black")
    plt.plot([16.5,0],[25,25],color="black")
    
    #Right Penalty Area
    plt.plot([130,113.5],[65,65],color="black")
    plt.plot([113.5,113.5],[65,25],color="black")
    plt.plot([113.5,130],[25,25],color="black")
    
    #Left 6-yard Box
    plt.plot([0,5.5],[54,54],color="black")
    plt.plot([5.5,5.5],[54,36],color="black")
    plt.plot([5.5,0],[36,36],color="black")
    
    #Right 6-yard Box
    plt.plot([130,124.5],[54,54],color="black")
    plt.plot([124.5,124.5],[54,36],color="black")
    plt.plot([124.5,130],[36,36],color="black")
    
    #Prepare Circles
    centreCircle = plt.Circle((65,45),9.15,color="black",fill=False)
    centreSpot = plt.Circle((65,45),0.8,color="black")
    leftPenSpot = plt.Circle((11,45),0.8,color="black")
    rightPenSpot = plt.Circle((119,45),0.8,color="black")
    
    #Draw Circles
    ax.add_patch(centreCircle)
    ax.add_patch(centreSpot)
    ax.add_patch(leftPenSpot)
    ax.add_patch(rightPenSpot)
    
    #Prepare Arcs
    leftArc = Arc((11,45),height=18.3,width=18.3,angle=0,theta1=310,theta2=50,color="black")
    rightArc = Arc((119,45),height=18.3,width=18.3,angle=0,theta1=130,theta2=230,color="black")

    #Draw Arcs
    ax.add_patch(leftArc)
    ax.add_patch(rightArc)
    
    #Tidy Axes
    plt.axis('off')
    
    #Display Pitch
    plt.show()
    
createPitch()

Summary

In our article, we’ve seen how to draw lines, arcs and circles in Matplotlib. You’ll find this useful when trying to add the finishing touches with annotations to any plot. These tools are equally important when drawing a map on which we will plot our data – like our pitchmap example here.

Take a look at our other visualisation articles here and be sure to get in touch with us on Twitter!

Posted by FCPythonADMIN in Visualisation

Radar Charts in Matplotlib

In football analysis and video games, radar charts have been popularised in a number of places, from the FIFA series, to Ted Knutson’s innovative ways of displaying player data.

Radar charts are an engaging way to show data that typically piques more attention than a bar chart although you can often use both of these to show the same data.

This article runs through the creation of basic radar charts in Matplotlib, plotting the FIFA Ultimate Team data of a couple of players, before creating a function to streamline the process. To start, let’s get our libraries and data pulled together.

In [1]:
import pandas as pd
from math import pi
import matplotlib.pyplot as plt
%matplotlib inline

#Create a data frame from Messi and Ronaldo's 6 Ultimate Team data points from FIFA 18
Messi = {'Pace':89,'Shooting':90,'Passing':86,'Dribbling':95,'Defending':26,'Physical':61}
Ronaldo = {'Pace':90,'Shooting':93,'Passing':82,'Dribbling':90,'Defending':33,'Physical':80}

data = pd.DataFrame([Messi,Ronaldo], index = ["Messi","Ronaldo"])
data
Out[1]:
Defending Dribbling Pace Passing Physical Shooting
Messi 26 95 89 86 61 90
Ronaldo 33 90 90 82 80 93

Plotting data in a radar has lots of similarities to plotting along a straight line (like a bar chart). We still need to provide data on where our line goes, we need to label our axes and so on. However, as it is a circle, we will also need to provide the angle at which the lines run. This is much easier than it sounds with Python.

Firstly, let’s do the easy bits and take a list of Attributes for our labels, along with a basic count of how many there are.

In [2]:
Attributes =list(data)
AttNo = len(Attributes)

We then take a list of the values that we want to plot, then copy the first value to the end. When we plot the data, this will be the line that the radar follows. Take a look below:

In [3]:
values = data.iloc[1].tolist()
values += values [:1]
values
Out[3]:
[33, 90, 90, 82, 80, 93, 33]

So these are the points that we will draw on our radar, but we will need to find the angles between each point for our line to follow. The formula below finds these angles and assigns them to ‘angles’. Then, just as above, we copy the first value to the end of our list to complete the line.

In [4]:
angles = [n / float(AttNo) * 2 * pi for n in range(AttNo)]
angles += angles [:1]
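
As a quick sanity check on that formula, the sketch below wraps it in a small function and converts the results to degrees, which are easier to read than radians. The function name radar_angles is our own and is only used for illustration:

```python
from math import degrees, pi

def radar_angles(n_axes):
    """Evenly spaced angles in radians, with the first repeated to close the loop."""
    angles = [n / n_axes * 2 * pi for n in range(n_axes)]
    return angles + angles[:1]

# Six attributes should sit 60 degrees apart around the circle
angles_deg = [round(degrees(a)) for a in radar_angles(6)]
print(angles_deg)
```

With six attributes, each axis sits 60 degrees from the next, and the repeated first angle brings the line back to where it started.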

Now that we have our values to plot, and the angles between them, drawing the radar is pretty simple.

Follow along with the comments below, but note the ‘polar=True’ in our subplot call: this changes our chart from a more traditional x and y axes chart to the circular radar chart that we are looking for.

In [5]:
ax = plt.subplot(111, polar=True)

#Add the attribute labels to our axes
plt.xticks(angles[:-1],Attributes)

#Plot the line around the outside of the filled area, using the angles and values calculated before
ax.plot(angles,values)

#Fill in the area plotted in the last line
ax.fill(angles, values, 'teal', alpha=0.1)

#Give the plot a title and show it
ax.set_title("Ronaldo")
plt.show()

Comparing two sets of data in a radar chart

One additional benefit of the radar chart is the ability to compare two observations (or players, in this case), quite easily.

The example below repeats the above process for finding angles for Messi’s data points, then plots them both together.

In [6]:
#Find the values and angles for Messi - from the table at the top of the page
values2 = data.iloc[0].tolist()
values2 += values2 [:1]

angles2 = [n / float(AttNo) * 2 * pi for n in range(AttNo)]
angles2 += angles2 [:1]


#Create the chart as before, but with both Ronaldo's and Messi's angles/values
ax = plt.subplot(111, polar=True)

plt.xticks(angles[:-1],Attributes)

ax.plot(angles,values)
ax.fill(angles, values, 'teal', alpha=0.1)

ax.plot(angles2,values2)
ax.fill(angles2, values2, 'red', alpha=0.1)

#Rather than use a title, individual text points are added
plt.figtext(0.2,0.9,"Messi",color="red")
plt.figtext(0.2,0.85,"v")
plt.figtext(0.2,0.8,"Ronaldo",color="teal")
plt.show()

Creating a function to plot individual players

This is a lot of code if we want to create multiple charts. We can easily turn these charts into a function, which will do all the heavy lifting for us – all we will have to do is provide it with a player name and data that we want to plot:

In [7]:
def createRadar(player, data):
    Attributes = ["Defending","Dribbling","Pace","Passing","Physical","Shooting"]
    
    #Copy the first value to the end, without changing the list that was passed in
    data = data + data[:1]
    
    angles = [n / 6 * 2 * pi for n in range(6)]
    angles += angles [:1]
    
    ax = plt.subplot(111, polar=True)

    plt.xticks(angles[:-1],Attributes)
    ax.plot(angles,data)
    ax.fill(angles, data, 'blue', alpha=0.1)

    ax.set_title(player)
    plt.show()
In [8]:
createRadar("Dybala",[24,91,86,81,67,85])

And how about we do the same thing to compare two players?

In [9]:
def createRadar2(player, data, player2, data2):
    Attributes = ["Defending","Dribbling","Pace","Passing","Physical","Shooting"]
    
    #Copy the first value to the end of each list to close the shape
    data = data + data[:1]
    data2 = data2 + data2[:1]
    
    angles = [n / 6 * 2 * pi for n in range(6)]
    angles += angles [:1]
    
    angles2 = [n / 6 * 2 * pi for n in range(6)]
    angles2 += angles2 [:1]
    
    #Create the chart as before, but with both players' angles/values
    ax = plt.subplot(111, polar=True)

    plt.xticks(angles[:-1],Attributes)

    #Plot the data passed to the function, rather than the earlier global lists
    ax.plot(angles,data)
    ax.fill(angles, data, 'teal', alpha=0.1)

    ax.plot(angles2,data2)
    ax.fill(angles2, data2, 'red', alpha=0.1)

    #Rather than use a title, individual text points are added
    plt.figtext(0.2,0.9,player,color="teal")
    plt.figtext(0.2,0.85,"v")
    plt.figtext(0.2,0.8,player2,color="red")
    plt.show()
In [10]:
createRadar2("Henderson", [76,76,62,82,81,70],"Wilshere", [62,82,71,80,72,69])

Summary

Radar charts are an interesting way to display data and allow us to compare two observations quite nicely. In this article, we have used them to compare players’ FIFA video game ratings, but analysts have used this format very innovatively to display actual performance data in an engaging format.

Take a look at Statsbomb’s use of radar charts with real data, or learn more about visualisation in Python here.

Posted by FCPythonADMIN in Visualisation

Creating Pie Charts in Matplotlib

I tend to think that pie charts should be avoided in 99% of the cases that they are used in. Unless your goal is to mislead (which is sometimes the case!), or you have a strict use case for them, you can normally find a better way to communicate your point.

That being said, just because we won’t do something, doesn’t mean we don’t need to know how it is done. As such, this article is going to take us through a simple example of creating a pie chart in Matplotlib.

As ever, let’s get our modules and data ready to go.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

Our pie chart is going to display the share of Premier League wins, as shown in our data below:

In [2]:
leagueWins = {'Team':['Manchester United','Blackburn Rovers','Arsenal',
                     'Chelsea','Manchester City','Leicester City'],
             'Championships':[13,1,3,4,2,1]}

df = pd.DataFrame(leagueWins, columns=['Team','Championships'])
df
Out[2]:
Team Championships
0 Manchester United 13
1 Blackburn Rovers 1
2 Arsenal 3
3 Chelsea 4
4 Manchester City 2
5 Leicester City 1

So we want the pie chart to plot the numbers in our Championships column. ‘plt.pie()’ will do exactly that:

In [3]:
plt.pie(df['Championships'])

#This next line just makes the plot look a little cleaner in this notebook
plt.tight_layout()

So we have a pie chart! It doesn’t tell us a great deal without labels, except that there is a big blue lump that takes up over half of the pie.
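
We can check the size of that lump with some quick arithmetic, since each wedge is simply a team's share of the total. This short sketch repeats the figures from the table above in a plain dictionary:

```python
championships = {
    'Manchester United': 13, 'Blackburn Rovers': 1, 'Arsenal': 3,
    'Chelsea': 4, 'Manchester City': 2, 'Leicester City': 1,
}

total = sum(championships.values())

# Each wedge's share of the pie, as a percentage of all titles in the table
shares = {team: 100 * wins / total for team, wins in championships.items()}

for team, share in shares.items():
    print(f"{team}: {share:.1f}%")
```

Manchester United's 13 titles out of 24 work out at a little over half of the pie, which matches what we see in the chart.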

As with all of its other plot types, Matplotlib gives good customisation options. Let’s use some of these to add a title, labels and colours in our arguments:

In [4]:
#Create a list of the colours used for the teams, in order.
teamColours=['#f40206','#0560b5','#ce0000','#1125ff','#28cdff','#091ebc']

plt.pie(df['Championships'],
        #Data labels are the team names in the dataFrame
        labels = df['Team'],
        #Assign our colours list
        colors = teamColours,
        #Rotate the chart so that the first wedge starts at 90 degrees
        startangle = 90
       )

#Add a title
plt.title("Premier League Titles")
plt.tight_layout()

Summary

I strongly recommend not using pie charts: we struggle to process circular space in comparison to bar charts or even a table, especially when the numbers are relatively simple.

However, just in case it is ever needed, we have seen in this article how easy it is to create a pie chart in Matplotlib with the ‘.pie()’ command. It is also clear that we need to make use of Matplotlib’s customisation features to tidy things up, add a bit of relevant colour and titles. Passing these as arguments into the earlier command makes this easy.

Next up, read up on some different (better!) visualisation types!

Posted by FCPythonADMIN in Visualisation