Python Treemaps with Squarify & Matplotlib

Treemaps are visualisations that divide the area of a chart into rectangles, each sized according to the value of a datapoint, so bigger rectangles represent higher values. Python makes these charts easy to create: the squarify library calculates the size and position of each rectangle for us and plots them so that they fill the chart. We can also combine our treemap with matplotlib's ability to scale colours against a variable to make good-looking, easy-to-understand plots.
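
To make the "bigger value, bigger rectangle" idea concrete, here is a minimal sketch of the arithmetic only, using made-up values rather than our dataset. It is not squarify's layout algorithm, which also decides where each rectangle goes; it simply shares out a chart's area in proportion to each value. The function name `proportional_areas` and the 100 x 100 chart size are assumptions for illustration.

```python
def proportional_areas(values, width=100, height=100):
    """Share out a width x height chart so each area is proportional to its value."""
    total = sum(values)
    chart_area = width * height
    return [value * chart_area / total for value in values]


# Made-up example values, not taken from the Manchester City data.
example_values = [6, 3, 1]
areas = proportional_areas(example_values)

for value, area in zip(example_values, areas):
    print(f"value {value} -> area {area:.0f}")
```

A value that is twice as large gets twice the area, and the areas always add up to the whole chart.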

Let’s fire up our libraries (make sure you install squarify!) and take a look at our data:

In [1]:
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import squarify
In [2]:
data = pd.read_csv("Data/ManCityATT.csv")
data.head()
Out[2]:
   Player            Pos  GP  GS    MP   G   A  SOG   S  YC  RC
0  Agüero, Sergio    F    19  17  1518  16   5   32  77   1   0
1  Sterling, Raheem  M    22  18  1668  14   4   23  55   2   1
2  Gabriel Jesus     F    18  12  1016   8   2   22  35   3   0
3  Sané, Leroy       M    22  17  1548   7  10   14  36   4   0
4  De Bruyne, Kevin  M    24  24  2060   6  10   28  61   1   0

Our dataframe has a record for each player in Manchester City’s squad, with their games/minutes played alongside goals, assists, shots and card data.

Facing Manchester City next week, we would like to visualise where their threat comes from and who they rely on for goals and assists. We’ll do this with our treemap!

We will create our treemap in a few key steps:

  1. Create a new dataframe that contains only players that have scored.
  2. Utilise matplotlib to create a colour map that assigns each player a colour according to how many goals they have scored.
  3. Set up a new, rectangular plot for our treemap
  4. Plot our data & title
  5. Show the plot, with no axes

The commented code below will show you exactly how you can do this:

In [3]:
# New dataframe, containing only players with more than 0 goals.
dataGoals = data[data["G"]>0]

#Utilise matplotlib to scale our goal numbers between the min and max, then assign this scale to our values.
norm = matplotlib.colors.Normalize(vmin=min(dataGoals.G), vmax=max(dataGoals.G))
colors = [matplotlib.cm.Blues(norm(value)) for value in dataGoals.G]

#Create our figure and axes in one step, sized as a wide rectangle.
fig, ax = plt.subplots(figsize=(16, 4.5))

#Use squarify to plot our data, label it and add colours. We add an alpha layer to ensure black labels show through
squarify.plot(label=dataGoals.Player,sizes=dataGoals.G, color = colors, alpha=.6)
plt.title("Man City Goals",fontsize=23,fontweight="bold")

#Remove our axes and display the plot
plt.axis('off')
plt.show()
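
It can help to see what the colour scaling in the cell above is doing. With its default linear scaling, `Normalize` maps each value to `(value - vmin) / (vmax - vmin)`, so the lowest scorer lands at 0 and the top scorer at 1, and the colour map then turns that number into a shade of blue. The sketch below reproduces that arithmetic in plain Python. `linear_scale` is a hypothetical helper name, and the goal counts are only the five rows shown by `data.head()`, so the real minimum across the whole squad may differ.

```python
def linear_scale(value, vmin, vmax):
    """Map value linearly onto 0..1, where vmin -> 0 and vmax -> 1."""
    return (value - vmin) / (vmax - vmin)


# Goals for the five players visible in data.head() (not the full squad).
goals = {
    "Agüero, Sergio": 16,
    "Sterling, Raheem": 14,
    "Gabriel Jesus": 8,
    "Sané, Leroy": 7,
    "De Bruyne, Kevin": 6,
}

vmin, vmax = min(goals.values()), max(goals.values())
scaled = {player: linear_scale(g, vmin, vmax) for player, g in goals.items()}

for player, position in scaled.items():
    print(f"{player}: {position:.1f}")
```

With the `Blues` colour map, a position near 1 is drawn in the darkest blue and a position near 0 in the palest, which is why the top scorers stand out.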

Let’s take another look, this time creating a treemap for assists. See if you can understand what we are doing without the comments above!

In [4]:
dataAssists = data[data["A"]>0]

norm = matplotlib.colors.Normalize(vmin=min(dataAssists.A), vmax=max(dataAssists.A))
colors = [matplotlib.cm.Blues(norm(value)) for value in dataAssists.A]

fig = plt.gcf()
fig.set_size_inches(16, 4.5)

squarify.plot(label=dataAssists.Player,sizes=dataAssists.A, color = colors, alpha=.6)
plt.title("Man City Assists",fontsize=23,fontweight="bold")

plt.axis('off')
plt.show()

Summary

Awesome, we now have a couple of simple charts that show the dangerous players in City’s lineup, and the code to reproduce these for other teams. This should make for a quick, easy and impactful addition to your pre-match reports!

Next up, why not learn more about Python visualisations, like violin plots or lollipop charts?