
Creating Personal Football Heatmaps in Python

Tracking technology has been part of football analysis for the past 20 years, giving access to physical performance data and heatmap visualisations that show how much ground a player covers. As this technology becomes cheaper and more accessible, anyone can now collect this data from their Sunday morning games. This article runs through how you can create your own heatmaps for a game with nothing more than a GPS tracking device (running watch, phone or GPS unit) and Python.

To get your hands on your own data, you can export your .gpx file from Strava. While Strava is great for runs, it isn’t built for football or for running in tight spaces. So let’s build our own!

Let’s import our necessary modules and data, then get started!

In [1]:
#gpxpy makes working with .gpx files really easy
import gpxpy

#Visualisation libraries
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

#Open our .gpx file and parse it into a format that is easy for us to run through;
#the with block closes the file again once parsing is done
with open('5aside.gpx', 'r') as gpx_file:
    gpx = gpxpy.parse(gpx_file)

Put simply, a .gpx file is an XML file that records a series of track points, each holding your location and the time it was recorded. With location and time, we can calculate the distance between points and, subsequently, speed. We can also visualise this data, as we’ll show here.
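To make the distance and speed step concrete, here is a minimal, hand-rolled sketch using the haversine formula; it is not gpxpy’s own method. It assumes a spherical Earth with a mean radius of 6,371 km, which is plenty accurate over the size of a pitch. The helper names `haversine_m` and `speed_ms` are our own, the first sample point is the one printed in Out[2] below, and the second point is invented for illustration.

```python
import math
from datetime import datetime

# Assumption: a spherical Earth with mean radius 6,371 km (fine over a pitch)
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def speed_ms(p1, p2):
    """Average speed in metres per second between two (lat, lon, time) tuples."""
    (lat1, lon1, t1), (lat2, lon2, t2) = p1, p2
    seconds = (t2 - t1).total_seconds()
    if seconds <= 0:
        return 0.0  # same or out-of-order timestamps: no meaningful speed
    return haversine_m(lat1, lon1, lat2, lon2) / seconds


# First point as printed in Out[2]; the second is invented for illustration
a = (51.5505, -0.3048, datetime(2018, 1, 19, 12, 14, 26))
b = (51.5506, -0.3047, datetime(2018, 1, 19, 12, 14, 30))
print(f"{haversine_m(*a[:2], *b[:2]):.1f} m")
print(f"{speed_ms(a, b):.2f} m/s")
```

Inside a loop like the one in In [3], you could collect `(point.latitude, point.longitude, point.time)` tuples and apply `speed_ms` to each consecutive pair. GPS readings jitter, so speeds between neighbouring points may need some smoothing before they are useful.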

Let’s take a look at what one of these lines looks like:

In [2]:
gpx.tracks[0].segments[0].points[0]
Out[2]:
GPXTrackPoint(51.5505, -0.3048, elevation=44, time=datetime.datetime(2018, 1, 19, 12, 14, 26))

The first two values are our latitude and longitude, followed by elevation and time. This gives us a lot of freedom to calculate variables and plot our data, and it is the foundation of many of the advanced metrics that you will find on Strava.

In our example, we want to plot our latitude and longitude, so let’s use a for loop to add these to a list:

In [3]:
lat = []
lon = []

for track in gpx.tracks:
    for segment in track.segments:
        for point in segment.points:
            lat.append(point.latitude)
            lon.append(point.longitude)

Our location is now extracted into a handy x and y format, so let’s plot it. We’ve borrowed Andy Kee’s Strava plotting aesthetic here; take a read of his article for more information on plotting your cycle/run data!

In [4]:
#Dark background, with a single set of axes filling the whole figure
fig = plt.figure(facecolor='0.1')
ax = plt.Axes(fig, [0., 0., 1., 1.])
ax.set_aspect('equal')
ax.set_axis_off()
fig.add_axes(ax)

#Longitude runs along x, latitude along y
ax.plot(lon, lat, color='deepskyblue', lw=0.3, alpha=0.9)
plt.show()

The lines are great, and make for a beautiful plot, but let’s try and create a Prozone-esque heatmap on our pitch.

To do this, we can plot on the actual pitch that we played on, using the gmplot module. gmplot draws our data on top of a Google Maps background. Let’s take a look at how this works:

In [5]:
#Import the module first
import gmplot

#Start an instance of our map with three arguments: the latitude and longitude of the
#map's centre point (here, the first location in our data) and the default zoom level
gmap = gmplot.GoogleMapPlotter(lat[0], lon[0], 20)

#Create our heatmap using our lat/lon lists for x and y coordinates
gmap.heatmap(lat, lon)

#Draw our map and save it to the html file named in the argument
gmap.draw("Player1.html")

This code will spit out an HTML file that we can then open in a browser to see our heatmap plotted on a Google Maps background, something like the below:

[Image: Football heatmap created in Python]

Summary

Similar visualisations of professional football matches set clubs and leagues back a pretty penny, but you can create them with entirely free software and increasingly affordable kit. While this won’t improve FC Python’s exceedingly poor on-pitch performances, we definitely think it is pretty cool!

Simply export your .gpx data from Strava, extract the latitude/longitude data and plot it as a line, or as a heatmap on a map background, for a really engaging visualisation.

Next up, learn about plotting this on a pitch map rather than satellite imagery.

Posted by FCPythonADMIN in Blog

Football Heatmaps with Seaborn

Football heatmaps are used by in-club and media analysts to illustrate the areas in which a player has been present. They might show player locations, or the events of a player or team, and are effectively a smoothed-out scatter plot of these points. While there may be some debate as to how useful they are (they don’t tell you whether actions or movements are good or bad!), they are often aesthetically pleasing and engaging, hence their popularity. This article will take you through loading your dataset and plotting a heatmap of x and y coordinates in Python.

Let’s get our modules imported and our data ready to go!

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Arc
import seaborn as sns

%matplotlib inline

data = pd.read_csv("Data/passes.csv")

data.head()
Out[1]:
         Half  Time  Event   Player  Team  Xstart  Ystart  Xend  Yend
0  First Half     1   Pass  Wombech   USA      26      38    66    52
1  First Half     6   Pass  Wombech   USA      81      34    62    68
2  First Half     6   Pass  Wombech   USA      46      45    84    63
3  First Half     8   Pass  Wombech   USA      89      66    89    39
4  First Half     9   Pass  Wombech   USA      68      64    21    25

Plotting a heatmap

Today’s dataset showcases Wombech’s passes from her match. As you can see, we have time, player and location data. We will be looking to plot the starting X/Y coordinates of Wombech’s passes, but you would be able to do the same with any coordinates that you have – whether GPS/optical tracking coordinates, or other event data.
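If your coordinates come from GPS, like the Strava track in the first article, they are in degrees of latitude and longitude rather than metres on a pitch. Below is a minimal sketch of one way to convert them, assuming an equirectangular (locally flat) approximation around a reference point. The function `to_local_xy` is hypothetical, and its output is in metres east and north of the reference point, so it is not rotated to line up with the touchlines.

```python
import math

# Assumption: spherical Earth; the flat approximation below is only meant
# for small areas such as a single pitch
EARTH_RADIUS_M = 6_371_000


def to_local_xy(lats, lons, origin_lat, origin_lon):
    """Convert lat/lon in degrees to x/y in metres east/north of an origin.

    Equirectangular approximation: a degree of longitude shrinks by
    cos(latitude), so x is scaled by the cosine of the origin's latitude.
    The result is NOT rotated to match the direction of the pitch.
    """
    cos_lat = math.cos(math.radians(origin_lat))
    xs = [math.radians(lon - origin_lon) * EARTH_RADIUS_M * cos_lat for lon in lons]
    ys = [math.radians(lat - origin_lat) * EARTH_RADIUS_M for lat in lats]
    return xs, ys


# Two made-up sample points near the location in the first article
xs, ys = to_local_xy([51.5505, 51.5515], [-0.3048, -0.3038], 51.5505, -0.3048)
print([round(v, 1) for v in xs], [round(v, 1) for v in ys])
```

For example, `x, y = to_local_xy(lat, lon, min(lat), min(lon))` would give every point a non-negative x and y in metres, ready to be passed to a density plot like the ones below.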

Python’s Seaborn module makes plotting a tidy dataset incredibly easy with ‘.kdeplot()’. Passing just the x and y coordinate columns as arguments gives you something like this:

In [2]:
fig, ax = plt.subplots()
fig.set_size_inches(7, 5)

#seaborn 0.11 and later take x and y as keyword arguments
sns.kdeplot(x=data["Xstart"], y=data["Ystart"], ax=ax)
plt.show()

Cool, we have a contour plot! Each line joins points of equal estimated density, so nested rings build up around the areas where most of Wombech’s passes start.
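To see what a kernel density estimate is doing under the hood, here is a minimal NumPy sketch: every pass start places a Gaussian “bump” on a grid, and the bumps are summed and normalised. This is a simplified stand-in, not seaborn’s implementation; it uses one fixed, hand-picked `bandwidth` in both directions, whereas seaborn chooses the bandwidth from the data. The function `gaussian_kde_grid` is hypothetical, and because it builds an array of (grid cells × points), it suits small datasets like this one.

```python
import numpy as np


def gaussian_kde_grid(x, y, bandwidth, xlim, ylim, n=100):
    """Evaluate a simple 2D Gaussian kernel density estimate on an n-by-n grid.

    Every (x, y) point adds a round Gaussian bump with standard deviation
    `bandwidth` (a hand-picked value); the bumps are summed and divided by
    the number of points so the surface integrates to about 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    gx = np.linspace(xlim[0], xlim[1], n)
    gy = np.linspace(ylim[0], ylim[1], n)
    grid_x, grid_y = np.meshgrid(gx, gy)  # rows follow y, columns follow x
    # Squared distance from every grid cell to every point: shape (n, n, len(x))
    d2 = (grid_x[..., None] - x) ** 2 + (grid_y[..., None] - y) ** 2
    bumps = np.exp(-d2 / (2 * bandwidth ** 2)) / (2 * np.pi * bandwidth ** 2)
    density = bumps.sum(axis=-1) / len(x)
    return gx, gy, density
```

For example, `gx, gy, z = gaussian_kde_grid(data["Xstart"], data["Ystart"], bandwidth=8, xlim=(0, 130), ylim=(0, 90))` followed by `plt.contourf(gx, gy, z, levels=40)` should give a picture broadly similar to the shaded kdeplot; the bandwidth of 8 is a guess to tune by eye.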

Let’s customise this with a couple of additional arguments:

  • shade (called fill in seaborn 0.11 and later): fills the gaps between the lines to give us more of the heatmap effect that we are looking for.
  • n_levels (called levels in seaborn 0.11 and later): sets how many lines are drawn; adding lots of these will blur the lines into a heatmap.

Take a look at the examples below to see the differences these two arguments produce:

In [3]:
#Two side-by-side axes, so each plot gets its own panel
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.set_size_inches(14, 4)

#Plot one - include shading (fill=True is the newer name for shade=True)
sns.kdeplot(x=data["Xstart"], y=data["Ystart"], fill=True, ax=ax1)

#Plot two - no shading, lines only
sns.kdeplot(x=data["Xstart"], y=data["Ystart"], ax=ax2)

plt.show()
In [4]:
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.set_size_inches(14, 4)

#Plot one - distinct areas with few levels (levels is the newer name for n_levels)
sns.kdeplot(x=data["Xstart"], y=data["Ystart"], fill=True, levels=5, ax=ax1)

#Plot two - faded lines with many more levels
sns.kdeplot(x=data["Xstart"], y=data["Ystart"], fill=True, levels=40, ax=ax2)

plt.show()

Now that we can customise our plot as we see fit, we just need to add our pitch map. Learn more about plotting pitches here, but feel free to use this pitch map below – although you may need to change the coordinates to fit your data!

Also take note of our xlim and ylim lines – we use these to set the size of the plot, so that the heatmap does not spill over the pitch.
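If your data uses a different coordinate system from the 130 × 90 pitch used here, a small rescaling step can line the two up. The sketch below assumes a simple linear mapping; the `rescale` helper and the 100 × 100 source grid in the example are hypothetical, and `flip_y` covers providers that measure y from the top of the pitch rather than the bottom (check your data’s documentation).

```python
def rescale(xs, ys, src_size, dst_size=(130, 90), flip_y=False):
    """Linearly map (x, y) coordinates from one pitch size to another.

    src_size and dst_size are (length, width) pairs. The default dst_size
    matches the 130 x 90 pitch used in this article. Set flip_y=True if
    your provider measures y from the top touchline instead of the bottom.
    """
    src_len, src_wid = src_size
    dst_len, dst_wid = dst_size
    # Multiply before dividing so round numbers stay exact
    new_xs = [x * dst_len / src_len for x in xs]
    new_ys = [y * dst_wid / src_wid for y in ys]
    if flip_y:
        new_ys = [dst_wid - y for y in new_ys]
    return new_xs, new_ys


# Hypothetical data on a 0-100 by 0-100 grid
print(rescale([0, 50, 100], [0, 50, 100], src_size=(100, 100)))
```

For example, `x, y = rescale(data["Xstart"], data["Ystart"], src_size=(100, 100))` would stretch 0 to 100 coordinates onto the pitch drawn below.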

In [5]:
#Create figure
fig=plt.figure()
fig.set_size_inches(7, 5)
ax=fig.add_subplot(1,1,1)

#Pitch Outline & Centre Line
plt.plot([0,0],[0,90], color="black")
plt.plot([0,130],[90,90], color="black")
plt.plot([130,130],[90,0], color="black")
plt.plot([130,0],[0,0], color="black")
plt.plot([65,65],[0,90], color="black")

#Left Penalty Area
plt.plot([16.5,16.5],[65,25],color="black")
plt.plot([0,16.5],[65,65],color="black")
plt.plot([16.5,0],[25,25],color="black")

#Right Penalty Area
plt.plot([130,113.5],[65,65],color="black")
plt.plot([113.5,113.5],[65,25],color="black")
plt.plot([113.5,130],[25,25],color="black")

#Left 6-yard Box
plt.plot([0,5.5],[54,54],color="black")
plt.plot([5.5,5.5],[54,36],color="black")
plt.plot([5.5,0],[36,36],color="black")

#Right 6-yard Box
plt.plot([130,124.5],[54,54],color="black")
plt.plot([124.5,124.5],[54,36],color="black")
plt.plot([124.5,130],[36,36],color="black")

#Prepare Circles
centreCircle = plt.Circle((65,45),9.15,color="black",fill=False)
centreSpot = plt.Circle((65,45),0.8,color="black")
leftPenSpot = plt.Circle((11,45),0.8,color="black")
rightPenSpot = plt.Circle((119,45),0.8,color="black")

#Draw Circles
ax.add_patch(centreCircle)
ax.add_patch(centreSpot)
ax.add_patch(leftPenSpot)
ax.add_patch(rightPenSpot)

#Prepare Arcs
leftArc = Arc((11,45),height=18.3,width=18.3,angle=0,theta1=310,theta2=50,color="black")
rightArc = Arc((119,45),height=18.3,width=18.3,angle=0,theta1=130,theta2=230,color="black")

#Draw Arcs
ax.add_patch(leftArc)
ax.add_patch(rightArc)

#Tidy Axes
plt.axis('off')

#Heatmap of pass start locations (fill/levels are the newer names for shade/n_levels)
sns.kdeplot(x=data["Xstart"], y=data["Ystart"], fill=True, levels=50, ax=ax)
plt.ylim(0, 90)
plt.xlim(0, 130)


#Display Pitch
plt.show()

Great work! Now we can see Wombech’s pass start locations as a heatmap!

Summary

Seaborn makes heatmaps a breeze – we simply use the contour plots with ‘kdeplot()’ and blur our lines to give a heatmap effect.

If using these to communicate rather than analyse, always take care. There is nothing telling you if the actions in the plot are good or bad, but we may make these inferences when discussing them. As always, be sure that what you think is being communicated is actually being communicated!

As for next steps, why not take a look at pass maps, or other parts of our visualisation series?

Posted by FCPythonADMIN in Visualisation

Looking for Correlations with Heatmaps in Seaborn


Looking for things that cause other things is one of the most common investigations into data. While correlation (a relationship between variables) does not imply causation, it will often point you in the right direction and help you to understand the relationships in your dataset.

You can check the correlation of every variable against every other variable one pair at a time, but with lots of variables this is a lengthy and inefficient process: the 35 variables we end up with below give 35 × 34 / 2 = 595 distinct pairs. In these cases, seaborn gives us a function to visualise all of the correlations at once, and we can then focus our investigations on whatever stands out.

Let’s get our modules imported and a dataset of player attributes ready to go, then we can take a look at the correlations.

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

data = pd.read_csv("../../Data/FIFAPlayers.csv")

data.head(2)
Out[1]:
   player_api_id  overall_rating  potential  preferred_foot  attacking_work_rate  defensive_work_rate  crossing  finishing  heading_accuracy  short_passing  ...  gk_diving  gk_handling  gk_kicking  gk_positioning  gk_reflexes         player_name          birthday    p_id  height  weight
0         307224              64         68           right               medium                  low        44         63                73             49  ...         12           12           7              11           12      Kevin Koubemba  23/03/1993 00:00  307224  193.04     198
1         512726              63         72           right               medium               medium        51         66                55             57  ...         11           12          12              12            7  Yanis Mbombo Lokwa  08/04/1994 00:00  512726  177.80     172

2 rows × 44 columns

Our data has lots of columns that are not attribute ratings, so let’s .drop() these from our dataset.

In [2]:
data = data.drop(["player_api_id","preferred_foot","attacking_work_rate","defensive_work_rate","player_name","birthday",
                 "p_id","height","weight"],axis=1)

data.head(2)
Out[2]:
   overall_rating  potential  crossing  finishing  heading_accuracy  short_passing  volleys  dribbling  curve  free_kick_accuracy  ...  vision  penalties  marking  standing_tackle  sliding_tackle  gk_diving  gk_handling  gk_kicking  gk_positioning  gk_reflexes
0              64         68        44         63                73             49       52         52     42                  31  ...      55         65       22               22              25         12           12           7              11           12
1              63         72        51         66                55             57       60         64     50                  39  ...      48         59       15               16              12         11           12          12              12            7

2 rows × 35 columns

Now we have 35 columns, and a row for each player.

As mentioned, we want to see the correlation between the variables. Knowing these correlations might help us to uncover relationships that help us to better understand our data in the real world.

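DataFrames can calculate the correlations really easily using the ‘.corr()’ method (recent pandas versions need every column to be numeric for this, another reason to drop the text columns first). Let’s see what that gives us: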

In [3]:
data.corr()
Out[3]:
                    overall_rating  potential   crossing  finishing  heading_accuracy  short_passing    volleys  dribbling      curve  free_kick_accuracy  ...     vision  penalties    marking  standing_tackle  sliding_tackle  gk_diving  gk_handling  gk_kicking  gk_positioning  gk_reflexes
overall_rating            1.000000   0.812783   0.289583   0.260644          0.241417       0.409090   0.298047   0.282968   0.322068            0.286211  ...   0.415812   0.272004   0.122715         0.148644        0.126426   0.023669     0.025814    0.022068        0.024864     0.025722
potential                 0.812783   1.000000   0.240445   0.247969          0.172263       0.374329   0.245980   0.318962   0.267889            0.204292  ...   0.355629   0.217775   0.066954         0.094283        0.080880  -0.021628    -0.021715   -0.024431       -0.024717    -0.020982
crossing                  0.289583   0.240445   1.000000   0.612054          0.405483       0.806751   0.657961   0.836678   0.824757            0.737226  ...   0.683588   0.626716   0.281258         0.330608        0.315528  -0.664204    -0.660893   -0.657448       -0.664633    -0.666669
finishing                 0.260644   0.247969   0.612054   1.000000          0.392763       0.624952   0.876722   0.797271   0.722722            0.666326  ...   0.695608   0.812296  -0.239855        -0.177554       -0.218832  -0.535227    -0.528907   -0.529505       -0.535533    -0.535879
heading_accuracy          0.241417   0.172263   0.405483   0.392763          1.000000       0.581259   0.405780   0.459740   0.359974            0.334387  ...   0.235034   0.475056   0.490770         0.515285        0.478135  -0.737197    -0.734731   -0.729859       -0.734677    -0.736483
short_passing             0.409090   0.374329   0.806751   0.624952          0.581259       1.000000   0.660882   0.825260   0.769699            0.723058  ...   0.740271   0.664610   0.387261         0.451524        0.410343  -0.744615    -0.740532   -0.735705       -0.742899    -0.742561
volleys                   0.298047   0.245980   0.657961   0.876722          0.405780       0.660882   1.000000   0.790252   0.772782            0.716469  ...   0.711074   0.799573  -0.144672        -0.078617       -0.115459  -0.543064    -0.539224   -0.537686       -0.544100    -0.543293
dribbling                 0.282968   0.318962   0.836678   0.797271          0.459740       0.825260   0.790252   1.000000   0.831146            0.731280  ...   0.736945   0.738639   0.095979         0.159148        0.130645  -0.729865    -0.725150   -0.722204       -0.728617    -0.729229
curve                     0.322068   0.267889   0.824757   0.722722          0.359974       0.769699   0.772782   0.831146   1.000000            0.838848  ...   0.742520   0.728738   0.092416         0.154444        0.124954  -0.601724    -0.597503   -0.595797       -0.603641    -0.603702
free_kick_accuracy        0.286211   0.204292   0.737226   0.666326          0.334387       0.723058   0.716469   0.731280   0.838848            1.000000  ...   0.715175   0.723361   0.110824         0.173130        0.136704  -0.552613    -0.548308   -0.545713       -0.552341    -0.552423
long_passing              0.390576   0.323141   0.744002   0.436469          0.459398       0.883278   0.504547   0.677695   0.688117            0.677657  ...   0.679710   0.517182   0.492372         0.547532        0.513989  -0.616808    -0.612804   -0.607938       -0.613013    -0.614926
ball_control              0.369979   0.364856   0.829437   0.751184          0.592930       0.908914   0.761881   0.929211   0.820620            0.741906  ...   0.740002   0.747634   0.244372         0.307811        0.269671  -0.796325    -0.791156   -0.786738       -0.793834    -0.795043
acceleration              0.172540   0.294898   0.604123   0.533289          0.175644       0.486635   0.500103   0.711492   0.550935            0.419734  ...   0.433572   0.429630  -0.027721         0.002464        0.010518  -0.474127    -0.470427   -0.466134       -0.474372    -0.472675
sprint_speed              0.184337   0.302025   0.580925   0.506892          0.252968       0.472873   0.474557   0.678581   0.510566            0.377297  ...   0.379296   0.410285   0.016285         0.045523        0.051699  -0.495136    -0.490773   -0.487010       -0.496836    -0.495392
agility                   0.213297   0.265458   0.638343   0.581714          0.097808       0.541594   0.575744   0.730591   0.639356            0.529638  ...   0.578416   0.491535  -0.084819        -0.046347       -0.046212  -0.416865    -0.413718   -0.412907       -0.417915    -0.418175
reactions                 0.812882   0.610956   0.302747   0.289203          0.207219       0.390003   0.333682   0.287788   0.339246            0.309461  ...   0.443083   0.292896   0.083667         0.116702        0.094353   0.004749     0.007232    0.003514        0.008106     0.006593
balance                   0.090975   0.157920   0.600476   0.454938          0.036127       0.502635   0.467108   0.636712   0.572920            0.487575  ...   0.508085   0.414281   0.021879         0.053601        0.062066  -0.420497    -0.416812   -0.411571       -0.419347    -0.420619
shot_power                0.340539   0.271335   0.693072   0.761025          0.587076       0.757514   0.779821   0.780808   0.748280            0.728748  ...   0.650626   0.762647   0.153358         0.215236        0.173882  -0.676710    -0.674647   -0.670248       -0.677315    -0.678294
jumping                   0.233181   0.151305   0.042990   0.009348          0.278075       0.079908   0.021161   0.044958   0.000967           -0.034224  ...  -0.019001   0.028358   0.194274         0.184933        0.199548  -0.076742    -0.075963   -0.074961       -0.072947    -0.072454
stamina                   0.254705   0.224281   0.639279   0.429187          0.549530       0.673812   0.450170   0.636785   0.542305            0.479625  ...   0.456012   0.447388   0.455449         0.498886        0.476347  -0.668173    -0.667285   -0.660257       -0.666142    -0.669817
strength                  0.216657   0.053890  -0.141812  -0.093178          0.464772       0.023449  -0.077781  -0.161178  -0.160488           -0.111278  ...  -0.154167  -0.020508   0.318745         0.321753        0.289880  -0.078627    -0.077770   -0.080239       -0.079420    -0.078672
long_shots                0.322930   0.263214   0.733510   0.835431          0.426175       0.754340   0.844590   0.824242   0.822106            0.802593  ...   0.753813   0.785956   0.016019         0.085704        0.044695  -0.598275    -0.593260   -0.591472       -0.598032    -0.597834
aggression                0.267311   0.136795   0.398475   0.138608          0.665951       0.533263   0.205226   0.326667   0.291623            0.300151  ...   0.212868   0.262368   0.693004         0.720424        0.694169  -0.569414    -0.569710   -0.562646       -0.565521    -0.567155
interceptions             0.203556   0.113544   0.336912  -0.166087          0.485038       0.461231  -0.061924   0.157309   0.172199            0.195544  ...   0.107490   0.017175   0.920332         0.932888        0.918318  -0.449759    -0.452921   -0.442787       -0.446243    -0.447873
positioning               0.272853   0.252169   0.745236   0.880549          0.432832       0.726891   0.846754   0.870291   0.789513            0.705545  ...   0.749408   0.787595  -0.073362        -0.004290       -0.039903  -0.623406    -0.618075   -0.615414       -0.622558    -0.623916
vision                    0.415812   0.355629   0.683588   0.695608          0.235034       0.740271   0.711074   0.736945   0.742520            0.715175  ...   1.000000   0.660357  -0.008086         0.062337        0.022254  -0.414011    -0.409150   -0.405800       -0.413931    -0.413986
penalties                 0.272004   0.217775   0.626716   0.812296          0.475056       0.664610   0.799573   0.738639   0.728738            0.723361  ...   0.660357   1.000000  -0.049964         0.009501       -0.033576  -0.585485    -0.579903   -0.578311       -0.586871    -0.586875
marking                   0.122715   0.066954   0.281258  -0.239855          0.490770       0.387261  -0.144672   0.095979   0.092416            0.110824  ...  -0.008086  -0.049964   1.000000         0.962230        0.964033  -0.445132    -0.448957   -0.440554       -0.442125    -0.443661
standing_tackle           0.148644   0.094283   0.330608  -0.177554          0.515285       0.451524  -0.078617   0.159148   0.154444            0.173130  ...   0.062337   0.009501   0.962230         1.000000        0.972040  -0.489806    -0.491665   -0.484632       -0.486144    -0.488189
sliding_tackle            0.126426   0.080880   0.315528  -0.218832          0.478135       0.410343  -0.115459   0.130645   0.124954            0.136704  ...   0.022254  -0.033576   0.964033         0.972040        1.000000  -0.457093    -0.459105   -0.451598       -0.453518    -0.455968
gk_diving                 0.023669  -0.021628  -0.664204  -0.535227         -0.737197      -0.744615  -0.543064  -0.729865  -0.601724           -0.552613  ...  -0.414011  -0.585485  -0.445132        -0.489806       -0.457093   1.000000     0.965387    0.960186        0.966969     0.971412
gk_handling               0.025814  -0.021715  -0.660893  -0.528907         -0.734731      -0.740532  -0.539224  -0.725150  -0.597503           -0.548308  ...  -0.409150  -0.579903  -0.448957        -0.491665       -0.459105   0.965387     1.000000    0.957973        0.965273     0.965426
gk_kicking                0.022068  -0.024431  -0.657448  -0.529505         -0.729859      -0.735705  -0.537686  -0.722204  -0.595797           -0.545713  ...  -0.405800  -0.578311  -0.440554        -0.484632       -0.451598   0.960186     0.957973    1.000000        0.959491     0.960523
gk_positioning            0.024864  -0.024717  -0.664633  -0.535533         -0.734677      -0.742899  -0.544100  -0.728617  -0.603641           -0.552341  ...  -0.413931  -0.586871  -0.442125        -0.486144       -0.453518   0.966969     0.965273    0.959491        1.000000     0.967059
gk_reflexes               0.025722  -0.020982  -0.666669  -0.535879         -0.736483      -0.742561  -0.543293  -0.729229  -0.603702           -0.552423  ...  -0.413986  -0.586875  -0.443661        -0.488189       -0.455968   0.971412     0.965426    0.960523        0.967059     1.000000

35 rows × 35 columns

We get 35 rows and 35 columns, one of each for each variable. Each value shows the correlation between the variables in that row and column. Values range from 1 (a very strong positive correlation: as one goes up, the other tends to go up too) to -1 (a very strong negative correlation: as one goes up, the other tends to go down), via 0 (no linear relationship).
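The table above is built from Pearson’s correlation coefficient, which is what pandas’ ‘.corr()’ uses by default. As a minimal sketch of what each cell contains, here is the calculation written out in plain Python: the co-movement of the two columns around their means, divided by the product of their spreads. The `pearson_r` function and the sample ratings are our own illustration, not values from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance without the 1/n, which cancels below)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each column around its own mean
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical crossing and curve ratings for five players
crossing = [44, 51, 70, 80, 65]
curve = [42, 50, 72, 78, 60]
r = pearson_r(crossing, curve)
print(f"r = {r:.3f}, r-squared = {r ** 2:.3f}")
```

One difference from pandas: a constant column has zero variance, so this sketch raises ZeroDivisionError where pandas returns NaN.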

So, looking at our table, the correlation coefficient (proper name: Pearson’s r) between curve and crossing is 0.82, suggesting a strong relationship. (r-squared is a different measure: the square of r, which here is about 0.68.) We would expect this: if you can curve the ball, you tend to be able to cross.

Additionally, heading accuracy has only a weak relationship (0.17) with potential ability. So if, like me, you are awful in the air, you can still make it!

Looking through lots of numbers is pretty draining, so let’s visualise this table with seaborn’s ‘.heatmap()’:

In [4]:
fig, ax = plt.subplots()
fig.set_size_inches(14, 10)

#Colour each cell of the correlation table by its value
sns.heatmap(data.corr(), ax=ax)
plt.show()

There is a lot happening here, and we wouldn’t try to present insights with this, but we can still learn something from it.

Clearly, goalkeepers are not rated for their outfield ability! There is a negative correlation between the GK skills and the outfield skills, shown by the streaks of black and purple.

Similarly, we can see a negative correlation between strength and both acceleration and agility. Got a strong player? They are unlikely to be quick or agile. If you can find one that is, they should command a decent fee due to their unique abilities!
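Reading a 35 × 35 heatmap by eye makes it easy to miss pairs. Here is a minimal NumPy sketch for listing the strongest relationships instead: it scans the upper triangle of the correlation matrix (so each pair appears once and the diagonal of 1s is skipped) and sorts by absolute value, so strong negative correlations such as the goalkeeper/outfield pairs are kept too. The `strongest_pairs` helper is hypothetical, not part of pandas or seaborn, and the small matrix in the example is made up.

```python
import numpy as np


def strongest_pairs(corr, labels, k=5):
    """Return the k variable pairs with the largest absolute correlation.

    corr: square, symmetric correlation matrix (e.g. data.corr().to_numpy())
    labels: the matching column names, in the same order
    """
    corr = np.asarray(corr, dtype=float)
    # Upper triangle above the diagonal: each pair once, no variable-vs-itself 1s
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    values = corr[rows, cols]
    # Sort by strength regardless of sign, strongest first
    order = np.argsort(-np.abs(values), kind="stable")[:k]
    return [(labels[rows[i]], labels[cols[i]], float(values[i])) for i in order]


# Small made-up matrix to show the output format
example = [[1.0, 0.2, -0.9],
           [0.2, 1.0, 0.5],
           [-0.9, 0.5, 1.0]]
for first, second, r in strongest_pairs(example, ["a", "b", "c"], k=2):
    print(f"{first} vs {second}: {r:+.2f}")
```

With the dataset above, something like `strongest_pairs(data.corr().to_numpy(), list(data.columns), k=10)` should surface pairs such as standing_tackle and sliding_tackle (0.97 in the table).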

Summary

In a page, we have taken a big dataset and started to uncover relationships within it. Using ‘.corr()’ and ‘.heatmap()’, we created a numerical table and a graphical chart that illustrate those relationships.

With our example, we spotted that stronger players usually lack pace and agility. Looking at the chart above, reactions also seem to be the best indicator of overall rating (0.81 in the table). Maybe being a talented player isn’t just about being quick or scoring from 35 yards; maybe reading the game is the key!

Next up, take a different look at plotting relationships between variables with scatter plots, or read up on correlation as a whole.

Posted by FCPythonADMIN in Visualisation