
Scatter Plots & Crosshairs in Matplotlib

Across Python’s many visualisation libraries, you will find several ways to create scatter plots. Matplotlib, being one of the fundamental visualisation libraries, offers perhaps the simplest way to do so. In one line, we will be able to create scatter plots that show the relationship between two variables. It also offers easy ways to customise these charts, through adding crosshairs, text, colour and more.

This article will plot goals for and against from a season, taking you through the initial creation of the chart, then some customisation that Matplotlib offers. Import the modules and data and off we go.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
table = pd.read_csv("../../Data/1617table.csv")
table.head()
Out[2]:
   Pos  Team               Pld   W   D  L  GF  GA  GD  Pts
0    1  Chelsea             38  30   3  5  85  33  52   93
1    2  Tottenham Hotspur   38  26   8  4  86  26  60   86
2    3  Manchester City     38  23   9  6  80  39  41   78
3    4  Liverpool           38  22  10  6  78  42  36   76
4    5  Arsenal             38  23   6  9  77  44  33   75

Nothing exceptional about our table here. We have exactly the data that we would expect and we are going to plot goals for (GF) and goals against (GA).

Matplotlib’s ‘.plot()’ will make this incredibly easy. We just need to pass it three arguments: the data to plot along each of the axes and the plot type. In this case, the plot type is ‘o’ to show that we want to plot markers. Let’s see what the default chart looks like:

In [3]:
plt.plot(table['GF'],table['GA'],"o")
Out[3]:
[<matplotlib.lines.Line2D at 0x2488736de10>]
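If you do not have the CSV to hand, the same call can be tried on a tiny stand-in table. This is only a sketch: it rebuilds the five rows printed by table.head() above rather than the full league table, the name 'sample' is made up for the example, and the Agg backend line is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Only the five rows shown by table.head() above, not the full table
sample = pd.DataFrame({
    "Team": ["Chelsea", "Tottenham Hotspur", "Manchester City", "Liverpool", "Arsenal"],
    "GF": [85, 86, 80, 78, 77],
    "GA": [33, 26, 39, 42, 44],
})

fig, ax = plt.subplots()

# "o" is a format string: circle markers with no connecting line
lines = ax.plot(sample["GF"], sample["GA"], "o")
```

'.plot()' returns a list of Line2D objects, which is why the notebook echoes a Line2D entry as the cell output.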

It is quite plain and has no labels, but you can see just how easy it is to do. It is almost as easy to add some titles.

Rather than directly plot the chart, we can create our chart area with the first line below, set its size, then add our features from there. Take a look:

In [4]:
#Create plot area
fig, ax = plt.subplots()

#Set plot size
fig.set_size_inches(7, 5)

#Plot chart as above, but change the plot type from 'o' to '*' - giving us stars!
plt.plot(table['GF'],table['GA'],"*")

#Add labels to chart area
ax.set_title("Goals For & Against")
ax.set_xlabel("Goals For")
ax.set_ylabel("Goals Against")

#Display the chart
plt.show()

Great work! Just a few lines of code make a massive difference to our charts.

This time, let’s add a crosshair to our chart to display the average line. This should help our viewers to see if a point is performing well or not.

To do this, we can use ‘plt.plot()’ again, once for each line of the crosshair. Each call takes at least 3 arguments, in this order:

  • Start/End X locations – Give these two coordinates in a list. For the vertical line below, both are the average of goals for.
  • Start/End Y locations – Once again, give these coordinates in a list. For the vertical line below they are [90,20].
  • Type – ‘k-’ gives us two instructions to plot with. ‘k’ means black, ‘-’ means draw a solid line.

We also give two optional arguments. ‘linestyle’ changes the line style: in this case “:” gives us a dotted line, and it takes precedence over the ‘-’ in the type string (recent Matplotlib versions warn about this overlap). Meanwhile, ‘lw’ dictates the line width.

In [5]:
fig, ax = plt.subplots()
fig.set_size_inches(7, 5)

plt.plot(table['GF'],table['GA'],"o")
plt.plot([table['GF'].mean(),table['GF'].mean()],[90,20],'k-', linestyle = ":", lw=1)
plt.plot([20,90],[table['GA'].mean(),table['GA'].mean()],'k-', linestyle = ":", lw=1)

ax.set_title("Goals For & Against")
ax.set_xlabel("Goals For")
ax.set_ylabel("Goals Against")

plt.show()
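The crosshair code above hard-codes the 20 and 90 endpoints. As an alternative sketch, Matplotlib's 'axvline()' and 'axhline()' draw lines that span the whole plot area, so only the average is needed. The 'sample' table here is a made-up stand-in holding just the five rows from table.head(), so its averages are not the league averages, and the Agg backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in data: the five rows shown by table.head(), not the full table
sample = pd.DataFrame({"GF": [85, 86, 80, 78, 77], "GA": [33, 26, 39, 42, 44]})

fig, ax = plt.subplots(figsize=(7, 5))
ax.plot(sample["GF"], sample["GA"], "o")

# axvline/axhline span the full axes, so no start/end coordinates are needed
vline = ax.axvline(sample["GF"].mean(), color="k", linestyle=":", lw=1)
hline = ax.axhline(sample["GA"].mean(), color="k", linestyle=":", lw=1)

ax.set_title("Goals For & Against")
ax.set_xlabel("Goals For")
ax.set_ylabel("Goals Against")
```

Because the lines are tied to the axes rather than to fixed coordinates, they keep reaching the edges if the axis limits change.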

In our chart above, the crosshairs show the averages and it helps us to group teams accordingly. You may want to classify these quadrants with text on the chart and we add this in a similar way to titles.

Rather than ‘.set_title()’, we instead use ‘.text()’. You must give arguments for the x and y location, in addition to the text that you want to write. Our examples below also give information on the colour and size of the text. Take a look at how this comes up:

In [6]:
fig, ax = plt.subplots()
fig.set_size_inches(7, 5)

plt.plot(table['GF'],table['GA'],"o")
plt.plot([table['GF'].mean(),table['GF'].mean()],[90,20],'k-', linestyle = ":", lw=1)
plt.plot([20,90],[table['GA'].mean(),table['GA'].mean()],'k-', linestyle = ":", lw=1)

ax.set_title("Goals For & Against")
ax.set_xlabel("Goals For")
ax.set_ylabel("Goals Against")

ax.text(18,90,"Poor attack, poor defense",color="red",size=8)
ax.text(67,20,"Strong attack, strong defense",color="red",size=8)

plt.show()
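The grouping that the crosshairs suggest can also be worked out directly from the averages. The sketch below is one possible way to do it, not part of the original notebook: it uses a made-up 'sample' table with only the five rows from table.head(), so the averages (and therefore the labels) are relative to those five teams rather than the whole league.

```python
import numpy as np
import pandas as pd

# Stand-in data: the five rows shown by table.head(), not the full table
sample = pd.DataFrame({
    "Team": ["Chelsea", "Tottenham Hotspur", "Manchester City", "Liverpool", "Arsenal"],
    "GF": [85, 86, 80, 78, 77],
    "GA": [33, 26, 39, 42, 44],
})

gf_mean = sample["GF"].mean()
ga_mean = sample["GA"].mean()

# More goals scored than average is a strong attack,
# fewer goals conceded than average is a strong defense
attack = np.where(sample["GF"] > gf_mean, "Strong attack", "Poor attack")
defense = np.where(sample["GA"] < ga_mean, "strong defense", "poor defense")

sample["quadrant"] = [f"{a}, {d}" for a, d in zip(attack, defense)]
print(sample[["Team", "quadrant"]])
```

With the full table loaded, swapping 'sample' for 'table' would label every team against the league averages instead.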

Summary

Head back and compare this last chart with the first one. Not only is the most recent one much, much better looking, it also is much more informative. Simple titles tell us what we are looking at, while crosshairs and text give insight.

This article illustrated the versatility of Matplotlib and ‘.plot()’, which can quickly draw charts and add detail to them. You can see above how we set up a chart area, then draw our chart and additional features. Take a look at the documentation for all of the customisations that you can add with Matplotlib.

Create your own scatter plots and crosshair features, and check out some of the other visualisation options Matplotlib offers.

Posted by FCPythonADMIN in Visualisation

Scatter Plots in Seaborn

Scatter plots are fantastic visualisations for showing the relationship between variables. They plot two series of data, one across each axis, which allows for a quick look to check for any relationship.

Seaborn allows us to make really nice-looking visuals with little effort once our data is ready. Let’s get our modules and data fired up and kick off.

In [1]:
import seaborn as sns
import pandas as pd
%matplotlib inline

df = pd.read_csv("../../Data/FIFAPlayers.csv")

df.head(2)
Out[1]:
player_api_id overall_rating potential preferred_foot attacking_work_rate defensive_work_rate crossing finishing heading_accuracy short_passing gk_diving gk_handling gk_kicking gk_positioning gk_reflexes player_name birthday p_id height weight
0 307224 64 68 right medium low 44 63 73 49 12 12 7 11 12 Kevin Koubemba 23/03/1993 00:00 307224 193.04 198
1 512726 63 72 right medium medium 51 66 55 57 11 12 12 12 7 Yanis Mbombo Lokwa 08/04/1994 00:00 512726 177.80 172

2 rows × 44 columns

Our data shows skill ratings across a number of attributes for lots and lots of players. In this article, we want to try and ascertain some relationships between these attributes.

Seaborn has a few ways to show scatter plots, and we'll focus on 'regplot()'. Let's start with a plot that should show a strong positive correlation - height and weight.
In [2]:
sns.regplot(x="height",y="weight",data=df)
Out[2]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c879492828>

So we can indeed see that there is a relationship between height and weight – as you’d expect, the taller you are, the heavier we can expect you to be. The line is a linear regression fit: Seaborn’s best estimate of the weight we would expect for a given height. The shaded band around it is the confidence interval for that fit.

The huge outlier in the top right is identified at the end of the article!

Our plot is created pretty easily. ‘.regplot()’ needed just 3 arguments here:

  • X – The data along the x axis
  • Y – The data along the y axis
  • Data – The dataframe we are reading from
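The three arguments above, and the line that ‘.regplot()’ draws, can be tried out on made-up data. This is a sketch under stated assumptions: the 'toy' dataframe is randomly generated for illustration and is not the FIFA dataset, 'ci=None' switches off the shaded confidence band to keep the run quick, and the Agg backend line is there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line inside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data for illustration only, not the FIFA players file
rng = np.random.default_rng(0)
height = rng.normal(181, 7, 200)
weight = 0.9 * height + rng.normal(0, 8, 200)
toy = pd.DataFrame({"height": height, "weight": weight})

# x and y name columns in the dataframe passed as data
ax = sns.regplot(x="height", y="weight", data=toy, ci=None)

# The same straight-line fit, computed by hand for comparison
slope, intercept = np.polyfit(toy["height"], toy["weight"], 1)
print(f"weight is roughly {slope:.2f} * height + {intercept:.2f}")
```

A positive slope here corresponds to the upward-sloping line on the chart.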

As with all Seaborn plots, there are some pretty cool customisation options. Let’s take a look at some examples:

In [3]:
sns.regplot(x="finishing",y="gk_handling",data=df,
           color="green")
Out[3]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c879492748>

So this is a really odd one! But we can see that there is a big difference between two groups – it is probably fair to assume that the two groups are goalkeepers and outfield players.

Although we have a surprise elite finisher, with some goalkeeping ability…

[Image: SuarezvGhana.gif]

Anyway, you can see that we can change the colours with the ‘color’ argument! Let’s change the ‘alpha’ next – this makes the dots see-through and shows how many values are on top of each other.

In [4]:
sns.regplot(x="long_passing",y="short_passing",data=df,
           scatter_kws={'alpha':0.07})
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c8793b4588>

The ‘alpha’ value goes inside a dictionary passed as ‘scatter_kws’. This dictionary gives details specifically about the plot points, rather than the chart as a whole.
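‘.regplot()’ also accepts a matching ‘line_kws’ dictionary for the regression line, so points and line can be styled separately. The sketch below uses randomly generated numbers in a made-up 'toy' dataframe, not the FIFA data, and assumes a headless run via the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line inside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up ratings for illustration only, not the FIFA players file
rng = np.random.default_rng(1)
long_passing = rng.integers(20, 90, 500)
short_passing = long_passing + rng.normal(0, 8, 500)
toy = pd.DataFrame({"long_passing": long_passing, "short_passing": short_passing})

ax = sns.regplot(x="long_passing", y="short_passing", data=toy, ci=None,
                 scatter_kws={"alpha": 0.07},   # styling for the points only
                 line_kws={"color": "red"})     # styling for the fitted line only
```

Keeping the two dictionaries apart means a faint scatter can sit under a bold line.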

Multiple scatter plots & sizing

If you have a variable that you want to further split your data by, rather than create new visualisations entirely, you may want to create a grid of scatter plots.

Seaborn’s ‘lmplot()’ allows you to do this by specifying ‘col’ and ‘row’ arguments according to the splits you want to see. It draws the same kind of plot as ‘regplot()’, spread across a grid.

In [5]:
sns.lmplot(x="crossing",y="finishing",data=df,
           scatter_kws={'alpha':0.1},
           col="preferred_foot")
Out[5]:
<seaborn.axisgrid.FacetGrid at 0x1c8799b8630>

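As you add more plots, the overall footprint of your chart is likely to get unmanageable. The ‘height’ argument (named ‘size’ before Seaborn 0.9) sets the height of each plot in inches, and ‘aspect’ sets its width as a multiple of that height.

In [6]:
sns.lmplot(x="crossing",y="finishing",data=df,
           scatter_kws={'alpha':0.1},
           col="preferred_foot",
           row="attacking_work_rate",
           #'size' was renamed to 'height' in Seaborn 0.9
           aspect=2, height=2
           )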
Out[6]:
<seaborn.axisgrid.FacetGrid at 0x1c879e634e0>
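The grid that ‘lmplot()’ returns can be inspected to see how the splits map to plots. This sketch uses a made-up 'toy' dataframe with random ratings, not the FIFA data, and assumes Seaborn 0.9 or later (where the per-plot size argument is called ‘height’) and a headless run via the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line inside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data for illustration only, not the FIFA players file
rng = np.random.default_rng(2)
crossing = rng.integers(20, 90, 120)
toy = pd.DataFrame({
    "crossing": crossing,
    "finishing": 0.6 * crossing + rng.normal(0, 10, 120),
    "preferred_foot": np.tile(["left", "right"], 60),
})

# One column of the grid per value of preferred_foot
g = sns.lmplot(x="crossing", y="finishing", data=toy, ci=None,
               scatter_kws={"alpha": 0.3},
               col="preferred_foot",
               height=2, aspect=2)

print(g.axes.shape)  # (rows, columns) of the grid
```

Adding a ‘row’ argument would add one row of plots per value of that variable.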

Summary

We have seen how easily Seaborn makes good-looking plots with minimal effort. ‘.regplot()’ takes just a few arguments to plot data along the x and y axes, which we can then customise with further information.

Develop your abilities on scatter plots with a look at further customisation options & other plot types.

Lots of the plots in this piece are also created for the sake of creating them – make sure that your charts carry more insight than mine!

And our really, really tall player from early in the article is, of course, Kristof van Hout!
