Scatter Plots in Seaborn

Scatter plots are fantastic visualisations for showing the relationship between two variables. They plot two series of data, one along each axis, which allows a quick check for any relationship between them.

Seaborn allows us to make really nice-looking visuals with little effort once our data is ready. Let’s get our modules and data fired up and kick off.

In [1]:
import seaborn as sns
import pandas as pd
%matplotlib inline

df = pd.read_csv("../../Data/FIFAPlayers.csv")

df.head(2)
Out[1]:
player_api_id overall_rating potential preferred_foot attacking_work_rate defensive_work_rate crossing finishing heading_accuracy short_passing gk_diving gk_handling gk_kicking gk_positioning gk_reflexes player_name birthday p_id height weight
0 307224 64 68 right medium low 44 63 73 49 12 12 7 11 12 Kevin Koubemba 23/03/1993 00:00 307224 193.04 198
1 512726 63 72 right medium medium 51 66 55 57 11 12 12 12 7 Yanis Mbombo Lokwa 08/04/1994 00:00 512726 177.80 172

2 rows × 44 columns

Our data shows skill ratings across a number of attributes for lots and lots of players. In this article, we want to try to ascertain some relationships between these attributes.

Seaborn has a few ways to show scatter plots, and we'll focus on 'regplot()'. Let's start with a plot that should show a strong positive correlation - height and weight.
In [2]:
sns.regplot(x="height",y="weight",data=df)
Out[2]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c879492828>

So we can indeed see that there is a relationship between height and weight. As you’d expect, the taller you are, the heavier we can expect you to be. The line is a linear regression fitted to the data: for a given height, it shows the weight we would expect. The shaded band around it is the confidence interval for that fit.

The huge outlier in the top right is identified at the end of the article!
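
To put a number on a relationship like the height and weight one above, we can compute a correlation and a straight-line fit by hand. This is only a rough sketch: the ‘players’ dataframe below is synthetic stand-in data (made-up values, not the FIFA file), and the least-squares fit from ‘np.polyfit’ is the same kind of straight line that ‘regplot()’ draws by default.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the FIFA data (made-up values, not real players)
rng = np.random.default_rng(0)
height = rng.normal(181, 6, 200)
weight = 0.9 * height + rng.normal(0, 8, 200)
players = pd.DataFrame({"height": height, "weight": weight})

# Pearson correlation: values near +1 mean a strong positive relationship
r = players["height"].corr(players["weight"])

# Degree-1 least-squares fit, i.e. a straight line through the points
slope, intercept = np.polyfit(players["height"], players["weight"], 1)

print(f"r = {r:.2f}, weight = {slope:.2f} * height + {intercept:.2f}")
```

A positive ‘r’ and a positive slope both say the same thing as the upward-sloping line in the plot.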

Our plot is created pretty easily. ‘.regplot()’ needed just 3 arguments here:

  • x – The column to plot along the x axis
  • y – The column to plot along the y axis
  • data – The dataframe we are reading from
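
As a minimal sketch of those three arguments in isolation, the example below runs ‘regplot()’ on a small synthetic dataframe (the ‘toy’ data is invented for illustration). It also shows that ‘regplot()’ returns the matplotlib Axes it drew on, so things like axis labels can be changed afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Invented data, standing in for the real player file
rng = np.random.default_rng(1)
toy = pd.DataFrame({"height": rng.normal(181, 6, 100)})
toy["weight"] = 0.9 * toy["height"] + rng.normal(0, 8, 100)

# x and y name columns of the dataframe passed as data
ax = sns.regplot(x="height", y="weight", data=toy)

# The returned Axes can be customised with normal matplotlib calls
ax.set_xlabel("Height")
ax.set_ylabel("Weight")
```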

As with all Seaborn plots, there are some pretty cool customisation options. Let’s take a look at some examples:

In [3]:
sns.regplot(x="finishing",y="gk_handling",data=df,
           color="green")
Out[3]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c879492748>

So this is a really odd one! But we can see that there is a big difference between two groups – it is probably fair to assume that the two groups are goalkeepers and outfield players.

Although we have a surprise elite finisher, with some goalkeeping ability…

[Image: SuarezvGhana.gif]
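
One rough way to check the goalkeeper guess from the finishing and gk_handling plot is to flag players with a high ‘gk_handling’ rating and compare the two groups. The sketch below uses invented ratings, and the cut-off of 40 is a hypothetical value picked by eye for this toy data, not something taken from the real file.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Made-up ratings: 90 outfield-like rows followed by 10 keeper-like rows
toy = pd.DataFrame({
    "finishing": np.concatenate([rng.integers(30, 90, 90), rng.integers(5, 25, 10)]),
    "gk_handling": np.concatenate([rng.integers(5, 25, 90), rng.integers(55, 90, 10)]),
})

KEEPER_THRESHOLD = 40  # hypothetical cut-off, chosen by eye for this toy data
toy["likely_keeper"] = toy["gk_handling"] > KEEPER_THRESHOLD

# Average finishing rating in each group
print(toy.groupby("likely_keeper")["finishing"].mean())
```

On the real data, a sensible threshold would need to be read off the plot first.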

Anyway, you can see that we can change the colours with the ‘color’ argument! Let’s change the ‘alpha’ next – this makes the dots see-through and shows how many values are on top of each other.

In [4]:
sns.regplot(x="long_passing",y="short_passing",data=df,
           scatter_kws={'alpha':0.07})
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c8793b4588>

The ‘alpha’ value goes inside a dictionary passed as ‘scatter_kws’. This dictionary holds settings specifically for the plot points, rather than the chart as a whole.
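
‘regplot()’ also accepts a matching ‘line_kws’ dictionary for the fitted line. The sketch below, on invented passing ratings, sets the point transparency and size through ‘scatter_kws’ and the line colour through ‘line_kws’, to show that the two dictionaries are applied separately.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Invented ratings, standing in for the real player file
rng = np.random.default_rng(3)
toy = pd.DataFrame({"long_passing": rng.integers(20, 90, 300)})
toy["short_passing"] = toy["long_passing"] + rng.integers(-10, 11, 300)

ax = sns.regplot(
    x="long_passing", y="short_passing", data=toy,
    scatter_kws={"alpha": 0.07, "s": 15},  # settings for the points only
    line_kws={"color": "red"},             # settings for the fitted line only
)
```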

Multiple scatter plots & sizing

If you have a variable that you want to further split your data by, rather than create new visualisations entirely, you may want to create a grid of scatter plots.

Seaborn allows you to do this with ‘lmplot()’, which draws the same kind of plot as ‘regplot()’ but also accepts ‘col’ and ‘row’ arguments for the splits you want to see.

In [5]:
sns.lmplot(x="crossing",y="finishing",data=df,
           scatter_kws={'alpha':0.1},
           col="preferred_foot")
Out[5]:
<seaborn.axisgrid.FacetGrid at 0x1c8799b8630>
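
Unlike ‘regplot()’, ‘lmplot()’ returns a FacetGrid rather than a single Axes, as the output above shows. A small sketch on invented data of what that object gives you: one subplot per value of the ‘col’ variable, plus methods that act on every subplot at once.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Invented ratings, with an even split of left- and right-footed rows
rng = np.random.default_rng(4)
toy = pd.DataFrame({
    "crossing": rng.integers(20, 90, 100),
    "finishing": rng.integers(20, 90, 100),
    "preferred_foot": np.tile(["left", "right"], 50),
})

g = sns.lmplot(x="crossing", y="finishing", data=toy, col="preferred_foot")

# The grid of subplots is stored as an array shaped (rows, columns)
print(g.axes.shape)

# FacetGrid methods apply to every subplot in the grid
g.set_axis_labels("Crossing", "Finishing")
```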

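As you add more plots, the overall footprint of your chart is likely to get unmanageable. The ‘height’ and ‘aspect’ arguments keep it in check: ‘height’ sets the height of each individual plot in inches, and ‘aspect’ sets its width relative to that height. (‘height’ was called ‘size’ before Seaborn 0.9.)

In [6]:
sns.lmplot(x="crossing",y="finishing",data=df,
           scatter_kws={'alpha':0.1},
           col="preferred_foot",
           row="attacking_work_rate",
           aspect=2, height=2
           )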
Out[6]:
<seaborn.axisgrid.FacetGrid at 0x1c879e634e0>
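
To see how the per-plot sizing adds up across a grid, the sketch below builds a 3 by 2 grid on invented data and reads back the overall figure size. It assumes a Seaborn version where the per-plot size argument is named ‘height’ (0.9 or later) and where the grid exposes its figure as ‘g.figure’ (0.11.2 or later).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Invented data: 3 work rates x 2 feet, 20 rows in every combination
rng = np.random.default_rng(5)
toy = pd.DataFrame({
    "crossing": rng.integers(20, 90, 120),
    "finishing": rng.integers(20, 90, 120),
    "preferred_foot": np.tile(["left", "right"], 60),
    "attacking_work_rate": np.repeat(["low", "medium", "high"], 40),
})

g = sns.lmplot(x="crossing", y="finishing", data=toy,
               col="preferred_foot", row="attacking_work_rate",
               height=2, aspect=2)

# Each plot is height inches tall and height * aspect inches wide,
# so the figure is (columns * height * aspect) by (rows * height) inches
fig_width, fig_height = g.figure.get_size_inches()
print(fig_width, fig_height)
```

With 2 columns and 3 rows, that works out to 8 by 6 inches, so shrinking ‘height’ is the quickest way to rein in a large grid.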

Summary

We have seen how easily Seaborn makes good-looking plots with minimal effort. ‘.regplot()’ takes just a few arguments to plot data along the x and y axes, which we can then customise with further information.

Develop your abilities on scatter plots with a look at further customisation options & other plot types.

Lots of the plots in this piece are also created for the sake of creating them – make sure that your charts carry more insight than mine!

And our really, really tall player from early in the article is, of course, Kristof van Hout!