Scatter plots are fantastic visualisations for showing the relationship between variables. They plot two series of data, one along each axis, which allows a quick check for any relationship between them.
Seaborn allows us to make really nice-looking visuals with little effort once our data is ready. Let’s get our modules and data fired up and kick off.
import seaborn as sns
import pandas as pd
%matplotlib inline
df = pd.read_csv("../../Data/FIFAPlayers.csv")
df.head(2)
Our data shows skill ratings across a number of attributes for lots and lots of players. In this article, we want to try to ascertain some relationships between these attributes.
Seaborn has a few ways to show scatter plots, and we'll focus on 'regplot()'. Let's start with a plot that should show a strong positive correlation - height and weight.
sns.regplot(x="height",y="weight",data=df)
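So we can indeed see that there is a relationship between height and weight – as you’d expect, the taller a player is, the heavier we can expect them to be. The line is a linear regression fit: the straight line that best matches the points, which gives an estimate of the weight you would expect for any given height. The shaded band around it is a confidence interval for that fit.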
The huge outlier in the top right is identified at the end of the article!
Our plot is created pretty easily. ‘.regplot()’ needed just 3 arguments here:
- x – The column to plot along the x axis
- y – The column to plot along the y axis
- data – The dataframe we are reading those columns from
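If you don't have the FIFA file to hand, the same three arguments can be tried on a small synthetic dataframe. This is only a sketch: ‘toy_df’ and its values are invented for the example, and the "Agg" backend line is there so the code runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the FIFA data (all values are made up)
rng = np.random.default_rng(0)
height = rng.normal(181, 7, size=200)
toy_df = pd.DataFrame({
    "height": height,
    "weight": 0.9 * height - 88 + rng.normal(0, 5, size=200),
})

# Column names as strings, plus the dataframe they live in
ax = sns.regplot(x="height", y="weight", data=toy_df)

# Alternative: pass the columns themselves, with no data argument
# sns.regplot(x=toy_df["height"], y=toy_df["weight"])

# regplot returns the matplotlib Axes, so the usual Axes methods apply
ax.set(xlabel="Height (cm)", ylabel="Weight (kg)")
```

Because the return value is an ordinary matplotlib Axes, labels, titles and limits can be set afterwards in the usual matplotlib way.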
As with all Seaborn plots, there are some pretty cool customisation options. Let’s take a look at some examples:
sns.regplot(x="finishing",y="gk_handling",data=df,
color="green")
So this is a really odd one! But we can see that there is a big difference between two groups – it is probably fair to assume that the two groups are goalkeepers and outfield players.
Although we have a surprise elite finisher, with some goalkeeping ability…
[Image: SuarezvGhana.gif]
Anyway, you can see that we can change the colours with the ‘color’ argument! Let’s change the ‘alpha’ next – this makes the dots see-through and shows how many values are on top of each other.
sns.regplot(x="long_passing",y="short_passing",data=df,
scatter_kws={'alpha':0.07})
The ‘alpha’ value goes inside a dictionary passed as ‘scatter_kws’. This dictionary holds settings that apply specifically to the plot points, rather than the chart as a whole, and Seaborn passes them on to matplotlib's scatter function.
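The fitted line has its own dictionary too, ‘line_kws’, which works the same way. Below is a small sketch of the two side by side; ‘toy_df’ is made-up data standing in for the FIFA file, and the particular sizes and colours are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic passing ratings (made-up values, not the FIFA data)
rng = np.random.default_rng(42)
long_passing = rng.uniform(20, 90, size=500)
toy_df = pd.DataFrame({
    "long_passing": long_passing,
    "short_passing": 0.8 * long_passing + rng.normal(15, 6, size=500),
})

ax = sns.regplot(
    x="long_passing", y="short_passing", data=toy_df,
    # Settings for the points only: transparency and marker size
    scatter_kws={"alpha": 0.07, "s": 15},
    # Settings for the fitted line only
    line_kws={"color": "black", "linewidth": 1},
)
```

Keeping the two dictionaries separate means you can fade the points right down without the regression line fading along with them.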
Multiple scatter plots & sizing
If you have a variable that you want to further split your data by, rather than create new visualisations entirely, you may want to create a grid of scatter plots.
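Seaborn allows you to do this by specifying ‘col’ and ‘row’ arguments according to the splits you want to see. These arguments belong to ‘lmplot()’ rather than ‘regplot()’: ‘lmplot()’ draws the same kind of plot, but across a grid of panels.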
sns.lmplot(x="crossing",y="finishing",data=df,
scatter_kws={'alpha':0.1},
col="preferred_foot")
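As you add more plots, the overall footprint of your chart is likely to get unmanageable. To keep it under control, ‘height’ sets the height of each individual plot in inches, and ‘aspect’ sets its width as a multiple of that height. (Seaborn versions before 0.9 called ‘height’ ‘size’.)
sns.lmplot(x="crossing",y="finishing",data=df,
scatter_kws={'alpha':0.1},
col="preferred_foot",
row="attacking_work_rate",
aspect=2, height=2
)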
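Here is a self-contained sketch of the same grid on synthetic data, so that it can be run without the FIFA file. The dataframe ‘toy_df’ and its category labels are invented for the example, and it uses the ‘height’ argument (called ‘size’ before Seaborn 0.9) to size each panel.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the FIFA data (values and labels are made up)
rng = np.random.default_rng(1)
n = 300
crossing = rng.uniform(20, 90, size=n)
toy_df = pd.DataFrame({
    "crossing": crossing,
    "finishing": 0.5 * crossing + rng.normal(20, 10, size=n),
    "preferred_foot": rng.choice(["Left", "Right"], size=n),
    "attacking_work_rate": rng.choice(["Low", "Medium", "High"], size=n),
})

# One panel per (work rate, foot) combination: 3 rows by 2 columns
g = sns.lmplot(
    x="crossing", y="finishing", data=toy_df,
    col="preferred_foot", row="attacking_work_rate",
    row_order=["Low", "Medium", "High"],  # control the order of the rows
    scatter_kws={"alpha": 0.3},
    height=2, aspect=2,  # each panel is 2 inches tall and 4 inches wide
)

# lmplot returns a FacetGrid; its axes attribute is a 2D array of panels
n_rows, n_cols = g.axes.shape
```

Unlike ‘regplot()’, which returns a single Axes, ‘lmplot()’ hands back the whole grid, so per-panel tweaks go through ‘g.axes’.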
Summary
We have seen how easily Seaborn makes good-looking plots with minimal effort. ‘.regplot()’ takes just a few arguments to plot data along the x and y axes, which we can then customise with further information, and ‘.lmplot()’ extends the same idea across a grid of plots.
Develop your abilities on scatter plots with a look at further customisation options & other plot types.
Lots of the plots in this piece are also created for the sake of creating them – make sure that your charts carry more insight than mine!
And our really, really tall player from early in the article is, of course, Kristof van Hout!