Python’s data visualisation libraries are great for exploratory and descriptive data analysis. When you have a new dataset, you may want to look at relationships en masse and then drill down into anything that looks particularly interesting. Seaborn’s ‘pairplot()’ function is one way to take that first look at your data. This example looks at a few columns from a fantasy football dataset (edited from here).

Plug in our modules, fire up the dataset and see what we’re dealing with.

In [1]:

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import pandas as pd
import numpy as np

In [2]:

data = pd.read_csv("../../Data/Fantasy_Football.csv")
data.head()

Out[2]:

	web_name	team name	first_name	second_name	squad_number	now_cost	selected_by_percent	total_points	points_per_game	minutes	bonus
0	Ospina	Arsenal	David	Ospina	13	48	0.2	0	0.0	0	0
1	Cech	Arsenal	Petr	Cech	33	54	5.2	63	3.9	1440	4
2	Martinez	Arsenal	Damian Emiliano	Martinez	26	40	0.6	0	0.0	0	0
3	Koscielny	Arsenal	Laurent	Koscielny	6	60	1.4	48	3.7	1164	6
4	Mertesacker	Arsenal	Per	Mertesacker	4	48	0.5	14	3.5	333	2

So we have a row for each player, containing their names, team and some numerical data including squad number, cost, selection and points.

These numbers are readable one row at a time, but it is hard to get much out of them beyond that line-by-line view. Seaborn’s ‘pairplot()’ lets us take in a large amount of data at once and see both the relationships between variables and the distribution of each one. It takes every numerical column, puts it on both the x and y axes, and draws a scatter plot wherever two different columns meet. Where a column meets itself, we get a histogram showing the distribution of that variable. Let’s check out the default plot:

In [3]:

sns.pairplot(data)
plt.show()
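To get a feel for how big that default grid is, here is a minimal sketch that counts the panels. It assumes pairplot picks up the same columns that pandas reports as numeric, which holds for this dataset but is an approximation of Seaborn's own check. The frame below is the five rows from the head() output above, retyped by hand (with the first and second name columns left out for brevity) so the snippet runs without the CSV.

```python
import pandas as pd

# The five rows shown by data.head(), retyped so this runs without the CSV file.
sample = pd.DataFrame({
    "web_name": ["Ospina", "Cech", "Martinez", "Koscielny", "Mertesacker"],
    "team name": ["Arsenal"] * 5,
    "squad_number": [13, 33, 26, 6, 4],
    "now_cost": [48, 54, 40, 60, 48],
    "selected_by_percent": [0.2, 5.2, 0.6, 1.4, 0.5],
    "total_points": [0, 63, 0, 48, 14],
    "points_per_game": [0.0, 3.9, 0.0, 3.7, 3.5],
    "minutes": [0, 1440, 0, 1164, 333],
    "bonus": [0, 4, 0, 6, 2],
})

# Text columns are skipped; only the numeric ones end up on the axes.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# Every numeric column is paired with every other one (and with itself).
n_panels = len(numeric_cols) ** 2

print(numeric_cols)
print(n_panels)  # 7 numeric columns -> 49 panels
```

Seven numeric columns means a 7 by 7 grid, which is why the default output is so dense.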

So this is a lot of data to look at. While it is very useful, it can be quite overwhelming. Let’s use the ‘vars’ argument within pairplot to focus on a few variables.

We’ll also switch the scatter plots to regression plots with ‘kind’, so that each panel includes the linear regression line that Seaborn would fit if we used ‘regplot’. That makes any relationships easier to see:

In [4]:

sns.pairplot(data, vars=["now_cost", "selected_by_percent", "total_points"], kind="reg")
plt.show()

That looks much more manageable! See how easy it is to create a complicated plot that tells us a lot about our data very quickly? We can now see that most players are picked by very few people or nobody at all, and that the clearest relationship is between popularity (selected_by_percent) and points (total_points), as we’d probably expect. Perhaps less predictably, the relationship between points and cost is comparatively weak.
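If you want a number to go with what the plot suggests, a correlation matrix over the same three columns is one option. This is a sketch rather than part of the original analysis: it again uses the five rows from the head() output, retyped by hand, and five rows are far too few to conclude anything. On the real thing you would run the same call on the full ‘data’ frame.

```python
import pandas as pd

# Hand-typed from the data.head() output; swap in the full `data` frame in practice.
sample = pd.DataFrame({
    "now_cost": [48, 54, 40, 60, 48],
    "selected_by_percent": [0.2, 5.2, 0.6, 1.4, 0.5],
    "total_points": [0, 63, 0, 48, 14],
})

cols = ["now_cost", "selected_by_percent", "total_points"]

# Pearson correlation: +1 is a perfect positive linear relationship,
# 0 is no linear relationship, -1 is a perfect negative one.
corr = sample[cols].corr()

print(corr.round(2))
```

Each off-diagonal value summarises the straight-line trend in one of the scatter panels, so it is worth reading alongside the plot, not in place of it.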

Summary

This sets us up for a more comprehensive look at fantasy football, but hopefully this article shows how easy it can be to knock together an exploratory data visualisation with Seaborn’s pairplot. There are many more arguments that we could pass to improve this, from colouring the points by a category (hue="position", for example) to using other types of plot within our pairplot. Take a look at the docs to find out all of your options.

After your exploratory analysis, you might want to check out our describing datasets article to go further!
