Violin Plots in Seaborn

Violin plots are very similar to the boxplots that you will have seen many times before. Violins are less common, but they show how densely the data is packed at each value, something a boxplot cannot do: a boxplot only shows the median, the quartiles and the whiskers. Additionally, because they are used less often and look more appealing, well-chosen violin plots can make your work stand out.
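To see the difference, here is a minimal sketch using a made-up squad (the ages are hypothetical, not from the dataset used below): half the players are aged 20 to 22 and half are aged 32 to 34. The boxplot reports a median of 27, an age nobody in the squad actually is, while the violin shows two separate bulges with an empty waist between them.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical squad: 50 players aged 20-22 and 50 aged 32-34, nobody in between
ages = np.concatenate([np.linspace(20, 22, 50), np.linspace(32, 34, 50)])
squad = pd.DataFrame({"Age": ages})

# The median falls in the empty gap between the two groups
median_age = squad["Age"].median()
print(median_age)  # 27.0

# Draw a boxplot and a violin plot of the same data side by side
fig, (box_ax, violin_ax) = plt.subplots(1, 2, sharey=True, figsize=(8, 4))
sns.boxplot(y="Age", data=squad, ax=box_ax)
box_ax.set_title("Boxplot")
sns.violinplot(y="Age", data=squad, ax=violin_ax)
violin_ax.set_title("Violin plot")
plt.show()
```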

This article will plot the ages of each team's players. This should allow us to compare the age profiles of teams quite easily and spot teams with young or aging squads.

Let’s get our modules imported along with a data frame of player information.

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline


data = pd.read_csv("../../Data/Violin.csv", encoding="ISO-8859-1")

data.head()
Out[1]:
   Number  Player                        Born          Age  Market value  Team
0  1       Junling Yan Keeper            Jan 28, 1991  26   £510k         Shanghai SIPG
1  22      Le Sun Keeper                 Sep 17, 1989  27   £43k          Shanghai SIPG
2  34      Wei Chen Keeper               Feb 14, 1998  19   £21k          Shanghai SIPG
3  35      Xiaodong Shi Keeper           Feb 26, 1997  20   £21k          Shanghai SIPG
4  16      Ricardo Carvalho Centre-Back  May 18, 1978  38   £340k         Shanghai SIPG

Here we have a dataset of Chinese Super League players. We are looking to plot the players’ ages, grouped by their team – this will give us a violin for each team.
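Before plotting, it can help to look at the numbers each violin will summarise. A minimal sketch with pandas `groupby`, using a tiny hypothetical frame (the team names "Team A"/"Team B" and the ages are invented, not taken from the dataset) with the same `Team` and `Age` columns:

```python
import pandas as pd

# Hypothetical miniature version of the player table (not the real data)
players = pd.DataFrame({
    "Team": ["Team A"] * 4 + ["Team B"] * 3,
    "Age": [26, 27, 19, 20, 22, 29, 33],
})

# One group per team, which becomes one violin per team on the plot
summary = players.groupby("Team")["Age"].agg(["count", "median", "min", "max"])
print(summary)
```

Each row of `summary` describes one group; the violin for that team draws the full shape of those ages rather than just these few statistics.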

Seaborn’s ‘.violinplot()’ will make these plots very easy. We need to give it three arguments to start with:

  • x – What are we grouping our data by? In this case, it is by team.
  • y – What metric are we looking to learn about? For now, it is the players’ ages.
  • data – Which data frame holds these columns?
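A minimal sketch of how those three arguments fit together, again on a small hypothetical frame rather than the real player data. `x` and `y` name columns of the frame passed as `data`; swapping them makes seaborn infer a horizontal layout from the fact that `Team` is categorical and `Age` is numeric.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data in the same long format as the player table:
# one row per player, a categorical "Team" column and a numeric "Age" column
players = pd.DataFrame({
    "Team": ["Team A"] * 4 + ["Team B"] * 3,
    "Age": [26, 27, 19, 20, 22, 29, 33],
})

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))

# x = grouping column, y = metric, data = the DataFrame holding both
sns.violinplot(x="Team", y="Age", data=players, ax=ax_v)

# Swapping x and y draws the same violins horizontally
sns.violinplot(x="Age", y="Team", data=players, ax=ax_h)

plt.show()
```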

So what does a default violinplot look like?

In [2]:
ax = sns.violinplot(x="Team", y="Age", data=data)

Very nice! Loads to improve on, but a good start!

Firstly, this is a bit small, so let’s use matplotlib to resize the plot area and re-plot:

In [3]:
fig, ax = plt.subplots()
fig.set_size_inches(14, 5)

ax = sns.violinplot(x="Team", y="Age", data=data)
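Passing `figsize` to `plt.subplots()` does the same resize in one step, and handing the resulting `ax` to seaborn with `ax=ax` makes it explicit which axes the violins are drawn on. A minimal sketch, using a small hypothetical frame in place of the real player data:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for the player table
players = pd.DataFrame({
    "Team": ["Team A"] * 4 + ["Team B"] * 3,
    "Age": [26, 27, 19, 20, 22, 29, 33],
})

# figsize=(width, height) in inches, same effect as fig.set_size_inches(14, 5)
fig, ax = plt.subplots(figsize=(14, 5))

# ax=ax tells seaborn exactly which axes to draw on
sns.violinplot(x="Team", y="Age", data=players, ax=ax)
```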

Now we can see the different shapes much more easily, but we can’t see which team is which! Let’s re-plot, rotating the x-axis labels and using ‘plt.show()’ to display the chart cleanly:

In [4]:
fig, ax = plt.subplots()
fig.set_size_inches(14, 5)

ax = sns.violinplot(x="Team", y="Age", data=data)
plt.xticks(rotation=65)
plt.show()
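If you would rather work with the `ax` object than with the pyplot state functions, the rotation can be applied to that axes' tick labels directly. This sketch (hypothetical data again) also right-aligns the labels, so the end of each rotated name sits under its own violin instead of being centred past it.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for the player table
players = pd.DataFrame({
    "Team": ["Team A"] * 4 + ["Team B"] * 3,
    "Age": [26, 27, 19, 20, 22, 29, 33],
})

fig, ax = plt.subplots(figsize=(14, 5))
sns.violinplot(x="Team", y="Age", data=players, ax=ax)

# Rotate this axes' tick labels and right-align them under their ticks
plt.setp(ax.get_xticklabels(), rotation=65, ha="right")
plt.show()
```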

Much better! Now we can see that Chongqing have quite an even spread of ages, compared to Shanghai Shenhua, who have lots of players around 30 years old. Which is better? That is up to you to decide, using your football knowledge or even by testing your theories.
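One way to put a number on "spread" when testing those theories is the interquartile range (IQR) of each team's ages. A minimal sketch on a hypothetical frame (invented teams and ages, not the real data): a wider IQR means the middle half of the squad covers more ages, a narrower one means it is bunched together.

```python
import pandas as pd

# Hypothetical stand-in for the player table
players = pd.DataFrame({
    "Team": ["Team A"] * 4 + ["Team B"] * 3,
    "Age": [26, 27, 19, 20, 22, 29, 33],
})

ages = players.groupby("Team")["Age"]

# Median age and interquartile range (75th minus 25th percentile) per team
spread = pd.DataFrame({
    "median": ages.median(),
    "iqr": ages.quantile(0.75) - ages.quantile(0.25),
})
print(spread.sort_values("iqr"))
```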

While I enjoy the default rainbow colours, let’s create a new seaborn palette to assign club colours to each violin:

In [5]:
# Create a list of hex colours, one per team, in the order the teams appear on the plot
CSLcols = ["#FF0000", "#9A050A", "#112987", "#00A4FA", "#FF6600", "#008040", "#004EA1", "#5B0CB3", "#E50211", "#FF0000",
           "#00519A", "#75A315", "#E70008", "#E40000", "#C80815", "#FF3300"]

# Create the palette with 'sns.color_palette()' and pass our list as an argument
CSLpalette = sns.color_palette(CSLcols)

fig, ax = plt.subplots()
fig.set_size_inches(14, 5)

# Add an extra argument, our new palette
ax = sns.violinplot(x="Team", y="Age", data=data, palette=CSLpalette)
plt.xticks(rotation=65)
plt.show()

Great effort, that looks so much better! Now our viewers can easily pick out their own teams.
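A positional colour list only lines up with the right clubs as long as the teams stay in the same order; for a plain text column, seaborn places the violins in the order the teams first appear in the data. A minimal sketch of a more robust variant, assuming hypothetical team names and colours: key the colours by team name, then pass an explicit `order` and build the palette in that same order.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for the player table
players = pd.DataFrame({
    "Team": ["Team A"] * 4 + ["Team B"] * 3,
    "Age": [26, 27, 19, 20, 22, 29, 33],
})

# Hypothetical club colours, keyed by team name instead of by position
club_colours = {"Team A": "#FF0000", "Team B": "#112987"}

# Fix the violin order explicitly and build the palette in that same order
team_order = sorted(club_colours)
palette = [club_colours[team] for team in team_order]

fig, ax = plt.subplots(figsize=(14, 5))
sns.violinplot(x="Team", y="Age", data=players, order=team_order, palette=palette, ax=ax)
plt.show()
```

Because each colour is looked up by name, re-sorting the rows of the data frame cannot shift a club onto another club's colours.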

Summary

This article illustrates how Seaborn can quickly and easily make beautiful violin plots. When used appropriately, they show more than a boxplot, namely the full shape of each distribution, and draw much more attention.

We also saw how we can create a new Seaborn palette to map colours to our violins and rotate axis labels to aid understanding of our visualisation.

Next up, take a look at other visualisation types – or learn how to scrape data so that you can look at other leagues!