Violin plots are very similar to the boxplots that you will have seen many times before. Violins are a little less common, but they show how much data sits at each value, something a boxplot is incapable of doing. Additionally, because they are used less often and look more aesthetically pleasing, proper use of these plots can make your work stand out.
This article will plot the ages of each team’s players. This should allow us to compare the age profiles of teams quite easily and spot teams with young or aging squads.
Let’s get our modules imported along with a data frame of player information.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

data = pd.read_csv("../../Data/Violin.csv", encoding="ISO-8859-1")
data.head()
| 0 | 1 | Junling Yan Keeper | Jan 28, 1991 | 26 | £510k | Shanghai SIPG |
| 1 | 22 | Le Sun Keeper | Sep 17, 1989 | 27 | £43k | Shanghai SIPG |
| 2 | 34 | Wei Chen Keeper | Feb 14, 1998 | 19 | £21k | Shanghai SIPG |
| 3 | 35 | Xiaodong Shi Keeper | Feb 26, 1997 | 20 | £21k | Shanghai SIPG |
| 4 | 16 | Ricardo Carvalho Centre-Back | May 18, 1978 | 38 | £340k | Shanghai SIPG |
Here we have a dataset of Chinese Super League players. We are looking to plot the players’ ages, grouped by their team – this will give us a violin for each team.
Seaborn’s ‘.violinplot()’ will make these plots very easy. We need to give it three arguments to start with:
- X – What are we grouping our data by? In this case, it is by teams.
- Y – What metric are we looking to learn about? For now, it is the players’ ages.
- Data – Where is our data kept?
So what does a default violinplot look like?
ax = sns.violinplot(x="Team", y="Age", data=data)
Very nice! Loads to improve on, but a good start!
Firstly, this is a bit small, so let’s use matplotlib to resize the plot area and re-plot:
fig, ax = plt.subplots()
fig.set_size_inches(14, 5)
ax = sns.violinplot(x="Team", y="Age", data=data)
Now we can see the different shapes much more easily, but we can’t see which team is which! Let’s re-plot, but rotate the x axis labels and use ‘plt.show()’ to display the chart cleanly:
fig, ax = plt.subplots()
fig.set_size_inches(14, 5)
ax = sns.violinplot(x="Team", y="Age", data=data)
plt.xticks(rotation=65)
plt.show()
Much better! Now we can see that Chongqing have quite an even spread, compared to Shanghai Shenhua who have lots of players around 30 years old. Which is better? Up to you to use your football knowledge – or even test your theories – to decide.
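If you want to test a theory about a squad's age profile rather than judge it by eye, a per-team summary is one way to put numbers next to the shapes. The sketch below is only an illustration: the ‘Team’ and ‘Age’ column names come from the dataset above, but the two teams and their ages are made up so that the example runs on its own.

```python
import pandas as pd

# Hypothetical miniature of the dataset: only the "Team" and "Age" column
# names come from the article; the rows below are invented for illustration.
sample = pd.DataFrame({
    "Team": ["Team A"] * 5 + ["Team B"] * 5,
    "Age": [19, 23, 26, 30, 34, 28, 29, 30, 30, 31],
})

# One row per team: squad size, median age and interquartile range (IQR).
# A large IQR corresponds to a long, evenly filled violin; a small IQR
# corresponds to a violin that bulges around one age.
grouped = sample.groupby("Team")["Age"]
summary = pd.DataFrame({
    "players": grouped.count(),
    "median_age": grouped.median(),
    "iqr": grouped.quantile(0.75) - grouped.quantile(0.25),
})

print(summary.sort_values("median_age"))
```

With the real data frame, swapping ‘sample’ for ‘data’ should give the same table for every team in the league, assuming the ‘Age’ column is read in as a number.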
While I enjoy the default rainbow colours, let’s create a new seaborn palette to assign club colours to each violin:
# Create a list of colours, in the order of our teams on the plot
CSLcols = ("#FF0000", "#9A050A", "#112987", "#00A4FA", "#FF6600", "#008040",
           "#004EA1", "#5B0CB3", "#E50211", "#FF0000", "#00519A", "#75A315",
           "#E70008", "#E40000", "#C80815", "#FF3300")

# Create the palette with 'sns.color_palette()' and pass our list as an argument
CSLpalette = sns.color_palette(CSLcols)

fig, ax = plt.subplots()
fig.set_size_inches(14, 5)

# Add an extra argument, our new palette
ax = sns.violinplot(x="Team", y="Age", data=data, palette=CSLpalette)
plt.xticks(rotation=65)
plt.show()
Great effort, that looks so much better! Now our viewers can easily pick out their own teams.
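One thing to watch with the palette above is that the colour list has to be in the same order as the teams on the x axis, so a re-sorted data file would quietly give clubs the wrong colours. A possible safeguard is to keep the colours in a dictionary keyed by team name and build the ordered list from it. The sketch below is an assumption-laden illustration: the helper names, the fallback grey and the team-to-colour pairings are all hypothetical, not taken from the dataset.

```python
# Hypothetical team-to-colour pairs for illustration only; replace them with
# the real club names and colours from your own data.
club_colours = {
    "Shanghai SIPG": "#FF0000",
    "Shanghai Shenhua": "#112987",
    "Chongqing": "#9A050A",
}


def first_appearance_order(teams):
    """Return unique team names in the order they first appear."""
    # dict.fromkeys keeps insertion order and drops repeats.
    return list(dict.fromkeys(teams))


def ordered_palette(team_order, colour_by_team, fallback="#888888"):
    """Return one colour per team, matching the given team order.

    Teams missing from the dictionary get the fallback grey, so a new
    or misspelt team name shows up on the plot instead of raising an error.
    """
    return [colour_by_team.get(team, fallback) for team in team_order]


# Stand-in for the 'Team' column, with repeats and one unmapped team.
team_column = ["Shanghai SIPG", "Shanghai SIPG", "Chongqing",
               "Shanghai Shenhua", "Chongqing", "Unmapped FC"]

team_order = first_appearance_order(team_column)
colours = ordered_palette(team_order, club_colours)
print(list(zip(team_order, colours)))
```

The resulting list can be passed to ‘sns.color_palette()’ in place of ‘CSLcols’. This assumes seaborn draws text categories in order of first appearance, which is its usual behaviour when no ‘order’ argument is given; with the real data frame, ‘data["Team"].unique()’ gives that same order.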
This article illustrates how Seaborn can quickly and easily make beautiful violin plots. When used appropriately, they show the shape of a distribution, which a boxplot cannot, and draw much more attention.
We also saw how we can create a new Seaborn palette to map colours to our violins and rotate axis labels to aid understanding of our visualisation.
Next up, take a look at other visualisation types – or learn how to scrape data so that you can look at other leagues!