Finding the Right England Songs with Spotify Data & Python

2020 is a Euros year, and guaranteed THE year that England end x years of hurt.

Of course, it will not be plain sailing and I’m sure that there will be ups and downs along the way. To console and to celebrate, we need the England classics as the soundtrack. But how can we find the right songs for the right moments?!

Fortunately, Spotify provides us with the songs AND the data to find the right tune to fit the mood. In this tutorial, we’re going to use the Spotipy module to extract data on a playlist of England songs. Then for each song, we’ll get a load of data points that tell us some details about the song – how happy it is, how easy it is to dance to and so on. Finally, we’ll make a table and plot to show how we can find the song to accompany England’s tournament!

Packages in place and let’s go!

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import seaborn as sns

import spotipy
import spotipy.util as util
from spotipy.oauth2 import SpotifyClientCredentials
import spotipy.oauth2 as oauth2

Before we do the fun stuff, we need to get authentication from Spotify to extract data. It is super simple: you just need to register here, start an ‘app’ and get an ID and secret.

The Spotipy module then makes it easy to use the ID and secret to set up a session where we can interact with the Spotify API. There are loads of use cases for it here, but this tutorial will take us through how to get and make use of song characteristics.

We’ll load the client ID and secret into variables, then use Spotipy’s authentication process to start a session.

In [2]:
CLIENT_ID = "xxx"
CLIENT_SECRET = "xxx"

client_credentials_manager = SpotifyClientCredentials(client_id=CLIENT_ID, client_secret=CLIENT_SECRET)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
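Hard-coding the ID and secret is fine for a quick notebook, but if you plan to share it you may prefer to keep them out of the code. Here is a minimal sketch, assuming the credentials live in environment variables. SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET are the names Spotipy looks for by default; load_credentials is a hypothetical helper written for this example, not part of Spotipy.

```python
import os

def load_credentials(env=None):
    """Return (client_id, client_secret) from an environment-style mapping.

    Hypothetical helper: reads the two variables Spotipy uses by default
    and fails loudly if either one is missing or empty.
    """
    env = os.environ if env is None else env
    names = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]

# Stand-in mapping so the example runs anywhere; in a real session you
# would call load_credentials() with no arguments to read os.environ.
CLIENT_ID, CLIENT_SECRET = load_credentials(
    {"SPOTIPY_CLIENT_ID": "xxx", "SPOTIPY_CLIENT_SECRET": "yyy"}
)
```

The two values can then be passed to SpotifyClientCredentials exactly as in the cell above.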

We’re in. Check out the docs for all of the things that can be done from here. We are interested in the audio_features function, which takes a song ID and returns Spotify’s data on the track. Here’s an example below:

In [3]:
sp.audio_features('4uLU6hMCjMI75M1A2tKUQC')
Out[3]:
[{'danceability': 0.721,
  'energy': 0.939,
  'key': 8,
  'loudness': -11.823,
  'mode': 1,
  'speechiness': 0.0376,
  'acousticness': 0.115,
  'instrumentalness': 3.79e-05,
  'liveness': 0.108,
  'valence': 0.914,
  'tempo': 113.309,
  'type': 'audio_features',
  'id': '4uLU6hMCjMI75M1A2tKUQC',
  'uri': 'spotify:track:4uLU6hMCjMI75M1A2tKUQC',
  'track_href': '',
  'analysis_url': '',
  'duration_ms': 213573,
  'time_signature': 4}]

So we get some really cool data on a song, which Spotify has calculated based on features that it programmatically identifies. If there is a distinct rhythm, it gets a high danceability score; if no voices are detected, it is high on the instrumentalness scale; and so on. We’ll go through a couple more later in the article, but all of the definitions of these audio features are here.
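To make the shape of that response concrete, here is a small sketch that pulls a few of the 0 to 1 scores out of it and turns them into percentages. The sample dictionary is the example output above, trimmed to a handful of fields; as_percentages is a hypothetical helper for illustration only.

```python
# Sample response copied from the example above, trimmed to a few fields.
sample = [{
    'danceability': 0.721,
    'energy': 0.939,
    'valence': 0.914,
    'instrumentalness': 3.79e-05,
    'tempo': 113.309,
}]

def as_percentages(features, keys=("danceability", "energy", "valence")):
    """Convert selected 0-1 scores to whole-number percentages."""
    return {key: round(features[key] * 100) for key in keys}

# The response is a list with one dictionary per track, so take item 0.
track_features = sample[0]
print(as_percentages(track_features))
# {'danceability': 72, 'energy': 94, 'valence': 91}
```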

What we need to do now is to create a dataset of these features for England songs. We could collect them individually, but surely a playlist exists somewhere with all these bangers. Fortunately, Spotify user ‘Cuffley Blade’ has done this for us. You can save the playlist for later listening here.

We can call a playlist just like the track above with the .playlist() function, by feeding it an ID. This returns a huge dictionary with playlist data, then a track dictionary for each song in the playlist. It is way too big to feature here, so we’re going to navigate through the playlist dictionary and find the first track’s name and artist below:

In [4]:
'World in Motion'
In [5]:
'New Order'
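For reference, here is a sketch of how that navigation might look. The dictionary below is a cut-down stand-in for what .playlist() returns, assuming the usual layout of tracks, then items, then track; the real response carries many more keys, so treat the shape as an assumption and inspect your own response if something is missing.

```python
# Cut-down stand-in for the playlist dictionary (assumed layout).
# The values are the ones shown elsewhere in this tutorial.
playlist = {
    "tracks": {
        "items": [
            {
                "track": {
                    "name": "World in Motion",
                    "id": "08po8QZK3tihnLBZWATAki",
                    "popularity": 43,
                    "artists": [{"name": "New Order"}],
                }
            },
        ]
    }
}

# Each entry in 'items' wraps the song itself under the 'track' key.
first_track = playlist["tracks"]["items"][0]["track"]

first_name = first_track["name"]
# A track can list several artists; take the first one.
first_artist = first_track["artists"][0]["name"]

print(first_name)
print(first_artist)
```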

Strong start for the playlist.

But one song at a time would take forever, so let’s write something that will loop through the tracks in the playlist and take the artist, name, popularity score and ID, and store them in lists:

In [16]:
#Separate out the track listing from the main playlist object
playlistTracks = sp.playlist('28gX2hq23N4WonSnRtRcUu')['tracks']['items']

#Create empty lists for each datapoint we want to take
artistName = []
trackName = []
trackID = []
trackPop = []

#Loop through each track and append the relevant information to the list
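for index, track in enumerate(playlistTracks):
    artistName.append(track['track']['artists'][0]['name'])
    trackName.append(track['track']['name'])
    trackID.append(track['track']['id'])
    trackPop.append(track['track']['popularity'])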

Let’s test this, and see if we have the songs that we saw in the playlist earlier:

In [7]:
trackName
Out[7]:
['World in Motion',
 'Back Home',
 'Vindaloo',
 'Three Lions',
 'Eat My Goal',
 'Jerusalem',
 'Come On England',
 "We're on the Ball - Official England Song for the 2002 Fifa World Cup",
 'Is This The Way To The World Cup',
 'Shout',
 'Meat Pie, Sausage Roll - England Edit',
 "I'm England 'Till I Die",
 'Whole Again',
 'God Save The Queen']

Bloody. Yes. Crouch at the back post, Beckham straight down the middle, Joe Cole from his own half 😍😍😍

Couple of odd bits though, with some songs having a “ - ” followed by other information. Let’s tidy those up by splitting the titles on the hyphen and keeping the first half:

In [8]:
trackName[7] = trackName[7].split(" - ")[0]
trackName[10] = trackName[10].split(" - ")[0]
trackName
Out[8]:
['World in Motion',
 'Back Home',
 'Vindaloo',
 'Three Lions',
 'Eat My Goal',
 'Jerusalem',
 'Come On England',
 "We're on the Ball",
 'Is This The Way To The World Cup',
 'Shout',
 'Meat Pie, Sausage Roll',
 "I'm England 'Till I Die",
 'Whole Again',
 'God Save The Queen']
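Fixing titles by index works for a short playlist, but it breaks as soon as the track order changes. A slightly more general sketch applies the same split to every title; titles without a “ - ” come back unchanged. clean_title is a hypothetical helper name, and the three titles are taken from the list above.

```python
def clean_title(title, separator=" - "):
    """Keep only the part of the title before the first separator."""
    return title.split(separator)[0]

raw_titles = [
    "World in Motion",
    "We're on the Ball - Official England Song for the 2002 Fifa World Cup",
    "Meat Pie, Sausage Roll - England Edit",
]

cleaned = [clean_title(title) for title in raw_titles]
print(cleaned)
# ['World in Motion', "We're on the Ball", 'Meat Pie, Sausage Roll']
```

One caveat: a song whose real name contains “ - ” would be cut short too, so it is worth eyeballing the result.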

Much better.

We also took the track ID for each. Just like before, we can use these to get the song’s features. World in Motion was the first song in our list, so let’s use the trackID list to get its features.

In [9]:
sp.audio_features(trackID[0])
Out[9]:
[{'danceability': 0.603,
  'energy': 0.955,
  'key': 1,
  'loudness': -4.111,
  'mode': 1,
  'speechiness': 0.0458,
  'acousticness': 0.0239,
  'instrumentalness': 0.0451,
  'liveness': 0.119,
  'valence': 0.787,
  'tempo': 123.922,
  'type': 'audio_features',
  'id': '08po8QZK3tihnLBZWATAki',
  'uri': 'spotify:track:08po8QZK3tihnLBZWATAki',
  'track_href': '',
  'analysis_url': '',
  'duration_ms': 270827,
  'time_signature': 4}]

Works just as before. We can now loop through these IDs and append relevant data to lists, like we did for the songs themselves. The comments below give brief definitions of the data we’re taking, and the full information is here.

In [10]:
#How suitable the track is to bust a move, from 0 - 1
danceability = []

#Detects presence of an audience in the audio, 0 - 1
liveness = []

#How happy the track is, 0 - 1
valence = []

#How much the track is spoken word, vs song, 0 - 1
speechiness = []

#Estimated speed of the track, in beats per minute
tempo = []

#Is the track acoustic? 0 - 1
acousticness = []

#How intense the song is, 0 - 1
energy = []

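for index, track in enumerate(sp.audio_features(trackID)):
    danceability.append(track['danceability'])
    liveness.append(track['liveness'])
    valence.append(track['valence'])
    speechiness.append(track['speechiness'])
    tempo.append(track['tempo'])
    acousticness.append(track['acousticness'])
    energy.append(track['energy'])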
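Seven separate lists are easy to follow but repetitive. As an alternative sketch, the same collection can be driven by a list of feature names, storing everything in one dictionary of lists. It also assumes that an entry in the response can come back as None when Spotify has no features for a track, and records None in that case. collect_features is a hypothetical helper, and the sample values are the World in Motion output above.

```python
WANTED = ["danceability", "liveness", "valence", "speechiness",
          "tempo", "acousticness", "energy"]

def collect_features(feature_list, wanted=WANTED):
    """Build {feature name: [value per track]} from a list of feature dicts."""
    columns = {key: [] for key in wanted}
    for track in feature_list:
        for key in wanted:
            # Assumption: a track with no features arrives as None,
            # so keep the lists aligned with a None placeholder.
            columns[key].append(None if track is None else track[key])
    return columns

# Stand-in for sp.audio_features(trackID): one real-looking entry, one gap.
sample_response = [
    {'danceability': 0.603, 'energy': 0.955, 'speechiness': 0.0458,
     'acousticness': 0.0239, 'liveness': 0.119, 'valence': 0.787,
     'tempo': 123.922},
    None,
]

columns = collect_features(sample_response)
print(columns["tempo"])
# [123.922, None]
```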

Between these features, the track name, artist and popularity, we have 10 lists. A dataframe would make this much easier to read. Let’s join them up and take a look at our data:

In [11]:
dataframe = pd.DataFrame({'Track':trackName, 'Artist':artistName, 'Popularity':trackPop, 'Danceability':danceability,
                         'Liveness':liveness, 'Happiness':valence, 'Speechiness':speechiness, 'Tempo':tempo,
                         'Acousticness':acousticness, 'Energy':energy})
    Track                             Artist                              Popularity  Danceability  Liveness  Happiness  Speechiness    Tempo  Acousticness  Energy
 0  World in Motion                   New Order                                   43         0.603    0.1190      0.787       0.0458  123.922       0.02390   0.955
 1  Back Home                         1970 England World Cup Squad                13         0.552    0.6930      0.686       0.0547  126.240       0.68600   0.907
 2  Vindaloo                          Fat Les                                      0         0.647    0.2770      0.344       0.0759  120.062       0.11800   0.969
 3  Three Lions                       Baddiel, Skinner & Lightning Seeds           0         0.529    0.3450      0.612       0.0329  126.279       0.07420   0.752
 4  Eat My Goal                       Collapsed Lung                              23         0.819    0.1860      0.837       0.0483  116.966       0.00868   0.946
 5  Jerusalem                         Fat Les                                     26         0.260    0.0975      0.450       0.0439   78.172       0.59800   0.514
 6  Come On England                   442                                         28         0.643    0.3300      0.641       0.0755  117.920       0.10200   0.926
 7  We're on the Ball                 Ant & Dec                                   30         0.639    0.1160      0.799       0.0495  120.041       0.04420   0.977
 8  Is This The Way To The World Cup  Tony Christie                               24         0.597    0.2960      0.890       0.0308  136.969       0.11500   0.890
 9  Shout                             Shout for England                           31         0.587    0.8700      0.621       0.1070   98.025       0.02240   0.914
10  Meat Pie, Sausage Roll            Grandad Roberts And His Son Elvis           18         0.778    0.1290      0.606       0.0490  124.110       0.12400   0.594
11  I'm England 'Till I Die           England Supporters Club                     21         0.332    0.8770      0.678       0.0362  104.816       0.82300   0.714
12  Whole Again                       Atomic Kitten                               53         0.742    0.1110      0.652       0.0351   94.011       0.06210   0.715
13  God Save The Queen                The First Fifteen Choir                      2         0.371    0.1010      0.722       0.0325   74.598       0.05320   0.261

And now we have a data source for matching England songs to the tournament mood. Want something danceable at a high energy? Eat My Goal. Sad and low energy? Jerusalem.

We can even use the dataframe’s .sort_values() method to do the lookup for us based on what we want to see:

In [12]:
dataframe.sort_values("Happiness", ascending = False).head(3)
    Track                             Artist          Popularity  Danceability  Liveness  Happiness  Speechiness    Tempo  Acousticness  Energy
 8  Is This The Way To The World Cup  Tony Christie           24         0.597     0.296      0.890       0.0308  136.969       0.11500   0.890
 4  Eat My Goal                       Collapsed Lung          23         0.819     0.186      0.837       0.0483  116.966       0.00868   0.946
 7  We're on the Ball                 Ant & Dec               30         0.639     0.116      0.799       0.0495  120.041       0.04420   0.977

Now we have the 3 happiest songs in the playlist ready to go, and it’s tough to argue with any of these.
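Sorting works on one column at a time. To match a mood on two features at once, one option is to pick the song whose scores sit closest to a target point. The sketch below does this with plain Python on a few rows copied from the table above. closest_song is a hypothetical helper, and straight-line distance is just one reasonable way to measure “closest”.

```python
import math

# A few rows copied from the table above: (track, danceability, happiness)
songs = [
    ("Vindaloo", 0.647, 0.344),
    ("Eat My Goal", 0.819, 0.837),
    ("Jerusalem", 0.260, 0.450),
    ("Is This The Way To The World Cup", 0.597, 0.890),
    ("God Save The Queen", 0.371, 0.722),
]

def closest_song(songs, danceability, happiness):
    """Return the track whose two scores are nearest the requested mood."""
    def distance(song):
        _, song_dance, song_happy = song
        return math.hypot(song_dance - danceability, song_happy - happiness)
    return min(songs, key=distance)[0]

# Last-minute winner: as danceable and happy as possible
print(closest_song(songs, 0.9, 0.9))   # Eat My Goal

# Out on penalties: low on both
print(closest_song(songs, 0.1, 0.1))   # Jerusalem
```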

Of course, you’d be unlikely to take a Jupyter notebook down the pub, or to your nearest riot, so I’d recommend making a printout graphic to take with you.

In [13]:
#Set base style and size
plt.style.use('fivethirtyeight')
plt.figure(num=None, figsize=(6, 4), dpi=100)

#Set subtle St. George's Cross underneath, don't want to come across strong
rect = patches.Rectangle((0.4,0),0.2,1, color="red", alpha=0.01) 
rect2 = patches.Rectangle((0,0.4),0.4,0.2, color="red", alpha=0.01) 
rect3 = patches.Rectangle((0.6,0.4),1,0.2, color="red", alpha=0.01) 

#Plot data
ax = sns.scatterplot(x="Danceability", y="Happiness", data=dataframe, 
                     s=100, color='#b50523')

#Add the three parts of the cross to the axes so they are actually drawn
for cross_part in (rect, rect2, rect3):
    ax.add_patch(cross_part)

#Set title
ax.text(x = 0.05, y = 1.15, s = "Finding the England Song for the Mood",
               fontsize = 15, alpha = 0.9)
#Set Annotations
ax.text(x = 0.79, y = 0.76, s = "Eat My Goal",
               fontsize = 10, alpha = 1)
ax.text(x = 0.17, y = 0.38, s = "Jerusalem",
               fontsize = 10, alpha = 1)
ax.text(x = 0.6, y = 0.28, s = "Vindaloo",
               fontsize = 10, alpha = 1)
ax.text(x = 0.45, y = 0.95, s = "Is This The Way...",
               fontsize = 10, alpha = 1)

#Set mood examples
ax.text(x = 0.85, y = 0.95, s = "Trippier FK",
               fontsize = 10, alpha = 0.4)
ax.text(x = 0.03, y = 0.05, s = "Mandzukic ghosts by Stones",
               fontsize = 10, alpha = 0.4)

#Remove grid and add axis lines
ax.grid(False)
ax.axhline(y=0.005, color='#414141', linewidth=1.5, alpha=.5)
ax.axvline(x=0.005, color='#414141', linewidth=1.5, alpha=.5)

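#Set axis limits (the scores run from 0 to 1)
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)

#Set axis labels, fixing six tick positions first so the six labels line up
ticks = [0, 0.2, 0.4, 0.6, 0.8, 1]
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_yticklabels(labels=['0', '20', '40', '60', '80','100%'], fontsize=12, color='#414141')
ax.set_xticklabels(labels=['0', '20', '40', '60', '80','100%'], fontsize=12, color='#414141')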

#Set axis titles
plt.xlabel('Danceability', fontsize=13, color='#2a2a2b')
plt.ylabel('Happiness', fontsize=13, color='#2a2a2b')

Nice little chart you can print out and keep in your wallet!

In this tutorial, we have seen how we can navigate the Spotify API by using the Spotipy module. We have found out how we can get data about songs, and navigate a playlist to do this programmatically for a group of tracks.

As for wider Python skills, we have practised how to loop through items and store information about each one. We have then joined this up into a dataframe for analysis and visualisation.