Of course, it will not be plain sailing and I’m sure that there will be ups and downs along the way. To console and to celebrate, we need the England classics as the soundtrack. But how can we find the right songs for the right moments?!

Fortunately, Spotify provides us with the songs AND the data to find the right tune to fit the mood. In this tutorial, we’re going to use the Spotipy module to extract data on a playlist of England songs. Then for each song, we’ll get a load of data points that tell us some details about the song – how happy it is, how easy it is to dance to and so on. Finally, we’ll make a table and plot to show how we can find the song to accompany England’s tournament!

Packages in place and let’s go!

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import seaborn as sns

import spotipy
import spotipy.util as util
from spotipy.oauth2 import SpotifyClientCredentials
import spotipy.oauth2 as oauth2

Before we do the fun stuff, we need to get authentication from Spotify to extract data. It is super simple: you just need to register here, start an ‘app’ and get an ID and secret.

The Spotipy module then makes it easy to use the ID and secret to set up a session where we can interact with the Spotify API. There are loads of use cases for it here, but this tutorial will take us through how to get and make use of song characteristics.

We’ll load the client ID and secret into variables, then use Spotipy’s authentication process to start a session.

CLIENT_ID = "xxx"
CLIENT_SECRET = "xxx"

client_credentials_manager = SpotifyClientCredentials(client_id=CLIENT_ID, client_secret=CLIENT_SECRET)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
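Pasting the secret straight into a notebook is fine for a quick go, but it is easy to share by accident. Here is a minimal sketch of reading the two values from environment variables instead. The helper name load_credentials is made up for this example, and the variable names SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET are, as far as I know, the ones Spotipy itself falls back to when no arguments are passed, so check the docs before relying on that.

```python
import os

def load_credentials(env=os.environ):
    """Hypothetical helper: return (client_id, client_secret) from the environment."""
    try:
        return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message rather than a confusing auth error later
        raise RuntimeError(f"Set {missing.args[0]} before running the notebook") from None

# Usage, assuming the two variables are set in your shell:
# CLIENT_ID, CLIENT_SECRET = load_credentials()
```

The returned pair can then be passed to SpotifyClientCredentials exactly as in the code above.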

We’re in. Check out the docs for all of the things that can be done from here. We are interested in the audio_features function, which takes a song ID and returns Spotify’s data on the track. Here’s an example below:

sp.audio_features(['4uLU6hMCjMI75M1A2tKUQC'])

[{'danceability': 0.721,
  'energy': 0.939,
  'key': 8,
  'loudness': -11.823,
  'mode': 1,
  'speechiness': 0.0376,
  'acousticness': 0.115,
  'instrumentalness': 3.79e-05,
  'liveness': 0.108,
  'valence': 0.914,
  'tempo': 113.309,
  'type': 'audio_features',
  'id': '4uLU6hMCjMI75M1A2tKUQC',
  'uri': 'spotify:track:4uLU6hMCjMI75M1A2tKUQC',
  'track_href': 'https://api.spotify.com/v1/tracks/4uLU6hMCjMI75M1A2tKUQC',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/4uLU6hMCjMI75M1A2tKUQC',
  'duration_ms': 213573,
  'time_signature': 4}]

So we get some really cool data on a song, which Spotify has calculated based on features that it programmatically identifies: if there is a distinct rhythm, it gets a high danceability score; if no voices are detected, it is high on the instrumentalness scale; and so on. We’ll go through a couple more later in the article, but all of the definitions of these audio features are here.
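The response is a list with one dictionary per track, and most of the keys (URIs, analysis links and so on) are not about mood at all. As a rough sketch, we could trim one of those dictionaries down to the handful of values we care about. The helper mood_summary and the choice of keys are just for illustration; the numbers are copied from the example output above.

```python
# Keys picked for illustration; all of them appear in the audio_features output
MOOD_KEYS = ("danceability", "energy", "valence", "tempo")

def mood_summary(features):
    """Hypothetical helper: keep only the mood-related keys from one track's features."""
    return {key: features[key] for key in MOOD_KEYS}

# A cut-down copy of the example dictionary shown above
example = {
    "danceability": 0.721,
    "energy": 0.939,
    "valence": 0.914,
    "tempo": 113.309,
    "type": "audio_features",
    "id": "4uLU6hMCjMI75M1A2tKUQC",
}

print(mood_summary(example))
```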

What we need to do now is to create a dataset of these features for England songs. We could collect them individually, but surely a playlist exists somewhere with all these bangers. Fortunately, Spotify user ‘Cuffley Blade’ has done this for us. You can save the playlist for later listening here.

We can fetch a playlist just like the track above by passing its ID to the .playlist() function. This returns a huge dictionary with playlist data, including a track dictionary for each song in the playlist. It is way too big to feature here, so we’re going to navigate through the playlist dictionary and find the first track’s name and artist below:

sp.playlist('28gX2hq23N4WonSnRtRcUu')['tracks']['items'][0]['track']['name']

'World in Motion'

sp.playlist('28gX2hq23N4WonSnRtRcUu')['tracks']['items'][0]['track']['artists'][0]['name']

'New Order'

Strong start for the playlist.

But one song at a time would take forever, so let’s write something that will loop through the tracks in the playlist and take the artist, name, popularity score and ID, and store them in lists:

#Separate out the track listing from the main playlist object
playlistTracks = sp.playlist('28gX2hq23N4WonSnRtRcUu')['tracks']['items']

#Create empty lists for each datapoint we want to take
artistName = []
trackName = []
trackID = []
trackPop = []

#Loop through each track and append the relevant information to the list
for track in playlistTracks:
    artistName.append(track['track']['artists'][0]['name'])
    trackName.append(track['track']['name'])
    trackID.append(track['track']['id'])
    trackPop.append(track['track']['popularity'])
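An alternative to four parallel lists is one list of dictionaries, which keeps each song's details together. The sketch below runs on a small hand-made sample shaped like the playlist items, so it needs no API call. The function name extract_track_info is hypothetical, and the check for an empty 'track' entry is an assumption on my part that playlists can occasionally contain unavailable tracks.

```python
def extract_track_info(items):
    """Hypothetical helper: turn playlist items into one dictionary per track."""
    rows = []
    for item in items:
        track = item.get("track")
        if not track:
            # Assumed edge case: skip entries with no usable track data
            continue
        rows.append({
            "Artist": track["artists"][0]["name"],
            "Track": track["name"],
            "ID": track["id"],
            "Popularity": track["popularity"],
        })
    return rows

# Hand-made sample in the same shape as sp.playlist(...)['tracks']['items']
sample_items = [
    {"track": {"name": "World in Motion", "id": "08po8QZK3tihnLBZWATAki",
               "popularity": 43, "artists": [{"name": "New Order"}]}},
    {"track": None},
]

rows = extract_track_info(sample_items)
print(rows)
```

A list like this can be handed straight to pd.DataFrame() later on.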

Let’s test this, and see if we have the songs that we saw in the playlist earlier:

trackName

['World in Motion',
 'Back Home',
 'Vindaloo',
 'Three Lions',
 'Eat My Goal',
 'Jerusalem',
 'Come On England',
 "We're on the Ball - Official England Song for the 2002 Fifa World Cup",
 'Is This The Way To The World Cup',
 'Shout',
 'Meat Pie, Sausage Roll - England Edit',
 "I'm England 'Till I Die",
 'Whole Again',
 'God Save The Queen']

Bloody. Yes. Crouch at the back post, Beckham straight down the middle, Joe Cole from his own half 😍😍😍

Couple of odd bits though, with some songs having a “-” followed by extra information. Let’s tidy those up by splitting the titles on the hyphen and keeping the first half:

trackName[7] = trackName[7].split(" - ")[0]
trackName[10] = trackName[10].split(" - ")[0]
trackName

['World in Motion',
 'Back Home',
 'Vindaloo',
 'Three Lions',
 'Eat My Goal',
 'Jerusalem',
 'Come On England',
 "We're on the Ball",
 'Is This The Way To The World Cup',
 'Shout',
 'Meat Pie, Sausage Roll',
 "I'm England 'Till I Die",
 'Whole Again',
 'God Save The Queen']

Much better.
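Picking out positions 7 and 10 by hand works for this playlist, but it would break if the track order changed. A small sketch of the same tidy-up applied to every title, using a helper name (clean_title) invented for this example; titles without " - " pass through untouched.

```python
def clean_title(title):
    """Keep only the part of the title before the first ' - ' separator."""
    return title.split(" - ")[0]

titles = [
    "World in Motion",
    "We're on the Ball - Official England Song for the 2002 Fifa World Cup",
    "Shout",
    "Meat Pie, Sausage Roll - England Edit",
]

cleaned = [clean_title(title) for title in titles]
print(cleaned)
```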

We also took the track ID for each. Just like before, we can use these to get the song’s features. World in Motion was the first song in our list, so let’s use the trackID list to get its features.

sp.audio_features(trackID[0])

[{'danceability': 0.603,
  'energy': 0.955,
  'key': 1,
  'loudness': -4.111,
  'mode': 1,
  'speechiness': 0.0458,
  'acousticness': 0.0239,
  'instrumentalness': 0.0451,
  'liveness': 0.119,
  'valence': 0.787,
  'tempo': 123.922,
  'type': 'audio_features',
  'id': '08po8QZK3tihnLBZWATAki',
  'uri': 'spotify:track:08po8QZK3tihnLBZWATAki',
  'track_href': 'https://api.spotify.com/v1/tracks/08po8QZK3tihnLBZWATAki',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/08po8QZK3tihnLBZWATAki',
  'duration_ms': 270827,
  'time_signature': 4}]

Works just as before. We can now loop through these IDs and append relevant data to lists, like we did for the songs themselves. Brief definitions of the data we’re taking are in the comments, but a reminder that the full information is here.

#How suitable the track is to bust a move, from 0 - 1
danceability = []

#Detects presence of an audience in the audio, 0 - 1
liveness = []

#How happy the track is, 0 - 1
valence = []

#How much the track is spoken word, vs song, 0 - 1
speechiness = []

#BPM
tempo = []

#Is the track acoustic? 0 - 1
acousticness = []

#How intense the song is, 0 - 1
energy = []

for track in sp.audio_features(trackID):
    danceability.append(track['danceability'])
    liveness.append(track['liveness'])
    valence.append(track['valence'])
    speechiness.append(track['speechiness'])
    tempo.append(track['tempo'])
    acousticness.append(track['acousticness'])
    energy.append(track['energy'])
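Our playlist only has 14 songs, so one call covers it. For a much longer playlist, my understanding is that Spotify caps the audio features request at 100 IDs, so treat that number as an assumption to check against the API docs. A minimal sketch of splitting the IDs into batches follows; chunked and fetch_features are hypothetical helpers, and fake_fetch stands in for sp.audio_features so the example runs without a session.

```python
def chunked(ids, size=100):
    """Yield successive slices of ids, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def fetch_features(ids, fetch):
    """Call `fetch` once per batch and collect all results in one list."""
    rows = []
    for batch in chunked(ids):
        rows.extend(fetch(batch))
    return rows

# Stand-in for sp.audio_features, so no API call is needed here
def fake_fetch(batch):
    return [{"id": track_id} for track_id in batch]

ids = [str(n) for n in range(250)]
batch_sizes = [len(batch) for batch in chunked(ids)]
features = fetch_features(ids, fake_fetch)
print(batch_sizes, len(features))
```

With a real session, passing sp.audio_features as the fetch argument should give the same list of dictionaries that the loop above works through.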

Between these features, the track name, artist and popularity, we have 10 lists. A dataframe would make this much easier to read. Let’s join them up and take a look at our data:

dataframe = pd.DataFrame({'Track':trackName, 'Artist':artistName, 'Popularity':trackPop, 'Danceability':danceability,
                         'Liveness':liveness, 'Happiness':valence, 'Speechiness':speechiness, 'Tempo':tempo,
                         'Acousticness':acousticness, 'Energy':energy})
dataframe

And now we have a data source for matching England songs to the tournament mood. Want something danceable at a high energy? Eat My Goal. Sad and low energy? Jerusalem.

We can even use the dataframe’s .sort_values() method to do the lookup for us based on what we want to see:

dataframe.sort_values("Happiness", ascending = False).head(3)

Now we have the 3 happiest songs in the playlist ready to go, and it’s tough to argue with any of these.
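Sorting handles one feature at a time. If we want a song that matches a mood on two features at once, one rough approach is to treat each song as a point and pick the one nearest to the mood we are after. This is only a sketch: the closest_song helper is made up for this example, plain straight-line distance is my choice rather than anything Spotify recommends, and the (danceability, happiness) pairs are copied from the dataframe output for a handful of songs. It needs Python 3.8 or later for math.dist.

```python
import math

# (danceability, happiness) pairs copied from the dataframe output
songs = {
    "World in Motion": (0.603, 0.787),
    "Vindaloo": (0.647, 0.344),
    "Eat My Goal": (0.819, 0.837),
    "Jerusalem": (0.260, 0.450),
    "Is This The Way To The World Cup": (0.597, 0.890),
}

def closest_song(danceability, happiness):
    """Return the song whose two scores sit nearest to the requested mood."""
    target = (danceability, happiness)
    return min(songs, key=lambda name: math.dist(songs[name], target))

print(closest_song(1.0, 1.0))  # last-minute winner
print(closest_song(0.0, 0.0))  # out on penalties
```

On this sample the two calls print Eat My Goal and Jerusalem, which matches the picks made by eye earlier.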

Of course, you’d be unlikely to take a Jupyter notebook down the pub, or to your nearest riot, so I’d recommend making a print out graphic to take with you.

#Set base style and size
plt.style.use('fivethirtyeight')
plt.figure(num=None, figsize=(6, 4), dpi=100)

#Set subtle St. George's Cross underneath, don't want to come across strong
rect = patches.Rectangle((0.4,0),0.2,1, color="red", alpha=0.01) 
plt.gca().add_patch(rect)
rect2 = patches.Rectangle((0,0.4),0.4,0.2, color="red", alpha=0.01) 
plt.gca().add_patch(rect2)
rect3 = patches.Rectangle((0.6,0.4),1,0.2, color="red", alpha=0.01) 
plt.gca().add_patch(rect3)

#Plot data
ax = sns.scatterplot(x="Danceability", y="Happiness", data=dataframe, 
                     s=100, color='#b50523')

#Set title
ax.text(x = 0.05, y = 1.15, s = "Finding the England Song for the Mood",
               fontsize = 15, alpha = 0.9)
#Set Annotations
ax.text(x = 0.79, y = 0.76, s = "Eat My Goal",
               fontsize = 10, alpha = 1)
ax.text(x = 0.17, y = 0.38, s = "Jerusalem",
               fontsize = 10, alpha = 1)
ax.text(x = 0.6, y = 0.28, s = "Vindaloo",
               fontsize = 10, alpha = 1)
ax.text(x = 0.45, y = 0.95, s = "Is This The Way...",
               fontsize = 10, alpha = 1)

#Set mood examples
ax.text(x = 0.85, y = 0.95, s = "Trippier FK",
               fontsize = 10, alpha = 0.4)
ax.text(x = 0.03, y = 0.05, s = "Mandzukic ghosts by Stones",
               fontsize = 10, alpha = 0.4)

#Remove grid and add axis lines
ax.grid(False)
ax.axhline(y=0.005, color='#414141', linewidth=1.5, alpha=.5)
ax.axvline(x=0.005, color='#414141', linewidth=1.5, alpha=.5)

#Set axis limits
ax.set(ylim=(0,1))
ax.set(xlim=(0,1))

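#Set axis ticks and labels (fix the tick positions first so the six labels line up)
ticks = [0, 0.2, 0.4, 0.6, 0.8, 1]
ax.set_yticks(ticks)
ax.set_xticks(ticks)
ax.set_yticklabels(labels=['0', '20', '40', '60', '80','100%'], fontsize=12, color='#414141')
ax.set_xticklabels(labels=['0', '20', '40', '60', '80','100%'], fontsize=12, color='#414141')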

#Set axis titles
plt.xlabel('Danceability', fontsize=13, color='#2a2a2b')
plt.ylabel('Happiness', fontsize=13, color='#2a2a2b')

#Show the plot
plt.show()

In this tutorial, we have seen how we can navigate the Spotify API by using the Spotipy module. We have found out how we can get data about songs, and navigate a playlist to do this programmatically for a group of tracks.

As for wider Python skills, we have practised how to loop through items and store information about each one. We have then joined this up into a dataframe for analysis and visualisation.

For reference, the output of the dataframe call:

	Track	Artist	Popularity	Danceability	Liveness	Happiness	Speechiness	Tempo	Acousticness	Energy
0	World in Motion	New Order	43	0.603	0.1190	0.787	0.0458	123.922	0.02390	0.955
1	Back Home	1970 England World Cup Squad	13	0.552	0.6930	0.686	0.0547	126.240	0.68600	0.907
2	Vindaloo	Fat Les	0	0.647	0.2770	0.344	0.0759	120.062	0.11800	0.969
3	Three Lions	Baddiel, Skinner & Lightning Seeds	0	0.529	0.3450	0.612	0.0329	126.279	0.07420	0.752
4	Eat My Goal	Collapsed Lung	23	0.819	0.1860	0.837	0.0483	116.966	0.00868	0.946
5	Jerusalem	Fat Les	26	0.260	0.0975	0.450	0.0439	78.172	0.59800	0.514
6	Come On England	442	28	0.643	0.3300	0.641	0.0755	117.920	0.10200	0.926
7	We’re on the Ball	Ant & Dec	30	0.639	0.1160	0.799	0.0495	120.041	0.04420	0.977
8	Is This The Way To The World Cup	Tony Christie	24	0.597	0.2960	0.890	0.0308	136.969	0.11500	0.890
9	Shout	Shout for England	31	0.587	0.8700	0.621	0.1070	98.025	0.02240	0.914
10	Meat Pie, Sausage Roll	Grandad Roberts And His Son Elvis	18	0.778	0.1290	0.606	0.0490	124.110	0.12400	0.594
11	I’m England ‘Till I Die	England Supporters Club	21	0.332	0.8770	0.678	0.0362	104.816	0.82300	0.714
12	Whole Again	Atomic Kitten	53	0.742	0.1110	0.652	0.0351	94.011	0.06210	0.715
13	God Save The Queen	The First Fifteen Choir	2	0.371	0.1010	0.722	0.0325	74.598	0.05320	0.261

And the output of the .sort_values() call, showing the three happiest songs:

	Track	Artist	Popularity	Danceability	Liveness	Happiness	Speechiness	Tempo	Acousticness	Energy
8	Is This The Way To The World Cup	Tony Christie	24	0.597	0.296	0.890	0.0308	136.969	0.11500	0.890
4	Eat My Goal	Collapsed Lung	23	0.819	0.186	0.837	0.0483	116.966	0.00868	0.946
7	We’re on the Ball	Ant & Dec	30	0.639	0.116	0.799	0.0495	120.041	0.04420	0.977