If you have just taken a look at NumPy’s arrays, then Pandas’ series will be really easy to pick up.

The key difference between these two data types is that a series lets us attach labels (an index) to our values, making our data a lot easier to read, select and utilise.

Let’s fire up NumPy and Pandas and create some series. Remember to install these modules if you haven’t already.

In [1]:
import numpy as np
import pandas as pd
In [2]:
Capacity = pd.Series(data=[60432,55097,39460])
Capacity
0    60432
1    55097
2    39460
dtype: int64

So there we have our first series, created from the list [60432,55097,39460]. You’ll notice that this looks quite different from our previous lists and arrays, because there is an index (0, 1, 2) running alongside the values.
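To see how a series relates to the NumPy arrays we met earlier, we can look at its two parts separately. This is a minimal sketch using `to_numpy()` to pull out the values and `.index` to see the labels; the variable name `values` is just for illustration.

```python
import numpy as np
import pandas as pd

Capacity = pd.Series(data=[60432, 55097, 39460])

# The values themselves are held in a NumPy array
values = Capacity.to_numpy()
print(type(values))      # <class 'numpy.ndarray'>

# The labels are stored separately, in the series' index
print(Capacity.index)    # RangeIndex(start=0, stop=3, step=1)
```

So a series is roughly "an array plus an index": the default index is just the positions 0, 1, 2, which is what we are about to replace with our own labels.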

What is really cool about series is that they let us change these index labels:

In [3]:
Capacity = pd.Series(data=[60432,55097,39460],
                     index=["Emirates Stadium","Etihad Stadium","Elland Road"])
Capacity
Emirates Stadium    60432
Etihad Stadium      55097
Elland Road         39460
dtype: int64

Passing an index argument replaces the default 0, 1, 2 labels with our own, so our data is now much easier to read. It is easier to select, too:

In [4]:
Capacity["Elland Road"]
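Plain square brackets work fine here, but pandas also offers `.loc` (select by label) and `.iloc` (select by integer position), which make it explicit which of the two you mean. A short sketch with the same stadium data; the names `by_label`, `by_position` and `two_grounds` are only for illustration.

```python
import pandas as pd

Capacity = pd.Series(data=[60432, 55097, 39460],
                     index=["Emirates Stadium", "Etihad Stadium", "Elland Road"])

# .loc selects by label...
by_label = Capacity.loc["Elland Road"]    # 39460
# ...while .iloc selects by position (Elland Road is third, so position 2)
by_position = Capacity.iloc[2]            # 39460

# A list of labels returns a smaller series that keeps its labels
two_grounds = Capacity.loc[["Emirates Stadium", "Elland Road"]]
print(two_grounds)
```

Both selections point at the same value; the label version just stays correct even if the order of the series changes.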

In this example, our stadium capacities and labels were in two separate lists. We can do the same thing with a dictionary:

In [5]:
CapacityDict = {'Ewood Park':31367,
                'Liberty Stadium':20937,
                'Portman Road':30311}

Capacity = pd.Series(CapacityDict)
Capacity
Ewood Park         31367
Liberty Stadium    20937
Portman Road       30311
dtype: int64
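With a dictionary, the keys become the index labels and the values become the data. If we pass an index argument as well, pandas uses it to choose and order the keys; any label that isn't in the dictionary gets a missing value (NaN). A small sketch, where 'Anfield' is deliberately left out of the dictionary and `subset` is just an illustrative name:

```python
import pandas as pd

CapacityDict = {'Ewood Park': 31367,
                'Liberty Stadium': 20937,
                'Portman Road': 30311}

# Without an index argument, the keys become the labels, in insertion order
print(list(pd.Series(CapacityDict).index))
# ['Ewood Park', 'Liberty Stadium', 'Portman Road']

# With an index argument, pandas picks and orders the keys we name;
# 'Anfield' is not a key, so its value is NaN
subset = pd.Series(CapacityDict, index=['Portman Road', 'Ewood Park', 'Anfield'])
print(subset)
```

Because NaN is a floating-point value, the whole of `subset` becomes float64, so 30311 is shown as 30311.0.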


Told you series would be easy to understand! It’s a simple concept, but one that makes our data more comfortable to work with: we can now refer to values by their labels, not just by their position.

Pandas’ data frame builds on this further to create labelled grids. Once we understand these, we can really get started with data analysis in Python.
