Pivot Tables in Python

A pivot table is a table that displays grouped data from a larger data set, running a function to get a summary for a set of variables in a column. A popular feature in Excel, Python makes it easy to create the same with your dataframes.

This Extra Time tutorial shows how to create pivot tables, with the example of a dataset of matches from a full season. Import your modules, inspect the data and let’s go!

In [1]:
import pandas as pd
import numpy as np

df = pd.read_csv("1617.csv")
len(df)
Out[1]:
380

As you’ll see below, we have a record for each game of the season. The columns contain lots of data points, including goals, cards and fouls.

In [2]:
df.head()
Out[2]:
Date HomeTeam AwayTeam FTHG FTAG FTR HTHG HTAG HTR Referee HST AST HF AF HC AC HY AY HR AR
0 13/08/2016 Burnley Swansea 0 1 A 0 0 D J Moss 3 9 10 14 7 4 3 2 0 0
1 13/08/2016 Crystal Palace West Brom 0 1 A 0 0 D C Pawson 4 3 12 15 3 6 2 2 0 0
2 13/08/2016 Everton Tottenham 1 1 D 1 0 H M Atkinson 6 4 10 14 5 6 0 0 0 0
3 13/08/2016 Hull Leicester 2 1 H 1 0 H M Dean 5 5 8 17 5 3 2 2 0 0
4 13/08/2016 Man City Sunderland 2 1 H 1 0 H R Madley 4 3 11 14 9 6 1 2 0 0

5 rows × 22 columns

We create a pivot table through pandas’ ‘.pivot_table’ function. It takes two arguments at its most basic:

  • The dataframe to pivot – df in this case
  • Index – the column that we will pivot around

Let’s pivot around HomeTeam and see what happens:

In [3]:
pd.pivot_table(df,index=["HomeTeam"])
Out[3]:
AC AF AR AS AST AY FTAG FTHG HC HF HR HS HST HTAG HTHG HY
HomeTeam
Arsenal 4.578947 11.157895 0.052632 10.473684 4.210526 2.157895 0.842105 2.052632 6.947368 10.526316 0.157895 17.315789 6.000000 0.473684 0.789474 1.789474
Bournemouth 4.736842 13.789474 0.105263 12.263158 3.947368 2.263158 1.526316 1.842105 5.157895 9.105263 0.052632 13.315789 4.789474 0.631579 0.789474 1.263158
Burnley 6.000000 11.842105 0.000000 15.473684 4.421053 1.789474 1.052632 1.368421 4.157895 10.947368 0.000000 11.052632 3.368421 0.315789 0.631579 1.842105
Chelsea 4.526316 13.157895 0.052632 8.105263 3.000000 2.105263 0.894737 2.894737 6.789474 9.736842 0.000000 16.631579 6.368421 0.526316 1.315789 1.684211
Crystal Palace 5.421053 12.578947 0.052632 11.894737 4.210526 1.894737 1.315789 1.263158 5.368421 12.105263 0.000000 11.789474 3.894737 0.789474 0.526316 1.684211
Everton 5.789474 11.368421 0.052632 10.210526 3.052632 1.736842 0.842105 2.210526 6.315789 11.315789 0.052632 14.578947 5.894737 0.315789 0.894737 1.894737
Hull 6.052632 11.052632 0.052632 16.000000 5.894737 1.789474 1.842105 1.473684 3.842105 11.526316 0.105263 10.947368 3.631579 0.631579 0.421053 1.684211
Leicester 5.210526 9.315789 0.000000 13.526316 4.789474 1.421053 1.315789 1.631579 5.789474 10.578947 0.000000 12.631579 4.368421 0.421053 0.684211 1.578947
Liverpool 3.210526 10.210526 0.052632 7.421053 2.368421 2.000000 0.947368 2.368421 6.105263 10.210526 0.000000 17.421053 6.684211 0.473684 1.263158 1.157895
Man City 3.368421 11.105263 0.052632 6.526316 2.315789 2.473684 0.894737 1.947368 7.947368 9.894737 0.210526 17.684211 6.052632 0.157895 0.789474 1.684211
Man United 3.368421 12.684211 0.105263 8.526316 2.631579 1.947368 0.631579 1.368421 7.315789 13.105263 0.052632 17.736842 6.210526 0.315789 0.789474 1.894737
Middlesbrough 5.000000 10.789474 0.000000 12.368421 3.526316 1.526316 1.210526 0.894737 4.894737 11.052632 0.000000 10.789474 3.105263 0.578947 0.473684 2.000000
Southampton 4.736842 11.368421 0.052632 10.789474 2.789474 1.789474 1.105263 0.894737 5.684211 10.842105 0.105263 15.736842 4.842105 0.368421 0.368421 1.210526
Stoke 6.368421 11.947368 0.052632 13.157895 4.421053 2.000000 1.263158 1.263158 5.421053 11.684211 0.105263 12.052632 4.157895 0.421053 0.789474 1.842105
Sunderland 5.052632 10.684211 0.052632 15.052632 6.105263 1.473684 1.789474 0.842105 4.526316 11.631579 0.105263 11.263158 3.473684 0.842105 0.263158 1.842105
Swansea 4.421053 12.421053 0.000000 12.684211 4.052632 1.631579 1.789474 1.421053 6.263158 10.000000 0.000000 11.736842 4.368421 0.631579 0.526316 1.578947
Tottenham 2.947368 13.368421 0.105263 7.894737 2.368421 2.578947 0.473684 2.473684 7.789474 11.000000 0.000000 19.631579 7.736842 0.157895 1.210526 1.421053
Watford 4.473684 12.473684 0.105263 12.578947 4.263158 1.842105 1.526316 1.315789 4.315789 14.368421 0.105263 12.157895 4.052632 0.947368 0.526316 2.631579
West Brom 5.578947 9.368421 0.052632 13.157895 4.105263 1.210526 1.157895 1.421053 4.210526 11.526316 0.000000 11.578947 3.789474 0.526316 0.789474 1.736842
West Ham 3.842105 12.473684 0.052632 11.578947 4.631579 2.105263 1.631579 1.000000 4.736842 10.105263 0.052632 14.210526 3.473684 0.684211 0.263158 2.473684

This pivot table gives us the mean for each team in each of their home games – for every variable. Useful, but difficult to read. Let’s choose a couple of columns to look at more closely. We do this by passing a list of columns into the values argument:

In [4]:
pd.pivot_table(df,index=["HomeTeam"], values=["FTHG","FTAG"])
Out[4]:
FTAG FTHG
HomeTeam
Arsenal 0.842105 2.052632
Bournemouth 1.526316 1.842105
Burnley 1.052632 1.368421
Chelsea 0.894737 2.894737
Crystal Palace 1.315789 1.263158
Everton 0.842105 2.210526
Hull 1.842105 1.473684
Leicester 1.315789 1.631579
Liverpool 0.947368 2.368421
Man City 0.894737 1.947368
Man United 0.631579 1.368421
Middlesbrough 1.210526 0.894737
Southampton 1.105263 0.894737
Stoke 1.263158 1.263158
Sunderland 1.789474 0.842105
Swansea 1.789474 1.421053
Tottenham 0.473684 2.473684
Watford 1.526316 1.315789
West Brom 1.157895 1.421053
West Ham 1.631579 1.000000

Much easier – we can now spot the teams scoring and conceding the most at home really really easily – great job!

By default, we get the average in our pivots – but we can pass an ‘aggfunc’ argument to change this. Pass one function to apply it to all columns, or a list of functions to return multiple calculations in your pivot table.

Here are some common functions that you might pass:

  • np.sum – Add the figures
  • len – Give a count of values
  • np.std – Find the standard deviation

Let’s look at a couple of examples:

In [5]:
pd.pivot_table(df,index=["HomeTeam"], values=["FTHG","FTAG"], aggfunc=np.sum)
Out[5]:
FTAG FTHG
HomeTeam
Arsenal 16 39
Bournemouth 29 35
Burnley 20 26
Chelsea 17 55
Crystal Palace 25 24
Everton 16 42
Hull 35 28
Leicester 25 31
Liverpool 18 45
Man City 17 37
Man United 12 26
Middlesbrough 23 17
Southampton 21 17
Stoke 24 24
Sunderland 34 16
Swansea 34 27
Tottenham 9 47
Watford 29 25
West Brom 22 27
West Ham 31 19
In [6]:
pd.pivot_table(df,index=["HomeTeam"], values=["FTHG","FTHG"], aggfunc=[np.sum,len])
Out[6]:
sum len
FTHG FTHG FTHG FTHG
HomeTeam
Arsenal 39 39 19 19
Bournemouth 35 35 19 19
Burnley 26 26 19 19
Chelsea 55 55 19 19
Crystal Palace 24 24 19 19
Everton 42 42 19 19
Hull 28 28 19 19
Leicester 31 31 19 19
Liverpool 45 45 19 19
Man City 37 37 19 19
Man United 26 26 19 19
Middlesbrough 17 17 19 19
Southampton 17 17 19 19
Stoke 24 24 19 19
Sunderland 16 16 19 19
Swansea 27 27 19 19
Tottenham 47 47 19 19
Watford 25 25 19 19
West Brom 27 27 19 19
West Ham 19 19 19 19

There we go, an intro to pivots in 5 minutes! For further information, read the docs.

Give it a go, and if you’re looking to add more skills to your data analysis toolbox, check out our crash course!