DataFrames can hold huge grids of data, so we will often want to group particular parts of a data set for analysis. Pandas’ ‘groupby’ method gives us an easy way to do so.
Let’s put together a DataFrame of a team’s recent results with a dictionary:
import pandas as pd
#data is in a dictionary, each entry will be a column
#The first part of the entry is the column name, the second the values
data = {'Opponent':
["Atletico Jave","Newtown FC",
"Bunton Town", "Fentborough Dynamo"],
'Location':
["Home","Away","Away","Home"],
'GoalsFor':
[2,4,3,0],
'GoalsAgainst':
[4,0,2,2]}
Matches = pd.DataFrame(data)
Matches
An obvious way to group this data is by home and away matches.
Let’s use the ‘.groupby()’ method to do so. We just have to provide the column that we want to group by: in this case, ‘Location’.
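We’ll assign that to a variable, then call ‘.mean()’ to find the average of each numeric column.
HAMatches = Matches.groupby('Location')
#numeric_only=True skips the text column 'Opponent', which recent Pandas versions refuse to average
HAMatches.mean(numeric_only=True)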
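To see what the grouped object holds before anything is averaged, one option is to loop over it. The sketch below rebuilds the same DataFrame so that it runs on its own; the names ‘grouped’, ‘opponents_by_location’ and ‘means’ are made up for this illustration.

```python
import pandas as pd

# Same data as above, rebuilt so this block is self-contained
data = {'Opponent':
        ["Atletico Jave", "Newtown FC",
         "Bunton Town", "Fentborough Dynamo"],
        'Location':
        ["Home", "Away", "Away", "Home"],
        'GoalsFor':
        [2, 4, 3, 0],
        'GoalsAgainst':
        [4, 0, 2, 2]}
Matches = pd.DataFrame(data)

grouped = Matches.groupby('Location')

# Looping over a grouped object yields (group name, sub-DataFrame) pairs
opponents_by_location = {}  # illustrative helper, not part of Pandas
for location, group in grouped:
    opponents_by_location[location] = list(group['Opponent'])
    print(location)
    print(group)

# Average of the numeric columns within each group
means = grouped.mean(numeric_only=True)
print(means)
```

Each group is just a smaller DataFrame containing the rows that share a ‘Location’ value, which is why methods like ‘.mean()’ can be applied group by group.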
Or cut out the variable and chain ‘.mean()’ straight onto the end. We can chain other methods in the same way:
Matches.groupby('Location').mean(numeric_only=True)
#Summary statistics for each numeric column, per location - this is awesome!
Matches.groupby('Location').describe()
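If only one column is of interest, it can be picked out of the grouped object before the method is chained on. This is a small sketch of that idea rather than the only way to do it; the names ‘goals_for_mean’ and ‘goals_for_summary’ are made up for this illustration.

```python
import pandas as pd

# Same data as above, rebuilt so this block is self-contained
Matches = pd.DataFrame({
    'Opponent': ["Atletico Jave", "Newtown FC",
                 "Bunton Town", "Fentborough Dynamo"],
    'Location': ["Home", "Away", "Away", "Home"],
    'GoalsFor': [2, 4, 3, 0],
    'GoalsAgainst': [4, 0, 2, 2]})

# Pick one column from the grouped object, then average it
goals_for_mean = Matches.groupby('Location')['GoalsFor'].mean()
print(goals_for_mean)

# Ask for several statistics on that column in one call
goals_for_summary = Matches.groupby('Location')['GoalsFor'].agg(['sum', 'mean'])
print(goals_for_summary)
```

Selecting the column first also sidesteps the text column ‘Opponent’, so no ‘numeric_only’ argument is needed here.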
#Let's step up the chaining...
#'Groupby' location, then describe it to me...
#Then 'transpose' it (flip it onto its side)...
#Finally, just give me 'Away' data
Matches.groupby('Location').describe().transpose()['Away']
print("All that work done in just " +
str(len("Matches.groupby('Location').describe().transpose()['Away']"))
+ " characters!")
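Long chains can be hard to read at first, so it may help to unpack that one into separate steps and look at each intermediate result. The sketch below does the same work one step at a time; the names ‘described’, ‘flipped’ and ‘away’ are made up for this illustration.

```python
import pandas as pd

# Same data as above, rebuilt so this block is self-contained
Matches = pd.DataFrame({
    'Opponent': ["Atletico Jave", "Newtown FC",
                 "Bunton Town", "Fentborough Dynamo"],
    'Location': ["Home", "Away", "Away", "Home"],
    'GoalsFor': [2, 4, 3, 0],
    'GoalsAgainst': [4, 0, 2, 2]})

# Step 1: one row per location, one column per (column name, statistic) pair
described = Matches.groupby('Location').describe()
print(described.shape)

# Step 2: flip it, so the locations become the columns
flipped = described.transpose()
print(flipped.shape)

# Step 3: keep only the 'Away' column, which comes back as a Series
away = flipped['Away']
print(away)

# Individual values are looked up with a (column name, statistic) pair
print(away[('GoalsFor', 'mean')])
```

The chained version and the step-by-step version give the same result; the chain simply skips naming the intermediate objects.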
Summary
It is staggering how easily we can not only group data, but also use Pandas to get some insight into it. Really good job following this far.
We learned how to use ‘groupby()’ to group by location – home or away. We then used methods to describe our data, find averages and even to change the shape of our data frames. Really impressive stuff!
Next up, you might want to take a look at how we can join DataFrames together, how to deal with missing values or how to use even more operations.