In this tutorial, we’re going to play the role of a Chairman at a team that wants to reduce the wage bill, but bring in players perceived to be at the same level. Our team has a limited scouting network, so we’re going to make use of a database that comes with player rankings, wages and positions – taken from the FIFA database.
Of course, there is lot more that we would need to take into account when scouting a player. This is half-tongue in cheek tutorial, half-still better than lots of multi-million deals 👀
The aim of the tutorial is to learn about filtering information in larger datasets. We’ll filter it based on a text value that we will provide manually, and also on whether or not a value is greater or smaller than another one – wage bill, for example.
The data for this project can be found on Kaggle here. Let’s get our modules imported and save our team some money!
#Import modules import numpy as np import pandas as pd
We’ll open the dataset with pandas. We have it within a folder, so we navigate to the folder and name the csv. We’ll name this file ‘data’.
Calling .head() will allow us to check out the structure. The dataset is pretty rich and we can get loads of biographical data, in addition to ratings. Loads more that you can do with this beyond this demo!
#load data data = pd.read_csv("fifa-20-complete-player-dataset/players_20.csv") data.head()
|0||158023||https://sofifa.com/player/158023/lionel-messi/…||L. Messi||Lionel Andrés Messi Cuccittini||32||1987-06-24||170||72||Argentina||FC Barcelona||…||68+2||66+2||66+2||66+2||68+2||63+2||52+2||52+2||52+2||63+2|
|1||20801||https://sofifa.com/player/20801/c-ronaldo-dos-…||Cristiano Ronaldo||Cristiano Ronaldo dos Santos Aveiro||34||1985-02-05||187||83||Portugal||Juventus||…||65+3||61+3||61+3||61+3||65+3||61+3||53+3||53+3||53+3||61+3|
|2||190871||https://sofifa.com/player/190871/neymar-da-sil…||Neymar Jr||Neymar da Silva Santos Junior||27||1992-02-05||175||68||Brazil||Paris Saint-Germain||…||66+3||61+3||61+3||61+3||66+3||61+3||46+3||46+3||46+3||61+3|
|3||200389||https://sofifa.com/player/200389/jan-oblak/20/…||J. Oblak||Jan Oblak||26||1993-01-07||188||87||Slovenia||Atlético Madrid||…||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN|
|4||183277||https://sofifa.com/player/183277/eden-hazard/2…||E. Hazard||Eden Hazard||28||1991-01-07||175||74||Belgium||Real Madrid||…||66+3||63+3||63+3||63+3||66+3||61+3||49+3||49+3||49+3||61+3|
5 rows × 104 columns
The dataset has over 18,000 players, so we are going to select a team to save some money for. Let’s put ourselves in the shoes of a business-minded Chairman at Manchester United.
First up, let’s take stock of our own team. Here, we are going to filter based on a specific piece of text that we know – the club name.
We’ll create a function that filters the big database for a match with a team name. There are lots of columns to the main dataset, so we’re only going to extract the name, wage, value, positions, ranking and age of our players.
#Define function called club, that looks for a team name and return the selected columns def club(teamName): return data[data['club'] == teamName][['short_name','wage_eur','value_eur','player_positions','overall','age']] #Use the club function to find the team, and sort the squad by wage bill club('Manchester United').sort_values("wage_eur", ascending = False)
|24||P. Pogba||250000||72500000||CM, CDM||88||26|
|301||A. Sánchez||150000||21000000||LW, LM||82||30|
|163||A. Martial||140000||34500000||LW, ST, LM||83||23|
|295||Juan Mata||135000||19500000||CAM, RM||82||31|
|319||J. Lingard||135000||24000000||CAM, RW||82||26|
|171||M. Rashford||125000||35500000||ST, LM||83||21|
|396||N. Matić||115000||14500000||CDM, CM||81||30|
|1140||A. Young||85000||4000000||RB, LB||77||33|
|1044||Andreas Pereira||81000||11500000||CM, CAM, RM||77||23|
|1962||M. Darmian||79000||5500000||LB, RB||75||29|
|1028||S. McTominay||73000||11500000||CDM, CM||77||22|
|1631||Diogo Dalot||54000||10000000||RB, LB, RM||75||20|
|3299||D. James||46000||5500000||LM, LW, RM||72||21|
|2661||A. Tuanzebe||46000||6500000||CB, RB||73||21|
|2702||T. Fosu-Mensah||46000||5500000||RB, CB||73||21|
|9901||T. Chong||14000||1200000||RW, LW, RM||65||19|
|6622||A. Gomes||12000||1800000||CAM, CM, LW||68||18|
|7690||M. Greenwood||6000||1500000||CF, RM, ST||67||17|
|12167||J. Garner||5000||725000||CDM, CM, CB||63||18|
Now we have all of our players, we need to find a way to take information from a player we may want to replace and use it to search the initial database.
Let’s start with an easy example. We’re going the Extract De Gea’s information from the database, assign his wage and rating to variables and finally recommend a player who is as good as him, but has a cheaper wage.
First step, get the DDG information and variables sorted:
#Extract DDG's information, just like we did with the team name before DDG = data[data['short_name'] == 'De Gea'][['short_name','wage_eur','value_eur','player_positions','overall','age']] #Assign DDG's wage, position, rating and age to variables DDGWage = DDG['wage_eur'].item() DDGPos = DDG['player_positions'].item() DDGRating = DDG['overall'].item() DDGAge = DDG['age'].item()
Secondly, we are going to filter the initial database by these values. To do this, we will create a longlist of players that match DDG’s position. From there, we need to slim the list down to meet other criteria. So we will create lists of players that do not meet requirements, then delete them from the longlist.
#Create a list of goalkeepers, matching DDG's position longlist = data[data['player_positions'] == DDGPos][['short_name','wage_eur','value_eur','player_positions','overall','age']] #Create a list of players that have a lower overall rank than DDG removals = longlist[longlist['overall'] <= DDGRating].index #Drop these players longlist.drop(removals , inplace=True) #Repeat above, but for players with a larger wage removals = longlist[longlist['wage_eur'] > DDGWage].index longlist.drop(removals , inplace=True) #Repeat above, but for older players removals = longlist[longlist['age'] >= DDGAge].index longlist.drop(removals , inplace=True) #Show me our potential replacements, sorted by lowest wages longlist.sort_values("wage_eur")
And we have one player that meets the criteria – Jan Oblak!
He ticks all of the boxes we laid out, cheaper, better and younger. In fact, his weekly wage is €80,000 cheaper! 🤑🤑
Chairmen of the world, happy to audit your team for a bargain fee of 1% of potential savings! Alternatively, if you would like to do it yourselves, it would save some time to put the above into a function. Let’s give that a go.
We’ll do exactly what we did before, but we’re going to add a second argument to the function that allows us to look at players rated lower than our own. We’ll call this skillReduction and use it to change our filter criteria for the rating piece. It is set to 0 by default, in this case it will work just as it did above.
def cheapReplacement(player, skillReduction = 0): #Get the replacee with the name provided in the argument replacee = data[data['short_name'] == player][['short_name','wage_eur','value_eur','player_positions','overall','age']] #Assign the relevant details of this player to variables replaceePos = replacee['player_positions'].item() replaceeWage = replacee['wage_eur'].item() replaceeAge = replacee['age'].item() replaceeOverall = replacee['overall'].item() - skillReduction #Create the longlist of players that share the position longlist = data[data['player_positions'] == replaceePos][['short_name','wage_eur','value_eur','player_positions','overall','age']] #Create list of players that do not meet the rating criteria and drop them from the longlist removals = longlist[longlist['overall'] <= replaceeOverall].index longlist.drop(removals , inplace=True) #Repeat for players with higher wages removals = longlist[longlist['wage_eur'] > replaceeWage].index longlist.drop(removals , inplace=True) #Repeat for older players removals = longlist[longlist['age'] >= replaceeAge].index longlist.drop(removals , inplace=True) #Display the players that meet the requirements return longlist.sort_values("wage_eur")
And let’s give it a go for Fred
Cha-ching, 46k-16k a week saved. Give me the job.
“FC Python, give us a cheaper centre back!”
|73||M. de Ligt||76000||50000000||CB||85||19|
And another 30k-4k. Crack open Youtube and get your eyes around ‘TAH – WELCOME TO UNITED – GOALS & MONSTER TACKLES 19/20’.
Last one, we’re going to check out Pogba’s possible replacements. And we’re going to assume that FIFA overrates him by 8 points, so we’ll accept players up to that level.
cheapReplacement('P. Pogba', 8)
|331||T. Ndombele||87000||26000000||CM, CDM||81||22|
|243||N. Keïta||95000||29000000||CM, CDM||82||24|
|166||C. Tolisso||110000||34000000||CM, CDM||83||24|
|338||E. Can||110000||23000000||CM, CDM||81||25|
|173||A. Rabiot||120000||33000000||CM, CDM||83||24|
|75||F. de Jong||195000||52000000||CM, CDM||85||22|
Squad management/filtering dataframes 101 done, get your job applications out.
Developing this further, we could of course take more factors into account than the overall rating. Look at potential ability, key ability ratings, etc. for a better fit for the player.
You may also want to run the function that we made over a whole team to see potential savings. Then do it against loads of teams and compare squad strengths and potential savings. Would be interesting to see which teams are over and under paying for their squad!