In this tutorial, we’re going to play the role of a Chairman at a team that wants to reduce the wage bill, but bring in players perceived to be at the same level. Our team has a limited scouting network, so we’re going to make use of a database that comes with player rankings, wages and positions – taken from the FIFA database.

Of course, there is lot more that we would need to take into account when scouting a player. This is half-tongue in cheek tutorial, half-still better than lots of multi-million deals 👀

The aim of the tutorial is to learn about filtering information in larger datasets. We’ll filter it based on a text value that we will provide manually, and also on whether or not a value is greater or smaller than another one – wage bill, for example.

The data for this project can be found on Kaggle here. Let’s get our modules imported and save our team some money!

#Import modules
import numpy as np
import pandas as pd

We’ll open the dataset with pandas. We have it within a folder, so we navigate to the folder and name the csv. We’ll name this file ‘data’.

Calling .head() will allow us to check out the structure. The dataset is pretty rich and we can get loads of biographical data, in addition to ratings. Loads more that you can do with this beyond this demo!

#load data
data = pd.read_csv("fifa-20-complete-player-dataset/players_20.csv")
data.head()

The dataset has over 18,000 players, so we are going to select a team to save some money for. Let’s put ourselves in the shoes of a business-minded Chairman at Manchester United.

First up, let’s take stock of our own team. Here, we are going to filter based on a specific piece of text that we know – the club name.

We’ll create a function that filters the big database for a match with a team name. There are lots of columns to the main dataset, so we’re only going to extract the name, wage, value, positions, ranking and age of our players.

#Define function called club, that looks for a team name and return the selected columns
def club(teamName):
    
    return data[data['club'] == teamName][['short_name','wage_eur','value_eur','player_positions','overall','age']]


#Use the club function to find the team, and sort the squad by wage bill
club('Manchester United').sort_values("wage_eur", ascending = False)

Now we have all of our players, we need to find a way to take information from a player we may want to replace and use it to search the initial database.

Let’s start with an easy example. We’re going the Extract De Gea’s information from the database, assign his wage and rating to variables and finally recommend a player who is as good as him, but has a cheaper wage.

First step, get the DDG information and variables sorted:

#Extract DDG's information, just like we did with the team name before
DDG = data[data['short_name'] == 'De Gea'][['short_name','wage_eur','value_eur','player_positions','overall','age']]

#Assign DDG's wage, position, rating and age to variables
DDGWage = DDG['wage_eur'].item()
DDGPos = DDG['player_positions'].item()
DDGRating = DDG['overall'].item()
DDGAge = DDG['age'].item()

Secondly, we are going to filter the initial database by these values. To do this, we will create a longlist of players that match DDG’s position. From there, we need to slim the list down to meet other criteria. So we will create lists of players that do not meet requirements, then delete them from the longlist.

#Create a list of goalkeepers, matching DDG's position
longlist = data[data['player_positions'] == DDGPos][['short_name','wage_eur','value_eur','player_positions','overall','age']]

#Create a list of players that have a lower overall rank than DDG
removals = longlist[longlist['overall'] <= DDGRating].index
#Drop these players
longlist.drop(removals , inplace=True)

#Repeat above, but for players with a larger wage
removals = longlist[longlist['wage_eur'] > DDGWage].index
longlist.drop(removals , inplace=True)

#Repeat above, but for older players
removals = longlist[longlist['age'] >= DDGAge].index
longlist.drop(removals , inplace=True)

#Show me our potential replacements, sorted by lowest wages
longlist.sort_values("wage_eur")

And we have one player that meets the criteria – Jan Oblak!

He ticks all of the boxes we laid out, cheaper, better and younger. In fact, his weekly wage is €80,000 cheaper! 🤑🤑

Chairmen of the world, happy to audit your team for a bargain fee of 1% of potential savings! Alternatively, if you would like to do it yourselves, it would save some time to put the above into a function. Let’s give that a go.

We’ll do exactly what we did before, but we’re going to add a second argument to the function that allows us to look at players rated lower than our own. We’ll call this skillReduction and use it to change our filter criteria for the rating piece. It is set to 0 by default, in this case it will work just as it did above.

def cheapReplacement(player, skillReduction = 0):
    
    #Get the replacee with the name provided in the argument
    replacee = data[data['short_name'] == player][['short_name','wage_eur','value_eur','player_positions','overall','age']]

    #Assign the relevant details of this player to variables
    replaceePos = replacee['player_positions'].item()
    replaceeWage = replacee['wage_eur'].item()
    replaceeAge = replacee['age'].item()
    replaceeOverall = replacee['overall'].item() - skillReduction
    
    #Create the longlist of players that share the position
    longlist = data[data['player_positions'] == replaceePos][['short_name','wage_eur','value_eur','player_positions','overall','age']]
    
    
    #Create list of players that do not meet the rating criteria and drop them from the longlist
    removals = longlist[longlist['overall'] <= replaceeOverall].index
    longlist.drop(removals , inplace=True)
    
    #Repeat for players with higher wages
    removals = longlist[longlist['wage_eur'] > replaceeWage].index
    longlist.drop(removals , inplace=True)

    #Repeat for older players
    removals = longlist[longlist['age'] >= replaceeAge].index
    longlist.drop(removals , inplace=True)
    
    #Display the players that meet the requirements
    return longlist.sort_values("wage_eur")

And let’s give it a go for Fred

cheapReplacement('Fred')

Cha-ching, 46k-16k a week saved. Give me the job.

“FC Python, give us a cheaper centre back!”

cheapReplacement('V. Lindelöf')

And another 30k-4k. Crack open Youtube and get your eyes around ‘TAH – WELCOME TO UNITED – GOALS & MONSTER TACKLES 19/20’.

Last one, we’re going to check out Pogba’s possible replacements. And we’re going to assume that FIFA overrates him by 8 points, so we’ll accept players up to that level.

cheapReplacement('P. Pogba', 8)

Squad management/filtering dataframes 101 done, get your job applications out.

Developing this further, we could of course take more factors into account than the overall rating. Look at potential ability, key ability ratings, etc. for a better fit for the player.

You may also want to run the function that we made over a whole team to see potential savings. Then do it against loads of teams and compare squad strengths and potential savings. Would be interesting to see which teams are over and under paying for their squad!

	sofifa_id	player_url	short_name	long_name	age	dob	height_cm	weight_kg	nationality	club	…	lwb	ldm	cdm	rdm	rwb	lb	lcb	cb	rcb	rb
0	158023	https://sofifa.com/player/158023/lionel-messi/…	L. Messi	Lionel Andrés Messi Cuccittini	32	1987-06-24	170	72	Argentina	FC Barcelona	…	68+2	66+2	66+2	66+2	68+2	63+2	52+2	52+2	52+2	63+2
1	20801	https://sofifa.com/player/20801/c-ronaldo-dos-…	Cristiano Ronaldo	Cristiano Ronaldo dos Santos Aveiro	34	1985-02-05	187	83	Portugal	Juventus	…	65+3	61+3	61+3	61+3	65+3	61+3	53+3	53+3	53+3	61+3
2	190871	https://sofifa.com/player/190871/neymar-da-sil…	Neymar Jr	Neymar da Silva Santos Junior	27	1992-02-05	175	68	Brazil	Paris Saint-Germain	…	66+3	61+3	61+3	61+3	66+3	61+3	46+3	46+3	46+3	61+3
3	200389	https://sofifa.com/player/200389/jan-oblak/20/…	J. Oblak	Jan Oblak	26	1993-01-07	188	87	Slovenia	Atlético Madrid	…	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	183277	https://sofifa.com/player/183277/eden-hazard/2…	E. Hazard	Eden Hazard	28	1991-01-07	175	74	Belgium	Real Madrid	…	66+3	63+3	63+3	63+3	66+3	61+3	49+3	49+3	49+3	61+3

	short_name	wage_eur	value_eur	player_positions	overall	age
24	P. Pogba	250000	72500000	CM, CDM	88	26
14	De Gea	205000	56000000	GK	89	28
301	A. Sánchez	150000	21000000	LW, LM	82	30
163	A. Martial	140000	34500000	LW, ST, LM	83	23
295	Juan Mata	135000	19500000	CAM, RM	82	31
319	J. Lingard	135000	24000000	CAM, RW	82	26
171	M. Rashford	125000	35500000	ST, LM	83	21
280	H. Maguire	125000	23000000	CB	82	26
534	C. Smalling	115000	12500000	CB	80	29
396	N. Matić	115000	14500000	CDM, CM	81	30
644	Fred	110000	15000000	CM	79	26
462	E. Bailly	105000	16500000	CB	80	25
654	P. Jones	100000	12500000	CB	79	27
347	L. Shaw	97000	20000000	LB	81	23
335	V. Lindelöf	97000	22000000	CB	81	24
957	M. Rojo	95000	8500000	CB	78	29
508	S. Romero	85000	9000000	GK	80	32
1140	A. Young	85000	4000000	RB, LB	77	33
1044	Andreas Pereira	81000	11500000	CM, CAM, RM	77	23
1962	M. Darmian	79000	5500000	LB, RB	75	29
1028	S. McTominay	73000	11500000	CDM, CM	77	22
571	A. Wan-Bissaka	71000	17500000	RB	79	21
1631	Diogo Dalot	54000	10000000	RB, LB, RM	75	20
3299	D. James	46000	5500000	LM, LW, RM	72	21
2661	A. Tuanzebe	46000	6500000	CB, RB	73	21
2702	T. Fosu-Mensah	46000	5500000	RB, CB	73	21
3689	L. Grant	30000	525000	GK	72	36
8010	C. Borthwick-Jackson	23000	925000	LB	67	22
8839	D. Mitchell	19000	1000000	LB	66	22
9901	T. Chong	14000	1200000	RW, LW, RM	65	19
6622	A. Gomes	12000	1800000	CAM, CM, LW	68	18
7690	M. Greenwood	6000	1500000	CF, RM, ST	67	17
12167	J. Garner	5000	725000	CDM, CM, CB	63	18

	short_name	wage_eur	value_eur	player_positions	overall	age
330	H. Aouar	59000	26000000	CM	81	21
259	K. Demirbay	75000	27500000	CM	82	25
245	Rúben Neves	89000	29000000	CM	82	22

	short_name	wage_eur	value_eur	player_positions	overall	age
165	J. Tah	67000	31000000	CB	83	23
246	P. Kimpembe	75000	26500000	CB	82	23
73	M. de Ligt	76000	50000000	CB	85	19
170	M. Akanji	77000	31000000	CB	83	23
177	D. Sánchez	93000	30500000	CB	83	23

	short_name	wage_eur	value_eur	player_positions	overall	age
331	T. Ndombele	87000	26000000	CM, CDM	81	22
243	N. Keïta	95000	29000000	CM, CDM	82	24
166	C. Tolisso	110000	34000000	CM, CDM	83	24
338	E. Can	110000	23000000	CM, CDM	81	25
173	A. Rabiot	120000	33000000	CM, CDM	83	24
75	F. de Jong	195000	52000000	CM, CDM	85	22