Hoping someone may be able to point me in the right direction as I am new to Python.
I am doing a small project to get to grips with data analysis in Python using some football data. I have two dataframes, one with player information and another with match information (match_df). The match_df has 22 columns with a player ID for each player in the match. I would like to swap the player_ID data in the match_df for the player's skill rating. I have written a function to look up a player and a date and return the rating (find_player_skill). I want to apply this to every relevant column in the dataframe but can't work out how to use the apply function because the arguments depend on the dataframe row. Therefore I think the easiest way is to use set_value on each element of the dataframe as below.
The problem is that I haven't managed to get this to run to completion (although I haven't tried running it for hours on end). I assume there is a way to do the same thing in a reasonable time, either with different code or with a souped-up version of this. I have tried running the code on a small sample (3 rows), which was quick, and then on 1000 rows, which didn't complete in 30 minutes or so.
    # change player IDs to skill data, currently runs very slowly
    for i in range(len(match_df['match_date'])):
        match_date = match_df['match_date'].iloc[i]
        match_index = match_df.iloc[i].name
        for pl_lab in ['h1','h2','h3','h4','h5','h6','h7','h8','h9','h10','h11',
                       'a1','a2','a3','a4','a5','a6','a7','a8','a9','a10','a11']:
            player_ID = match_df[pl_lab].iloc[i]
            player_skill = find_player_skill(player_ID, match_date)
            match_df.set_value(match_index, pl_lab, player_skill)
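For reference, the row-dependent arguments can be passed by applying along axis=1, so each call receives a whole row. The following is only a minimal sketch under stated assumptions: the body of find_player_skill is a made-up placeholder (the real lookup is not shown here) and the two-row match_df is invented toy data. It still calls the function once per cell in Python, so it tidies the loop but is not guaranteed to be faster.

```python
import pandas as pd

# The 22 player columns from the question: h1..h11 and a1..a11.
player_cols = [f"h{i}" for i in range(1, 12)] + [f"a{i}" for i in range(1, 12)]

def find_player_skill(player_id, match_date):
    # Placeholder only: stands in for the real rating lookup, which is not shown.
    return 50.0 + player_id % 40 + (match_date.month - 9)

# Toy match data (invented): one date column plus 22 player ID columns.
match_df = pd.DataFrame(
    [[pd.Timestamp("2015-10-20")] + list(range(1, 23)),
     [pd.Timestamp("2015-09-25")] + list(range(23, 45))],
    columns=["match_date"] + player_cols,
)

def rate_row(row):
    # Each call sees one full row, so the match date is available per player.
    return pd.Series(
        [find_player_skill(row[col], row["match_date"]) for col in player_cols],
        index=player_cols,
    )

# Replace the player IDs with ratings, one row at a time.
match_df[player_cols] = match_df.apply(rate_row, axis=1)
```

On recent pandas versions set_value is no longer available (it was removed in pandas 1.0), so this form also avoids that call.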
Any suggestions much appreciated.
EDIT: It is also worth saying that I thought about debugging the code and downloaded PyCharm for this, but some of the earlier code that I wrote seemed to run very slowly (I wrote everything in IPython initially).
match_df.replace(df_player['theskillcolumns'].to_dict(), axis=1)
where df_player is your df with skills as column and player ID as index

    pd.DataFrame({'date_stat':['2015-10-16','2015-09-21','2015-09-21'],'overall_rating':[71.0,71.0,67.0]},index=[38255,38255,38256])

The additional complexity, which I forgot to mention in my original post, is that each player may have more than one skill rating, hence the need to evaluate with a match date.
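One way to handle several dated ratings per player without a Python-level loop is an "as of" join. This is a rough sketch, not a definitive answer: it assumes the rating wanted is the most recent one on or before the match date, it uses only two player columns for brevity, the names player_id, match_id and slot are introduced here for illustration, and the rating values are invented toy data in the same shape as the example above.

```python
import pandas as pd

# Ratings table: player ID as index, possibly several dated ratings per player.
# Values are invented for illustration.
df_player = pd.DataFrame(
    {"date_stat": ["2015-10-16", "2015-09-21", "2015-09-21"],
     "overall_rating": [71.0, 69.0, 67.0]},
    index=[38255, 38255, 38256],
)

# Toy match data with two of the 22 player columns.
match_df = pd.DataFrame(
    {"match_date": pd.to_datetime(["2015-10-01", "2015-10-20"]),
     "h1": [38255, 38255],
     "a1": [38256, 38256]}
)
player_cols = ["h1", "a1"]

# merge_asof needs real datetimes, sorted on the date key.
ratings = (
    df_player.rename_axis("player_id")
    .reset_index()
    .assign(date_stat=lambda d: pd.to_datetime(d["date_stat"]))
    .sort_values("date_stat")
)

# Long form: one row per (match, player slot).
long = (
    match_df.rename_axis("match_id")
    .reset_index()
    .melt(id_vars=["match_id", "match_date"], value_vars=player_cols,
          var_name="slot", value_name="player_id")
    .sort_values("match_date")
)

# For each row, take the latest rating dated on or before the match date.
merged = pd.merge_asof(
    long, ratings,
    left_on="match_date", right_on="date_stat",
    by="player_id", direction="backward",
)

# Back to wide form and overwrite the ID columns with ratings.
skills = merged.pivot(index="match_id", columns="slot", values="overall_rating")
match_df[player_cols] = skills[player_cols]
```

A player with no rating dated on or before the match comes out as NaN, so those cases would need a separate decision.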