
Hoping someone may be able to point me in the right direction as I am new to Python.

I am working on a small project to get to grips with data analysis in Python, using some football data. I have two dataframes: one with player information and another with match information (match_df). match_df has 22 player columns (h1 to h11 and a1 to a11), each holding the ID of a player in the match. I would like to replace each player ID in match_df with that player's skill rating. I have written a function, find_player_skill, that takes a player ID and a date and returns the rating. I want to apply it to every relevant column, but I can't work out how to use apply because the arguments depend on the dataframe row. So I think the easiest way is to call set_value on each element of the dataframe, as below.

The problem is that I haven't managed to get this to finish executing (although I haven't tried running it for hours on end). I assume there is a way to do the same thing in a reasonable time, either with different code or a souped-up version of this. Running the code on a small sample (3 rows) was quick, but a run on 1000 rows had not completed after 30 minutes or so.

# change player IDs to skill data; currently runs very slowly
player_cols = ['h1','h2','h3','h4','h5','h6','h7','h8','h9','h10','h11',
               'a1','a2','a3','a4','a5','a6','a7','a8','a9','a10','a11']
for i in range(len(match_df['match_date'])):
    match_date = match_df['match_date'].iloc[i]
    match_index = match_df.iloc[i].name
    for pl_lab in player_cols:
        player_ID = match_df[pl_lab].iloc[i]
        player_skill = find_player_skill(player_ID, match_date)
        # set_value was removed in pandas 1.0; on newer versions the equivalent
        # is match_df.at[match_index, pl_lab] = player_skill
        match_df.set_value(match_index, pl_lab, player_skill)

Any suggestions much appreciated.

EDIT: It is also worth saying that I thought about debugging the code and downloaded PyCharm for this, but some of the earlier code that I wrote seemed to run very slowly (I wrote everything in IPython initially).

  • I don't have access to your df; you could post a couple of quick lines of code so we have an example of your df to play with. Looking at your problem, though, I think this would be achievable with match_df.replace(df_player['theskillcolumns'].to_dict()), where df_player is your df with skills as a column and player ID as the index. Commented Oct 3, 2016 at 14:11
  • Try this for an example of the player_df: pd.DataFrame({'date_stat':['2015-10-16','2015-09-21','2015-09-21'],'overall_rating':[71.0,71.0,67.0]},index=[38255,38255,38256]). The additional complexity, which I forgot to mention in my original post, is that each player may have more than one skill rating, hence the need to evaluate with a match date. Commented Oct 4, 2016 at 20:22

1 Answer


Here is a manipulation you can do, assuming df is the match dataframe and its columns 0 to 2 hold the player IDs:

import pandas as pd

df = pd.DataFrame([['c', 'a', 'b'], ['b', 'c', 'a']])
df
Out[70]: 
   0  1  2
0  c  a  b
1  b  c  a

df_player = pd.DataFrame([['a', 100], ['b', 230], ['c', 200]],columns=['ID', 'skill']).set_index('ID')

    skill
ID       
a     100
b     230
c     200


dic = df_player.to_dict()['skill']

# column-wise lookup; IDs missing from dic are left unchanged
df.apply(lambda col: col.map(lambda n: dic.get(n, n)))
Out[69]: 
     0    1    2
0  200  100  230
1  230  200  100
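
The dictionary lookup above assumes one rating per player. If each player has several dated ratings, as mentioned in the comments, one possible approach is pd.merge_asof, which picks the most recent rating on or before each match date without a Python-level loop. This is only a sketch under assumptions: the sample values are made up, the player ID is assumed to be an ordinary column named player_id (not the index), and only two of the 22 player columns are shown.

```python
import pandas as pd

# Hypothetical ratings table: one row per (player, rating date).
player_df = pd.DataFrame({
    "player_id": [38255, 38255, 38256],
    "date_stat": pd.to_datetime(["2015-09-21", "2015-10-16", "2015-09-21"]),
    "overall_rating": [69.0, 71.0, 67.0],
})

# Hypothetical match table with two of the 22 player columns.
match_df = pd.DataFrame({
    "match_date": pd.to_datetime(["2015-10-01", "2015-11-01"]),
    "h1": [38255, 38256],
    "a1": [38256, 38255],
})
player_cols = ["h1", "a1"]

# Reshape to long format: one row per (match, player slot).
long_df = (
    match_df.rename_axis("match_idx")
    .reset_index()
    .melt(id_vars=["match_idx", "match_date"], value_vars=player_cols,
          var_name="slot", value_name="player_id")
    .sort_values("match_date")  # merge_asof needs the left key sorted
)

# For each row, take the latest rating dated on or before the match date.
merged = pd.merge_asof(
    long_df,
    player_df.sort_values("date_stat"),  # right key must be sorted too
    left_on="match_date",
    right_on="date_stat",
    by="player_id",
    direction="backward",
)

# Reshape back to one row per match, with ratings in place of IDs.
skills = merged.pivot(index="match_idx", columns="slot", values="overall_rating")
skill_df = match_df[["match_date"]].join(skills[player_cols])
print(skill_df)
```

A player with no rating on or before the match date ends up with NaN, so it is worth checking for missing values afterwards.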