python - find duplicates in a column, replace values in another column for that duplicate

Question

I have a dataframe that consists of of video game titles on various platforms. it contains, among other values the name, critic's average score and user's average score. Many of them are missing scores for the user, critic and/or ESRB rating.

What i'd like to do is replace the missing rating, critic and user scores with those for the same game on a different platform (assuming they exist) i'm not quite sure how to approach this.(note - i don't want to drop the duplicate names, because they aren't truly duplicate rows)

here is a sample chunk of the dataframe (i've removed some unrelated columns to make it manageable):

                                            name platform  critic_score  \    user_score rating  
0                                     Wii Sports      Wii          76.0   0          8.0      E  
1                              Super Mario Bros.      NES           NaN   1          NaN    NaN  
2                                 Mario Kart Wii      Wii          82.0   2          8.3      E  
3                              Wii Sports Resort      Wii          80.0   3          8.0      E  
4                       Pokemon Red/Pokemon Blue       GB           NaN   4          NaN    NaN  
5                                         Tetris       GB           NaN   5          NaN    NaN  
6                          New Super Mario Bros.       DS          89.0   6          8.5      E  
7                                       Wii Play      Wii          58.0   7          6.6      E  
8                      New Super Mario Bros. Wii      Wii          87.0   8          8.4      E  
9                                      Duck Hunt      NES           NaN   9          NaN    NaN  
10                                    Nintendogs       DS           NaN   10         NaN    NaN  
11                                 Mario Kart DS       DS          91.0   11         8.6      E  
12                   Pokemon Gold/Pokemon Silver       GB           NaN   12         NaN    NaN  
13                                       Wii Fit      Wii          80.0   13         7.7      E  
14                            Kinect Adventures!     X360          61.0   14         6.3      E  
15                                  Wii Fit Plus      Wii          80.0   15         7.4      E  
16                            Grand Theft Auto V      PS3          97.0   16         8.2      M  
17                 Grand Theft Auto: San Andreas      PS2          95.0   17         9.0      M  
18                             Super Mario World     SNES           NaN   18         NaN    NaN  
19  Brain Age: Train Your Brain in Minutes a Day       DS          77.0   19         7.9      E  
20                 Pokemon Diamond/Pokemon Pearl       DS           NaN   20         NaN    NaN  
21                              Super Mario Land       GB           NaN   21         NaN    NaN  
22                           Super Mario Bros. 3      NES           NaN   22         NaN    NaN  
23                            Grand Theft Auto V     X360          97.0   23         8.1      M  
24                   Grand Theft Auto: Vice City      PS2          95.0   24         8.7      M  
25                 Pokemon Ruby/Pokemon Sapphire      GBA           NaN   25         NaN    NaN  
26   Brain Age 2: More Training in Minutes a Day       DS          77.0   26         7.1      E  
27                   Pokemon Black/Pokemon White       DS           NaN   27         NaN    NaN  
28                        Gran Turismo 3: A-Spec      PS2          95.0   28         8.4      E  
29                Call of Duty: Modern Warfare 3     X360          88.0   29         3.4      M

now, there don't happen to be any duplicates that stick out in this head 30 lines, but for instance i have 007: Quantum of Solace on the PS3, Wii, DS, PC and x360. between all of the platforms i have a mean rating for both users and critics, as well as a rating.

as requested - here is a sample of some duplicated values:

index   name    platform    critic_Score    user_score  rating
3862    Frozen: Olaf's Quest    DS  NaN NaN NaN
3358    Frozen: Olaf's Quest    3DS NaN NaN NaN
1785    007: Quantum of Solace  PS3 65  6.6 T
3120    007: Quantum of Solace  Wii 54  7.5 T
9507    007: Quantum of Solace  DS  65  NaN T
4475    007: Quantum of Solace  PS2 NaN NaN NaN
1285    007: Quantum of Solace  X360    65  7.1 T
14658   007: Quantum of Solace  PC  70  6.3 T
2243    007: The World is not Enough    PS  61  6.7 T
1204    007: The World is not Enough    N64 NaN NaN NaN

i've separated my duplicates into their own dataframe (df1 is my original games dataframe, df2 is the duplicates dataframe):

df2 = df[df.duplicated(['name'],False)]
df2 = df2.sort_values(['name'])

so i can see my duplicates and their values, but of course i don't wanna fill in 8500 missing values from duplicates by hand.

I can find the duplicated names, but i don't know how to fill the NaN values with the "good" values from the other platform?

i'm at a loss for how to begin this and would appreciate any input into a direction.

now - to add another step to it - in my example above of the 007 game - the critic and user scores aren't the same across platforms (the ps3 game got a 65, the wii game got a 54 and PC a 70) calculating the mean of the 3 should be the ideal solution, but i'll settle for ANY of the platforms if that is too complex (as you might have guessed, i am very new to python)

I appreciate any time and effort you have to share on my behalf.

Regards,

Jared

Would it make sense to separate out the games by platform by if they have NaN values or not, and then do a query for the name of the game to the other platform(s) dataframes and a full based on that somehow? — seuadr
– seuadr, Commented Jan 14, 2021 at 1:10
now, there don't happen to be any duplicates that stick out in this head 30 lines, but for instance i have 007: Quantum of Solace on the PS3, Wii, DS, PC and x360. Then why not also put the 007: Quantum of Solace to the sample data, that would be more helpful to the answers. — Ferris
– Ferris, Commented Jan 14, 2021 at 2:12

Pablo C · Accepted Answer · 2021-01-14 02:26:56Z

1

I'm pretty sure pandas.DataFrame.groupby is what you need:

df.groupby("name").mean()

If you want to join these results with you dataframe, you can use:

df.join(df.groupby("name").mean(), on = "name", rsuffix = "_mean")?

edited Jan 14, 2021 at 2:26

answered Jan 14, 2021 at 1:43

Pablo C

4,7612 gold badges10 silver badges26 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

seuadr Over a year ago

Pablo C, thanks for sharing. I am not sure i'm following you - how will the mean of "name" give me the mean score in a different column? i still end up with the NaN values, even though there are duplicates with non NaN values: index na_sales eu_sales jp_sales other_sales critic_score user_score total_sales name Frozen: Olaf's Quest 3610.000000 0.240 0.265000 0.000000 0.045000 NaN NaN 0.550000

seuadr Over a year ago

actually, i think i see what you are saying - the couple that i looked at initially have ALL NaN values across platforms and i jumped to conclusions. i'm not sure how to then backfill that mean into the NaN value - maybe an if statement?

Pablo C Over a year ago

@seuadr df.join(df.groupby("name").mean(), on = "name", rsuffix = "_mean")?

seuadr Over a year ago

Pablo C, thanks for your continued help - you've been a huge help. I now have the mean values i need in the right dataframe, and just need to fill in the NaN values in the original column with the mean value in the same row. i was way overthinking this - I was thinking about separating all of the platforms into different dataframes and doing some filtering and this is way more straight foward. thanks!

seuadr Over a year ago

PERFECT. THANK YOU!

|

Collectives™ on Stack Overflow

python - find duplicates in a column, replace values in another column for that duplicate

1 Answer 1

7 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

7 Comments

Your Answer

Sign up or log in

Post as a guest

Related