Fill empty values in a dataframe based on columns in another dataframe

Question

I have a dataframe df1 like this.

I want to fill both the NaN values and the zeros in column score with values from another dataframe, df2, chosen according to each row's name.

How could I do this?

Thanks for accepting! Remember you can also upvote answers, so please consider upvoting the other answers. – cs95, commented Aug 25, 2017 at 15:07

piRSquared · Accepted Answer · 2017-08-25 15:15:13Z

4

Option 1
Short version

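# Treat 0 as missing, then fill each NaN with that row's score looked up by name in df2
df1['score'] = df1['score'].mask(df1['score'].eq(0)).fillna(
    df1['name'].map(df2.set_index('name')['score'])
)
df1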

  name  score
0    A   10.0
1    B   32.0
2    A   10.0
3    C   30.0
4    B   20.0
5    A   45.0
6    A   10.0
7    A   10.0
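To see what each step of Option 1 does, here is a self-contained sketch. It assumes the sample frames from cs95's answer (the question's own data is not shown as text) and splits the one-liner into named intermediates; the names `missing`, `lookup` and `per_row` are mine, for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data assumed to match the frames in cs95's answer
df1 = pd.DataFrame({'name': ['A', 'B', 'A', 'C', 'B', 'A', 'A', 'A'],
                    'score': [0, 32, 0, np.nan, np.nan, 45, np.nan, np.nan]})
df2 = pd.DataFrame({'name': ['A', 'B', 'C'], 'score': [10, 20, 30]})

# Step 1: mask() turns every 0 into NaN, so only one kind of "missing" remains
missing = df1['score'].mask(df1['score'].eq(0))

# Step 2: a name -> score lookup, then one looked-up value per row of df1
lookup = df2.set_index('name')['score']
per_row = df1['name'].map(lookup)

# Step 3: fillna() with a Series fills each NaN from the same row label
df1['score'] = missing.fillna(per_row)
print(df1['score'].tolist())
```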

Option 2
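A more interesting version using searchsorted. df2 must be sorted by 'name', and every name in df1 must appear in df2 (searchsorted returns an insertion point, not a match).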

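import numpy as np
# Row positions where score is NaN or 0
i = np.where(np.isnan(df1.score.mask(df1.score.values == 0).values))[0]
# Position of each of those names in df2.name (which must be sorted)
j = df2.name.values.searchsorted(df1.name.values[i])
# Write back via .iloc: df1.score.values is read-only under Copy-on-Write
df1.iloc[i, df1.columns.get_loc('score')] = df2.score.values[j]
df1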

  name  score
0    A   10.0
1    B   32.0
2    A   10.0
3    C   30.0
4    B   20.0
5    A   45.0
6    A   10.0
7    A   10.0
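A sketch of Option 2 that makes its two preconditions explicit: df2 is sorted by name before searching, and rows whose name has no match in df2 raise an error instead of silently taking a neighbouring score. The deliberately unsorted df2, the `KeyError` guard and the variable names are illustrative assumptions, not part of the answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['A', 'B', 'A', 'C', 'B', 'A', 'A', 'A'],
                    'score': [0, 32, 0, np.nan, np.nan, 45, np.nan, np.nan]})
# df2 deliberately unsorted here to show why sorting matters
df2 = pd.DataFrame({'name': ['C', 'A', 'B'], 'score': [30, 10, 20]})

# searchsorted assumes a sorted array, so sort df2 by name first
ref = df2.sort_values('name')
names = ref['name'].to_numpy()
scores = ref['score'].to_numpy()

s = df1['score'].to_numpy()
i = np.flatnonzero(np.isnan(s) | (s == 0))  # rows to fill
j = names.searchsorted(df1['name'].to_numpy()[i])

# searchsorted returns an insertion point even for names absent from df2;
# clip and compare so such rows are detected instead of silently mismatched
j = np.clip(j, 0, len(names) - 1)
found = names[j] == df1['name'].to_numpy()[i]
if not found.all():
    raise KeyError('some names in df1 are missing from df2')

df1.iloc[i, df1.columns.get_loc('score')] = scores[j]
print(df1['score'].tolist())
```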

edited Aug 25, 2017 at 15:15

answered Aug 25, 2017 at 15:00

piRSquared


1 Comment

BENY Over a year ago

First time I've noticed that fillna can be used this way, thank you :) +1
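On the point raised in this comment: when fillna receives a Series, it fills by index label rather than by position, which is why Option 1 works (df1.name.map(...) returns a Series carrying df1's index). A minimal sketch with made-up labels:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan], index=['x', 'y', 'z'])
# The filler is in a different order and has an extra label 'w'
filler = pd.Series({'z': 30.0, 'w': 99.0, 'y': 20.0})

# fillna(Series) matches on index labels, not positions; labels in the
# filler that s lacks ('w') are ignored, and non-missing values are kept
print(s.fillna(filler).tolist())  # [1.0, 20.0, 30.0]
```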

cs95 · Accepted Answer · 2017-08-25 14:53:45Z

2

If df1 and df2 are your dataframes, you can create a mapping and then call pd.Series.replace:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name' : ['A', 'B', 'A', 'C', 'B', 'A', 'A', 'A'],
                    'score': [0, 32, 0, np.nan, np.nan, 45, np.nan, np.nan]})
df2 = pd.DataFrame({'name' : ['A', 'B', 'C'], 'score' : [10, 20, 30]})

print(df1)

  name  score
0    A    0.0
1    B   32.0
2    A    0.0
3    C    NaN
4    B    NaN
5    A   45.0
6    A    NaN
7    A    NaN

print(df2) 

  name  score
0    A     10
1    B     20
2    C     30

mapping = dict(df2.values)

# Rows whose score is missing or zero
mask = df1.score.isnull() | (df1.score == 0)
# Turn those rows' names into scores via the mapping and write them back
df1.loc[mask, 'score'] = df1.loc[mask, 'name'].replace(mapping)

print(df1)

  name  score
0    A   10.0
1    B   32.0
2    A   10.0
3    C   30.0
4    B   20.0
5    A   45.0
6    A   10.0
7    A   10.0
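One caveat with replace, sketched below using a hypothetical name 'D' that has no row in df2: Series.replace leaves unmatched values untouched, so the string 'D' would end up in the score column, whereas Series.map turns it into NaN.

```python
import pandas as pd

df2 = pd.DataFrame({'name': ['A', 'B', 'C'], 'score': [10, 20, 30]})
mapping = dict(df2.values)  # maps 'A' -> 10, 'B' -> 20, 'C' -> 30

# 'D' is a hypothetical name that has no entry in df2
names = pd.Series(['A', 'D', 'C'])

replaced = names.replace(mapping)  # unmatched values are left unchanged
mapped = names.map(mapping)        # unmatched values become NaN

print(replaced.tolist())
print(mapped.tolist())
```

If every name is guaranteed to be in df2 the two behave the same; otherwise map makes the gap visible as NaN.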

edited Aug 25, 2017 at 14:53

answered Aug 25, 2017 at 14:37

cs95


2 Comments

piRSquared Over a year ago

dude! dict(df2.values) is pretty slick. I'll be stealing... borrowing that.

cs95 Over a year ago

@piRSquared By all means!
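Why dict(df2.values) works, as a small sketch: df2.values yields one [name, score] row per record, and dict() accepts any iterable of key/value pairs. It relies on df2 having exactly two columns with the key first; the zip-based version and the extra column below are my own illustration.

```python
import pandas as pd

df2 = pd.DataFrame({'name': ['A', 'B', 'C'], 'score': [10, 20, 30]})

# df2.values: one [name, score] row per record; dict() turns each 2-item row
# into a key/value pair, so this relies on df2 having exactly two columns
mapping = dict(df2.values)

# Same mapping built from named columns, unaffected by extra columns
mapping_explicit = dict(zip(df2['name'], df2['score']))
print(mapping == mapping_explicit)  # True

# With a third (hypothetical) column each row has three items and dict() fails
df3 = df2.assign(extra=1)
try:
    dict(df3.values)
    three_cols_ok = True
except ValueError:
    three_cols_ok = False
print(three_cols_ok)  # False
```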

BENY · Accepted Answer · 2017-08-25 14:51:06Z

1

Or, using merge and fillna:

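import numpy as np
import pandas as pd

# Treat 0 as missing
df1.loc[df1.score == 0, 'score'] = np.nan
# After the merge, score_x holds df1's score and score_y holds df2's;
# back-filling along each row copies score_y into any missing score_x.
# fillna(method='bfill') is deprecated in recent pandas; .bfill() replaces it
(df1.merge(df2, on='name', how='left')
    .bfill(axis=1)[['name', 'score_x']]
    .rename(columns={'score_x': 'score'}))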
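A variation on this answer (my own, not from the thread): after the left merge, fill score_x directly from score_y instead of back-filling across the whole row, which also involves the string name column. The sample frames are assumed to be those from cs95's answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['A', 'B', 'A', 'C', 'B', 'A', 'A', 'A'],
                    'score': [0, 32, 0, np.nan, np.nan, 45, np.nan, np.nan]})
df2 = pd.DataFrame({'name': ['A', 'B', 'C'], 'score': [10, 20, 30]})

# A left merge keeps df1's rows in order; the clashing 'score' columns get
# the default suffixes: score_x from df1, score_y from df2
merged = df1.merge(df2, on='name', how='left')

# Fill column-to-column (0 is treated as missing, as in the answer)
result = merged[['name']].assign(
    score=merged['score_x'].replace(0, np.nan).fillna(merged['score_y'])
)
print(result['score'].tolist())
```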

edited Aug 25, 2017 at 14:51

answered Aug 25, 2017 at 14:44

BENY


Alexander · Accepted Answer · 2017-08-25 15:10:13Z

1

This method changes the row order: the result is sorted by name.

import numpy as np
df1.set_index('name').replace(0, np.nan).combine_first(df2.set_index('name')).reset_index()

  name  score
0    A     10
1    A     10
2    A     45
3    A     10
4    A     10
5    B     32
6    B     20
7    C     30
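If the original row order matters, a variation (not from this answer) is to apply combine_first at the Series level, where alignment happens on df1's own index instead of on name. This sketch assumes the sample frames from cs95's answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['A', 'B', 'A', 'C', 'B', 'A', 'A', 'A'],
                    'score': [0, 32, 0, np.nan, np.nan, 45, np.nan, np.nan]})
df2 = pd.DataFrame({'name': ['A', 'B', 'C'], 'score': [10, 20, 30]})

# Look up df2's score for every row of df1; the result keeps df1's index
from_df2 = df1['name'].map(df2.set_index('name')['score'])

# Series.combine_first aligns on df1's unique row index (not on name),
# so the original order survives: keep df1's score, else take df2's
df1['score'] = df1['score'].replace(0, np.nan).combine_first(from_df2)
print(df1['score'].tolist())
```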

answered Aug 25, 2017 at 15:10

Alexander
