Change values of a dataframe if value is null, from another dataframe pandas

Question

I have a dataframe df1:

id   age   val
10    2     10
20    3     NaN
30    4     55
40    5     NaN

I have another dataframe df2:

age  val_main
 1    111
 2    222
 3    333
 4    444
 5    555
 6    666

I want to replace only the values in df1 that are NaN, using the corresponding val_main from df2 for the same age.

The final output I'm looking for is:

id   age   val
10    2     10
20    3     333
30    4     55
40    5     555

I tried iterating over df1 with a for loop and then looking up the value in df2 wherever the row in df1 is null.

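For example:

import pandas as pd

for index, row in df1.iterrows():
    # row['val'] is a plain scalar, so use pd.isnull() instead of a .isnull() method
    if pd.isnull(row['val']):
        df2.loc[df2.age == row.age].val
        # .....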

But I'm looking for a more robust and smarter way of doing this.
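For comparison, here is a minimal runnable completion of the loop approach described above, using the sample data from the question. It assumes every age in df1 with a missing val has at least one row in df2 (`.iloc[0]` would raise an IndexError otherwise), and it writes back with `df1.loc[index, ...]` because the `row` yielded by `iterrows()` is a copy.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({'id': [10, 20, 30, 40],
                    'age': [2, 3, 4, 5],
                    'val': [10, np.nan, 55, np.nan]})
df2 = pd.DataFrame({'age': [1, 2, 3, 4, 5, 6],
                    'val_main': [111, 222, 333, 444, 555, 666]})

for index, row in df1.iterrows():
    # row['val'] is a plain scalar, so test it with pd.isnull()
    if pd.isnull(row['val']):
        # val_main values for this age (assumes at least one match exists)
        match = df2.loc[df2['age'] == row['age'], 'val_main']
        # assign into df1 itself; modifying `row` would not change df1
        df1.loc[index, 'val'] = match.iloc[0]

print(df1)
```

This works, but it runs Python code once per row, which is why the vectorized answers below are preferable on large frames.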

jezrael · Accepted Answer · 2019-02-11 11:45:10Z

3

Use Series.map with a dictionary, together with Series.fillna:

d = dict(zip(df2['age'], df2['val_main']))
#alternative with Series
#d = df2.set_index('age')['val_main']
df1['val'] = df1['val'].fillna(df1['age'].map(d))
print (df1)
   id  age    val
0  10    2   10.0
1  20    3  333.0
2  30    4   55.0
3  40    5  555.0
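A self-contained sketch of the map + fillna idea on the question's data, using the Series alternative from the comment in the answer. It assumes each age appears only once in df2; ages that are absent from df2 map to NaN, so those rows simply stay NaN.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': [10, 20, 30, 40],
                    'age': [2, 3, 4, 5],
                    'val': [10, np.nan, 55, np.nan]})
df2 = pd.DataFrame({'age': [1, 2, 3, 4, 5, 6],
                    'val_main': [111, 222, 333, 444, 555, 666]})

# Lookup table: age -> val_main (a Series indexed by age, like the dict version)
lookup = df2.set_index('age')['val_main']

# map() builds a Series aligned with df1 holding the df2 value for each age;
# fillna() only takes values from it where df1['val'] is NaN
df1['val'] = df1['val'].fillna(df1['age'].map(lookup))
print(df1)
```

Because map() returns a Series with the same index as df1, no merge or index reset is involved and the original index of df1 is kept.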

Performance depends on the number of rows in both DataFrames, as well as on the number of unique values and the number of missing values:

import numpy as np
import pandas as pd

np.random.seed(123)
N = 1000000

df2 = pd.DataFrame({'age': np.arange(1,101),
                    'val_main':np.random.randint(1000, size=100)})
print (df2)

df1 = pd.DataFrame({'id': np.random.randint(1000, size=N),
                   'age':np.random.randint(100, size=N),
                   'val':np.random.choice([10,20,30,np.nan], size=N)})
print (df1)

In [229]: %%timeit
     ...: df = df1.merge(df2, on = 'age', how='left')
     ...: df['val'] = df.val.fillna(df.pop('val_main'))
     ...: 
172 ms ± 3.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [230]: %%timeit d = dict(zip(df2['age'], df2['val_main']))
     ...: #alternative with Series
     ...: #d = df2.set_index('age')['val_main']
     ...: df1['val'] = df1['val'].fillna(df1['age'].map(d))
     ...: 
40 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
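Before comparing timings it is worth checking that the two approaches return the same values. The sketch below rebuilds the benchmark data with a smaller N (an assumption to keep it quick) and compares the results without assigning back into df1. Note that df2 only covers ages 1 to 100 while `np.random.randint(100)` yields 0 to 99, so rows with age 0 and a missing val stay NaN under both approaches.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
N = 10_000  # smaller than the benchmark's 1,000,000 to keep the check quick

df2 = pd.DataFrame({'age': np.arange(1, 101),
                    'val_main': np.random.randint(1000, size=100)})
df1 = pd.DataFrame({'id': np.random.randint(1000, size=N),
                    'age': np.random.randint(100, size=N),
                    'val': np.random.choice([10, 20, 30, np.nan], size=N)})

# Approach 1: left merge, then fill from the merged column
merged = df1.merge(df2, on='age', how='left')
via_merge = merged['val'].fillna(merged['val_main'])

# Approach 2: map a dict, then fill (not assigned back, so df1 is unchanged)
d = dict(zip(df2['age'], df2['val_main']))
via_map = df1['val'].fillna(df1['age'].map(d))

# Same values in the same row order (both have a default RangeIndex here)
pd.testing.assert_series_equal(via_merge, via_map)

# Remaining NaNs: only rows whose age (0) has no entry in df2
print(via_map.isna().sum())
```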

edited Feb 11, 2019 at 11:45

answered Feb 11, 2019 at 11:04

jezrael

868k reputation · 103 gold badges · 1.4k silver badges · 1.3k bronze badges


1 Comment

jezrael Over a year ago

@anky_91 - I'll try to find some timings.

yatu · Accepted Answer · 2019-02-11 11:08:50Z

2

You could do a left .merge and then .fillna the values in val with the merged column val_main:

df = df1.merge(df2, on = 'age', how='left')
df['val'] = df.val.fillna(df.pop('val_main'))

Output

   id  age    val
0  10    2   10.0
1  20    3  333.0
2  30    4   55.0
3  40    5  555.0
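A self-contained version of the merge + fillna approach on the question's data. Two properties of a column-based merge are worth keeping in mind: the result is a new DataFrame with a fresh default index (any meaningful index on df1 is not carried over), and if df2 contained the same age more than once, the matching df1 rows would be duplicated.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': [10, 20, 30, 40],
                    'age': [2, 3, 4, 5],
                    'val': [10, np.nan, 55, np.nan]})
df2 = pd.DataFrame({'age': [1, 2, 3, 4, 5, 6],
                    'val_main': [111, 222, 333, 444, 555, 666]})

# Left merge keeps every row of df1 and adds the matching val_main column
df = df1.merge(df2, on='age', how='left')

# pop() removes val_main from df and returns it, so fillna can use it
# and no separate drop() is needed afterwards
df['val'] = df['val'].fillna(df.pop('val_main'))
print(df)
```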

edited Feb 11, 2019 at 11:08

answered Feb 11, 2019 at 10:59

yatu

88.6k reputation · 12 gold badges · 93 silver badges · 148 bronze badges


Mohamed Thasin ah · Accepted Answer · 2019-02-11 11:05:48Z

1

I think you are looking for this:

import pandas as pd

df1 = pd.merge(df1, df2, on=['age'], how='left')
df1['val'] = df1['val'].fillna(df1['val_main'])

# remove the now-unnecessary helper column (axis must be passed by keyword in pandas 2.x)
df1.drop(columns='val_main', inplace=True)

print(df1)

Output:

   id  age    val 
0  10    2   10.0 
1  20    3  333.0 
2  30    4   55.0 
3  40    5  555.0

edited Feb 11, 2019 at 11:05

answered Feb 11, 2019 at 10:59

Mohamed Thasin ah

11.2k reputation · 11 gold badges · 65 silver badges · 120 bronze badges


iamklaus · Accepted Answer · 2019-02-11 11:18:55Z

0

I like DataFrame.apply too:

df1

   id  age   val
0  10    2  10.0
1  20    3   NaN
2  30    4  55.0
3  40    5   NaN

df2

   age  val_main
0    1       111
1    2       222
2    3       333
3    4       444
4    5       555
5    6       666

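df1['val'] = df1[['age', 'val']].apply(
    # look up val_main for this age only when val is missing; pd.isnull() is more reliable than str(x) == 'nan'
    lambda x: df2.loc[df2['age'] == x['age'], 'val_main'].iloc[0] if pd.isnull(x['val']) else x['val'],
    axis=1)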

Output

   id  age    val
0  10    2   10.0
1  20    3  333.0
2  30    4   55.0
3  40    5  555.0
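A self-contained sketch of the row-wise apply idea, written as a named function for readability. The helper name `fill_from_df2` is hypothetical, and the NaN fallback for ages missing from df2 is an addition not present in the answer. Since the function runs once per row in Python, this approach is typically much slower than map or merge on large frames.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': [10, 20, 30, 40],
                    'age': [2, 3, 4, 5],
                    'val': [10, np.nan, 55, np.nan]})
df2 = pd.DataFrame({'age': [1, 2, 3, 4, 5, 6],
                    'val_main': [111, 222, 333, 444, 555, 666]})

def fill_from_df2(row):
    """Hypothetical helper: return row['val'], or df2's val_main for row['age'] if val is missing."""
    if pd.isnull(row['val']):
        match = df2.loc[df2['age'] == row['age'], 'val_main']
        # fall back to NaN when df2 has no entry for this age
        return match.iloc[0] if not match.empty else np.nan
    return row['val']

# axis=1 passes each row of df1 to the function as a Series
df1['val'] = df1.apply(fill_from_df2, axis=1)
print(df1)
```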

answered Feb 11, 2019 at 11:18

iamklaus

3,768 reputation · 2 gold badges · 14 silver badges · 21 bronze badges
