Python Pandas: update dataframe values from another dataframe

Question

I have these two dataframes: (update: I've added one column stuffto df1 to specify that the two dataframes have not the same schemas)

df1 = pd.DataFrame({'id': ['1','2','3'], 
                    'val': [0, 0, 0], 
                    'stuff': ['foo', 'bar', 'spam']})
df2 = pd.DataFrame({'id': ['2','3'], 'val': [10, 20]})

print(df1)
  id  val stuff
0  1    0   foo
1  2    0   bar
2  3    0  spam

print(df2)
  id  val
0  2   10
1  3   20

I want to update the values in df1 val column with the values from df2 val column based on the id column. Desired result after transformation on df1:

print(df1)
  id  val stuff
0  1    0   foo
1  2   10   bar
2  3   20  spam

I could use a join (merge) but then I would need several more steps to end up with the expected result (casting the column from float to int, dropping column, etc.). (BTW, if you have a simple and elegant way to it with a join, I am also interested).
I am trying to use slicing methods but could not figure out how. Example:

>>> df1.loc[df1['id'].isin(df2['id']), 'val'] = df2['val']

gives:

print(df1)
  id   val stuff
0  1   0.0   foo
1  2  20.0   bar
2  3   NaN  spam

Update: One more constraint: do not modify the original df1 index.

Zero · Accepted Answer · 2018-08-27 10:54:25Z

1

You could also do map

In [88]: df1['id'].map(df2.set_index('id')['val']).fillna(df1['val'])
Out[88]:
0     0.0
1    10.0
2    20.0
Name: id, dtype: float64

In [89]: df1['val'] = df1['id'].map(df2.set_index('id')['val']).fillna(df1['val'])

In [90]: df1
Out[90]:
  id   val
0  1   0.0
1  2  10.0
2  3  20.0

answered Aug 27, 2018 at 10:54

Zero

77.4k22 gold badges153 silver badges153 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

ivankeller · Accepted Answer · 2018-08-27 11:44:31Z

1

I found a solution using a merge:

df1 = df1.merge(df2, how='left', on='id')
df1['val'] = np.where(df1['val_y'].isnull(), df1['val_x'], df1['val_y'])
# recast to int
df1['val'] = df1['val'].astype(int)
# remove extra columns
df1.drop(['val_x', 'val_y'], axis=1, inplace=True)

print(df1)
  id stuff  val
0  1   foo    0
1  2   bar   10
2  3  spam   20

answered Aug 27, 2018 at 11:44

ivankeller

2,0731 gold badge20 silver badges20 bronze badges

Comments

Abhi · Accepted Answer · 2018-09-24 21:28:23Z

0

You can use concat and drop duplicates

df = pd.concat([df1,df2]).set_index('id').drop_duplicates().reset_index()

    id  val
0   1   0
1   2   10
2   3   20

If you don't want to modify original index then something like this will work.

df1 = pd.DataFrame({'id': ['1','2','3'], 'val': [0, 0, 0] ,
                       'stuff': ['foo', 'bar', 'spam']})
df2 = pd.DataFrame({'id': ['2','3'], 'val': [10, 20] })

df1.set_index('id', inplace=True)
df1.update(df2.set_index('id'))
df1.reset_index(inplace=True)
# df1.val = df1.val.astype(int) # convert to int

print(df1)
   id   val    stuff
0   1   0       foo
1   2   10      bar
2   3   20      spam

edited Sep 24, 2018 at 21:28

answered Aug 27, 2018 at 10:53

Abhi

4,2431 gold badge18 silver badges29 bronze badges

1 Comment

ivankeller Over a year ago

This works if df1 and df2´have the same schema. What if df1 have more columns? And I would prefer not to touch the original df1index. I've edited the question to take this into account.

Collectives™ on Stack Overflow

Python Pandas: update dataframe values from another dataframe

3 Answers 3

Comments

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related