2

I have these two dataframes: (update: I've added one column stuffto df1 to specify that the two dataframes have not the same schemas)

df1 = pd.DataFrame({'id': ['1','2','3'], 
                    'val': [0, 0, 0], 
                    'stuff': ['foo', 'bar', 'spam']})
df2 = pd.DataFrame({'id': ['2','3'], 'val': [10, 20]})

print(df1)
  id  val stuff
0  1    0   foo
1  2    0   bar
2  3    0  spam

print(df2)
  id  val
0  2   10
1  3   20

I want to update the values in df1 val column with the values from df2 val column based on the id column. Desired result after transformation on df1:

print(df1)
  id  val stuff
0  1    0   foo
1  2   10   bar
2  3   20  spam

I could use a join (merge) but then I would need several more steps to end up with the expected result (casting the column from float to int, dropping column, etc.). (BTW, if you have a simple and elegant way to it with a join, I am also interested).
I am trying to use slicing methods but could not figure out how. Example:

>>> df1.loc[df1['id'].isin(df2['id']), 'val'] = df2['val']

gives:

print(df1)
  id   val stuff
0  1   0.0   foo
1  2  20.0   bar
2  3   NaN  spam

Update: One more constraint: do not modify the original df1 index.

3 Answers 3

1

You could also do map

In [88]: df1['id'].map(df2.set_index('id')['val']).fillna(df1['val'])
Out[88]:
0     0.0
1    10.0
2    20.0
Name: id, dtype: float64

In [89]: df1['val'] = df1['id'].map(df2.set_index('id')['val']).fillna(df1['val'])

In [90]: df1
Out[90]:
  id   val
0  1   0.0
1  2  10.0
2  3  20.0
Sign up to request clarification or add additional context in comments.

Comments

1

I found a solution using a merge:

df1 = df1.merge(df2, how='left', on='id')
df1['val'] = np.where(df1['val_y'].isnull(), df1['val_x'], df1['val_y'])
# recast to int
df1['val'] = df1['val'].astype(int)
# remove extra columns
df1.drop(['val_x', 'val_y'], axis=1, inplace=True)

print(df1)
  id stuff  val
0  1   foo    0
1  2   bar   10
2  3  spam   20

Comments

0

You can use concat and drop duplicates

df = pd.concat([df1,df2]).set_index('id').drop_duplicates().reset_index()

    id  val
0   1   0
1   2   10
2   3   20

If you don't want to modify original index then something like this will work.

df1 = pd.DataFrame({'id': ['1','2','3'], 'val': [0, 0, 0] ,
                       'stuff': ['foo', 'bar', 'spam']})
df2 = pd.DataFrame({'id': ['2','3'], 'val': [10, 20] })

df1.set_index('id', inplace=True)
df1.update(df2.set_index('id'))
df1.reset_index(inplace=True)
# df1.val = df1.val.astype(int) # convert to int

print(df1)
   id   val    stuff
0   1   0       foo
1   2   10      bar
2   3   20      spam

1 Comment

This works if df1 and df2´have the same schema. What if df1 have more columns? And I would prefer not to touch the original df1index. I've edited the question to take this into account.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.