
I have two pandas DataFrames in Python. I want to update rows in the first DataFrame using matching values from a second DataFrame, which serves as an override.

Here is an example with sample data and code:

DataFrame 1 :

   Code      Name  Value
0     1  Company1    200
1     2  Company2    300
2     3  Company3    400

DataFrame 2:

   Code      Name  Value
0     2  Company2   1000

I want to update DataFrame 1 based on matching Code and Name. In this example, DataFrame 1 should be updated as below:

   Code      Name  Value
0     1  Company1    200
1     2  Company2   1000
2     3  Company3    400

Note: the row with Code = 2 and Name = Company2 is updated with Value 1000 (coming from DataFrame 2).

import pandas as pd

data1 = {
    'Code': [1, 2, 3],
    'Name': ['Company1', 'Company2', 'Company3'],
    'Value': [200, 300, 400],
}
df1 = pd.DataFrame(data1, columns=['Code', 'Name', 'Value'])

data2 = {
    'Code': [2],
    'Name': ['Company2'],
    'Value': [1000],
}
df2 = pd.DataFrame(data2, columns=['Code', 'Name', 'Value'])

Any pointers or hints?



Using DataFrame.update, which aligns on indices (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.update.html):

>>> df1.set_index('Code', inplace=True)
>>> df1.update(df2.set_index('Code'))
>>> df1.reset_index()  # to recover the initial structure

   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0
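If the frames have more columns than should be overridden, you can restrict update to a subset before calling it. A minimal sketch, assuming a hypothetical extra Region column that must survive untouched:

```python
import pandas as pd

# Hypothetical wider frames: only 'Value' should be overridden, not 'Region'.
df1 = pd.DataFrame({
    'Code': [1, 2, 3],
    'Name': ['Company1', 'Company2', 'Company3'],
    'Value': [200, 300, 400],
    'Region': ['EU', 'US', 'EU'],
})
df2 = pd.DataFrame({
    'Code': [2],
    'Name': ['Company2'],
    'Value': [1000],
    'Region': ['APAC'],  # present in df2, but should NOT win
})

df1.set_index('Code', inplace=True)
# Selecting only the override columns before update() leaves 'Region' alone.
df1.update(df2.set_index('Code')[['Value']])
df1.reset_index(inplace=True)
print(df1)
```

Note that update still upcasts the touched column to float, because the reindexed override frame contains NaN for the rows it does not cover.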

3 Comments

This seems to be the most ideal solution among all, but can you help with one thing? What if df1 and df2 each had 5 columns, but I wanted to update only the "Value" column and not the rest (the code above updates all columns for that index)? Is that possible?
Why is the Value column converted to float?
This was the solution I was looking for. You can also expand this to multiple lookup columns, df1.set_index(['Code', 'Name'], inplace=True), and update multiple measure columns in case you have e.g. Value, Sales, etc.

You can use concat + drop_duplicates, which updates the common rows and adds the new rows from df2:

pd.concat([df1, df2]).drop_duplicates(['Code', 'Name'], keep='last').sort_values('Code')
Out[1280]: 
   Code      Name  Value
0     1  Company1    200
0     2  Company2   1000
2     3  Company3    400

Update, due to the comments below:

df1.set_index(['Code', 'Name'], inplace=True)

df1.update(df2.set_index(['Code', 'Name']))

df1.reset_index(drop=True, inplace=True)

2 Comments

Just want to point out that this solution not only updates the entries of dataframe1 but also adds new entries from dataframe2 which were not present in dataframe1 before.
It also blows up the memory as it has to make a duplicate of both dataframes before dropping the duplicates.
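A small sketch of the behaviour the first comment describes, with a hypothetical extra row (Code 4) that exists only in df2 and survives into the result:

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
# Hypothetical override frame containing a row df1 does not have (Code 4).
df2 = pd.DataFrame({'Code': [2, 4],
                    'Name': ['Company2', 'Company4'],
                    'Value': [1000, 999]})

out = (pd.concat([df1, df2])
         .drop_duplicates(['Code', 'Name'], keep='last')
         .sort_values('Code')
         .reset_index(drop=True))
print(out)  # Code 2 is overridden AND Code 4 is appended: 4 rows
```

If new rows are unwanted, this approach needs an extra filter; the update-based answers leave df1's row set unchanged.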

You can merge the data first and then use numpy.where:

import numpy as np

updated = df1.merge(df2, how='left', on=['Code', 'Name'], suffixes=('', '_new'))
updated['Value'] = np.where(updated['Value_new'].notnull(), updated['Value_new'], updated['Value'])
updated.drop('Value_new', axis=1, inplace=True)

   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0

1 Comment

Thanks. So Left join and then update 'Value' field with 'Value_new' for non NaN rows.

There is an update function available.

example:

df1.update(df2)

for more info:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html

2 Comments

An older, identical, and better answer already exists.
It would be required to set_index first to use update reliably.
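A sketch of what the second comment warns about: without set_index, update aligns on the default RangeIndex, so the override lands on the wrong row (using the question's own data):

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2],
                    'Name': ['Company2'],
                    'Value': [1000]})

df1.update(df2)  # aligns on the default RangeIndex, not on Code/Name
print(df1)       # row 0 (Company1!) is overwritten by df2's only row
```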

You can align indices and then use combine_first:

res = df2.set_index(['Code', 'Name'])\
         .combine_first(df1.set_index(['Code', 'Name']))\
         .reset_index()

print(res)

#    Code      Name   Value
# 0     1  Company1   200.0
# 1     2  Company2  1000.0
# 2     3  Company3   400.0

2 Comments

This is not a valid answer, because: "Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two." pandas.pydata.org/pandas-docs/stable/reference/api/… @safiqul islam mentioned below the update function, which seems to work. pandas.pydata.org/pandas-docs/stable/reference/api/…
@CorinaRosa Can you give a counter-example?
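A counter-example sketch (with a hypothetical extra row, Code 4, present only in df2): combine_first takes the union of the two indexes, so keys that exist only in the override frame are added to the result, unlike update:

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
# Hypothetical override containing a key absent from df1.
df2 = pd.DataFrame({'Code': [2, 4],
                    'Name': ['Company2', 'Company4'],
                    'Value': [1000, 999]})

res = (df2.set_index(['Code', 'Name'])
          .combine_first(df1.set_index(['Code', 'Name']))
          .reset_index())
print(res)  # the (4, 'Company4') row appears: indexes are unioned, not intersected
```

For the question's exact data the keys coincide, so combine_first and update give the same answer; they diverge only when df2 carries extra keys.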

Assuming company and code are redundant identifiers, you can also do

import pandas as pd
vdic = pd.Series(df2.Value.values, index=df2.Name).to_dict()

df1.loc[df1.Name.isin(vdic.keys()), 'Value'] = df1.loc[df1.Name.isin(vdic.keys()), 'Name'].map(vdic)

#   Code      Name  Value
#0     1  Company1    200
#1     2  Company2   1000
#2     3  Company3    400
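Equivalently, a sketch that skips the isin mask: Series.map yields NaN for names without an override, and fillna keeps the old value in those spots (same assumption that Name alone identifies a row):

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

vdic = pd.Series(df2.Value.values, index=df2.Name)
# map() yields NaN for names without an override; fillna keeps the old value there.
df1['Value'] = df1['Name'].map(vdic).fillna(df1['Value'])
print(df1)
```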



There's something I often do.

I merge 'left' first:

df_merged = pd.merge(df1, df2, how = 'left', on = 'Code')

Pandas will create columns with extension '_x' (for your left dataframe) and '_y' (for your right dataframe)

You want the ones that came from the right. So drop the '_x' columns and rename the '_y' ones. (Note: str.strip('_y') strips the *characters* '_' and 'y' from both ends, not the suffix, and would mangle a name like 'Money_y' into 'Mone', so slice the suffix off instead.)

for col in list(df_merged.columns):
    if col.endswith('_x'):
        df_merged.drop(columns=col, inplace=True)
    elif col.endswith('_y'):
        df_merged.rename(columns={col: col[:-len('_y')]}, inplace=True)



You can use pd.Series.where on the result of left-joining df1 and df2

merged = df1.merge(df2, on=['Code', 'Name'], how='left')
df1.Value = merged.Value_y.where(~merged.Value_y.isnull(), df1.Value)
>>> df1
   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0

You can change the line to

df1.Value = merged.Value_y.where(~merged.Value_y.isnull(), df1.Value).astype(int)

in order to return the value to be an integer.
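The same idea can be sketched with Series.fillna instead of where, which reads a little more directly (default merge suffixes '_x'/'_y' assumed):

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

merged = df1.merge(df2, on=['Code', 'Name'], how='left')
# Value_y is NaN where df2 has no override; fillna falls back to Value_x.
df1['Value'] = merged['Value_y'].fillna(merged['Value_x']).astype(int)
print(df1)
```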

2 Comments

Why is it adding .0 to the value ? (Not a big deal, but just curious)
@ProgSky It is because the type changed. I updated the answer to show how to return it to int.
  1. Concatenate the datasets
  2. Drop the duplicates by Code
  3. Sort the values

combined_df = pd.concat([df1, df2]).drop_duplicates(['Code'], keep='last').sort_values('Code')

(DataFrame.append was removed in pandas 2.0, so pd.concat is used here.)



None of the above solutions worked for my particular example, which I think is rooted in the dtype of my columns, but I eventually came to this solution:

indexes = df1.loc[df1.Code.isin(df2.Code.values)].index
# .loc (not .at, which is for single scalar access) accepts a list of labels;
# this assumes df2's rows are ordered consistently with the matched indexes.
df1.loc[indexes, 'Value'] = df2['Value'].values

