Pandas replace dataframe values based on criteria

Question

I have a master dataframe, df:

Colour  Item   Price
Blue    Car    40
Red     Car    30
Green   Truck  50
Green   Bike   30

I then have a price correction dataframe, df_pc:

Colour  Item   Price
Red     Car    60
Green   Bike   70

If a row in the price correction dataframe matches the master df on both Colour and Item, I want to replace the price in the master df with the corrected price. So the expected output is:

Colour  Item   Price
Blue    Car    40
Red     Car    60
Green   Truck  50
Green   Bike   70

I can't find a way of doing this currently.

First, think about what should happen if there is more than one match. Assuming there isn't, you can merge both dataframes on the key Colour/Item and then fill the blanks in the merged price column with values from the first dataframe. Please post a working code example that generates your dataframes if you need help with exact code.
– 576i, Commented Jan 13, 2020 at 14:26
OK thanks, and yes, there are no duplicates. Would you say that's the best way of doing it?
– fred.schwartz, Commented Jan 13, 2020 at 14:27
There's df.drop_duplicates(keep='first', subset=['Colour', 'Item']), with parameters that control which duplicate to keep.
– 576i, Commented Jan 13, 2020 at 14:28
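
For reference, the setup discussed in these comments can be sketched as follows. This is a minimal sketch, not the asker's actual code: the frames are rebuilt from the tables in the question, and the third correction row (a second Red/Car entry with price 65) is a hypothetical duplicate added only to show how drop_duplicates behaves.

```python
import pandas as pd

# Master frame, rebuilt from the table in the question
df = pd.DataFrame({
    "Colour": ["Blue", "Red", "Green", "Green"],
    "Item": ["Car", "Car", "Truck", "Bike"],
    "Price": [40, 30, 50, 30],
})

# Corrections frame; the last row is a hypothetical duplicate key (assumption)
df_pc = pd.DataFrame({
    "Colour": ["Red", "Green", "Red"],
    "Item": ["Car", "Bike", "Car"],
    "Price": [60, 70, 65],
})

keys = ["Colour", "Item"]

# Flag every correction row whose (Colour, Item) key occurs more than once
dup_mask = df_pc.duplicated(subset=keys, keep=False)

# Keep only the first correction per key, as suggested in the comment
df_pc_unique = df_pc.drop_duplicates(subset=keys, keep="first")

print(df_pc[dup_mask])
print(df_pc_unique)
```

Whether the first or the last correction should win is a decision about the data; keep='last' is the other common choice.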

jezrael · Accepted Answer · 2020-01-13 14:36:44Z

3

Use Index.isin to filter out correction rows that have no match in df, and then DataFrame.combine_first:

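# index both frames by the matching keys
df = df.set_index(['Colour','Item'])
df_pc = df_pc.set_index(['Colour','Item'])

# keep only corrections whose key exists in the master frame
df_pc = df_pc[df_pc.index.isin(df.index)]
# corrected prices take priority; all other rows come from df
df = df_pc.combine_first(df).reset_index()
print(df)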
  Colour   Item  Price
0   Blue    Car   40.0
1  Green   Bike   70.0
2  Green  Truck   50.0
3    Red    Car   60.0

A second test, where one correction row has no match in df:

print(df_pc)
   Colour  Item  Price
0     Red   Car     60
1  Orange  Bike     70 <- no matching row in df

df = df.set_index(['Colour','Item'])
df_pc = df_pc.set_index(['Colour','Item'])
df_pc = df_pc[df_pc.index.isin(df.index)]
df = df_pc.combine_first(df).reset_index()
print(df)
  Colour   Item  Price
0   Blue    Car   40.0
1  Green   Bike   30.0
2  Green  Truck   50.0
3    Red    Car   60.0
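
The outputs above come back sorted by the index keys, with Price shown as float. If the master frame's original row order and integer dtype matter, one possible follow-up is sketched below. It is a self-contained variant of the same isin + combine_first idea, not part of the original answer; the names master, corrections and result are illustrative, and it assumes the (Colour, Item) keys in df are unique.

```python
import pandas as pd

df = pd.DataFrame({
    "Colour": ["Blue", "Red", "Green", "Green"],
    "Item": ["Car", "Car", "Truck", "Bike"],
    "Price": [40, 30, 50, 30],
})
df_pc = pd.DataFrame({
    "Colour": ["Red", "Orange"],   # (Orange, Bike) has no match in df
    "Item": ["Car", "Bike"],
    "Price": [60, 70],
})

keys = ["Colour", "Item"]
master = df.set_index(keys)
corrections = df_pc.set_index(keys)

# Drop corrections whose key does not exist in the master frame
corrections = corrections[corrections.index.isin(master.index)]

result = (
    corrections.combine_first(master)  # corrected prices win
    .reindex(master.index)             # restore the master's row order
    .reset_index()
)
# Cast Price back to the master's dtype (safe here: no missing values remain)
result["Price"] = result["Price"].astype(df["Price"].dtype)

print(result)
```

The reindex step relies on unique keys in the master frame; with duplicate keys it would raise an error.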

edited Jan 13, 2020 at 14:36

answered Jan 13, 2020 at 14:31

jezrael


2 Comments

fred.schwartz Over a year ago

Thanks @jezrael, this is great. May I just check: if there is an unnecessary row in df_pc, is it simply ignored (and not added to the master in any way)?

fred.schwartz Over a year ago

Ah, I see, that's what your edit is doing. Thanks a lot.

anky · Accepted Answer · 2020-01-13 14:48:23Z

2

Here is a way using combine_first():

df_pc.set_index(['Colour','Item']).combine_first(
       df.set_index(['Colour','Item'])).reset_index()

  Colour   Item  Price
0   Blue    Car   40.0
1  Green   Bike   70.0
2  Green  Truck   50.0
3    Red    Car   60.0

EDIT: combine_first on its own also adds any df_pc rows that have no match in df. If you want to update only matching items, you can use merge with fillna instead:

print(df_pc)

  Colour  Item  Price
0     Red   Car     60
1  Orange  Bike     70 # changed row, no match in df

(df.merge(df_pc, on=['Colour','Item'], how='left', suffixes=('_x',''))
   # prefer the corrected Price; fall back to the original (Price_x)
   .assign(Price=lambda x: x['Price'].fillna(x['Price_x']))
   .reindex(df.columns, axis=1))

  Colour   Item  Price
0   Blue    Car   40.0
1    Red    Car   60.0
2  Green  Truck   50.0
3  Green   Bike   30.0
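
The same merge idea can also be written step by step, which may be easier to follow than the chained expression. The sketch below is self-contained and keeps the master's row order; the "_new" suffix and the names merged and result are illustrative choices, not from the answer, and the cast back to the original dtype assumes every row ends up with a price.

```python
import pandas as pd

df = pd.DataFrame({
    "Colour": ["Blue", "Red", "Green", "Green"],
    "Item": ["Car", "Car", "Truck", "Bike"],
    "Price": [40, 30, 50, 30],
})
df_pc = pd.DataFrame({
    "Colour": ["Red", "Orange"],   # (Orange, Bike) has no match in df
    "Item": ["Car", "Bike"],
    "Price": [60, 70],
})

keys = ["Colour", "Item"]

# Left merge keeps every master row, in order; unmatched rows get NaN in Price_new
merged = df.merge(df_pc, on=keys, how="left", suffixes=("", "_new"))

# Use the corrected price where there is one, otherwise keep the original
merged["Price"] = (
    merged["Price_new"].fillna(merged["Price"]).astype(df["Price"].dtype)
)

result = merged.drop(columns="Price_new")
print(result)
```

Because the merge is a left join on df, correction rows without a match never reach the result, so no separate filtering step is needed.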

edited Jan 13, 2020 at 14:48

answered Jan 13, 2020 at 14:29

anky
