I have a master dataframe, df:

Colour Item   Price
Blue   Car     40
Red   Car     30
Green  Truck   50
Green  Bike    30

I then have a price correction dataframe, df_pc:

Colour Item   Price
Red   Car     60
Green  Bike    70

If a row in the price correction dataframe matches a row in the master df on Colour and Item, I want to replace the price in the master df with the corrected one. The expected output is:

Colour Item   Price
Blue   Car     40
Red   Car     60
Green  Truck   50
Green  Bike    70

I haven't found a way of doing this so far.
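
For reference, a minimal sketch that builds the two frames shown above. The values are copied from the tables; the int dtype for Price is an assumption.

```python
import pandas as pd

# Master dataframe, values taken from the first table
df = pd.DataFrame({
    "Colour": ["Blue", "Red", "Green", "Green"],
    "Item": ["Car", "Car", "Truck", "Bike"],
    "Price": [40, 30, 50, 30],
})

# Price correction dataframe, values taken from the second table
df_pc = pd.DataFrame({
    "Colour": ["Red", "Green"],
    "Item": ["Car", "Bike"],
    "Price": [60, 70],
})
```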

  • First, think about what should happen if there's more than one match. Assuming there isn't, you can merge both dataframes on the key Colour/Item and then fill the blanks in the merged price column with values from the first dataframe. Please post a working code example generating your dataframes if you need exact code help. Commented Jan 13, 2020 at 14:26
  • OK thanks, yes, no duplicates. Would you say that's the best way of doing it? Commented Jan 13, 2020 at 14:27
  • There's df.drop_duplicates(keep='first', subset=['Colour', 'Item']), where keep controls which of the duplicate rows is retained. Commented Jan 13, 2020 at 14:28
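
The merge-and-fill idea from the comments could be sketched roughly as below. This is an illustration, not the commenter's exact code: the frames are rebuilt from the question, and the names keys, df_pc_unique, merged and result, as well as the _new suffix, are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Colour": ["Blue", "Red", "Green", "Green"],
    "Item": ["Car", "Car", "Truck", "Bike"],
    "Price": [40, 30, 50, 30],
})
df_pc = pd.DataFrame({
    "Colour": ["Red", "Green"],
    "Item": ["Car", "Bike"],
    "Price": [60, 70],
})

keys = ["Colour", "Item"]

# Guard against several corrections for one key: keep the first of each
df_pc_unique = df_pc.drop_duplicates(subset=keys, keep="first")

# Left merge keeps every master row; unmatched rows get NaN in Price_new
merged = df.merge(df_pc_unique, on=keys, how="left", suffixes=("", "_new"))

# Use the corrected price where there is one, else the original price
merged["Price"] = merged["Price_new"].fillna(merged["Price"])
result = merged.drop(columns="Price_new")
print(result)
```

A left merge keeps the master's row order, and correction rows with no match in df simply never appear in the result.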

2 Answers

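Use Index.isin to filter out the rows of df_pc that have no match in df, then use DataFrame.combine_first. Note that, as the output shows, the result comes back sorted by the key columns and Price is converted to float: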

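# Index both frames by the key columns so rows align on (Colour, Item)
df = df.set_index(['Colour','Item'])
df_pc = df_pc.set_index(['Colour','Item'])

# Keep only the corrections whose key exists in the master frame
df_pc = df_pc[df_pc.index.isin(df.index)]
# Take the price from df_pc where available, otherwise from df
df = df_pc.combine_first(df).reset_index()
print(df)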
  Colour   Item  Price
0   Blue    Car   40.0
1  Green   Bike   70.0
2  Green  Truck   50.0
3    Red    Car   60.0

A second test, with a row in df_pc that has no match in df:

print(df_pc)
   Colour  Item  Price
0     Red   Car     60
1  Orange  Bike     70 <- row with no match in df

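# Starting again from the original df and the df_pc printed above
df = df.set_index(['Colour','Item'])
df_pc = df_pc.set_index(['Colour','Item'])
# The Orange/Bike row is dropped here because its key is not in df
df_pc = df_pc[df_pc.index.isin(df.index)]
df = df_pc.combine_first(df).reset_index()
print(df)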
  Colour   Item  Price
0   Blue    Car   40.0
1  Green   Bike   30.0
2  Green  Truck   50.0
3    Red    Car   60.0
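
If the original row order and integer prices matter, one possible variation is to reindex back to the master index and cast Price afterwards. This is a sketch under my own assumptions rather than part of the answer as posted: the frames are rebuilt from the question, the names keys, master, corrections and result are illustrative, and the cast to int only works because every master row has a price.

```python
import pandas as pd

df = pd.DataFrame({
    "Colour": ["Blue", "Red", "Green", "Green"],
    "Item": ["Car", "Car", "Truck", "Bike"],
    "Price": [40, 30, 50, 30],
})
df_pc = pd.DataFrame({
    "Colour": ["Red", "Orange"],   # Orange/Bike has no match in df
    "Item": ["Car", "Bike"],
    "Price": [60, 70],
})

keys = ["Colour", "Item"]
master = df.set_index(keys)
corrections = df_pc.set_index(keys)

# Drop corrections whose key does not exist in the master frame
corrections = corrections[corrections.index.isin(master.index)]

result = (
    corrections.combine_first(master)  # corrected price wins where present
    .reindex(master.index)             # restore the master's row order
    .reset_index()
)
# No NaN can remain, since every master key has a price, so the cast is safe
result["Price"] = result["Price"].astype(int)
print(result)
```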
2 Comments

Thanks @jezrael, this is great. May I just check: if there is an unnecessary row in df_pc, is it just ignored (and not added to the master in any way)?
Ah, I see, that's what your edit is doing. Thanks a lot.

Here is a way using combine_first():

df_pc.set_index(['Colour','Item']).combine_first(
       df.set_index(['Colour','Item'])).reset_index()

  Colour   Item  Price
0   Blue    Car   40.0
1  Green   Bike   70.0
2  Green  Truck   50.0
3    Red    Car   60.0
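
One thing to be aware of with this version: without a filter, combine_first works on the union of both indexes, so a correction row that has no match in df ends up in the result. A small sketch using an Orange/Bike row (frames rebuilt from the question; keys and result are just illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "Colour": ["Blue", "Red", "Green", "Green"],
    "Item": ["Car", "Car", "Truck", "Bike"],
    "Price": [40, 30, 50, 30],
})
df_pc = pd.DataFrame({
    "Colour": ["Red", "Orange"],   # Orange/Bike is not in the master frame
    "Item": ["Car", "Bike"],
    "Price": [60, 70],
})

keys = ["Colour", "Item"]

# No isin filter: the unmatched correction row is carried into the result
result = (
    df_pc.set_index(keys)
    .combine_first(df.set_index(keys))
    .reset_index()
)
print(result)
```

Here result has five rows, one more than df, because the Orange/Bike correction is added as a new row.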

EDIT: If you want to update only the matching items and ignore corrections that have no match, you can also use merge with fillna:

print(df_pc)

  Colour  Item  Price
0     Red   Car     60
1  Orange  Bike     70 # changed row, no match in df

(df.merge(df_pc, on=['Colour','Item'], how='left', suffixes=('_x',''))
   .assign(Price=lambda x: x['Price'].fillna(x['Price_x']))
   .reindex(df.columns, axis=1))

  Colour   Item  Price
0   Blue    Car   40.0
1    Red    Car   60.0
2  Green  Truck   50.0
3  Green   Bike   30.0
