overwrite dataframe rows with merge

Question

I am trying to overwrite specific rows and columns in one dataframe with the corresponding rows and columns from a second dataframe. I can't share the actual data, so I will use a proxy here.

Here is an example and what I have tried:

df1
    UID   B     C     D     
0   X14   cat   red   One
1   X26   cat   blue  Two
2   X99   cat   pink  One
3   X54   cat   pink  One


df2
    UID   B     C      EX2
0   X14   dog   blue   coat
1   X88   rat   green  jacket
2   X99   bat   red    glasses
3   X29   bat   red    shoes

What I want to do here is overwrite columns B and C in df1 with the values from df2, matched on UID. In this example, X88 and X29 from df2 would therefore not appear in df1. Column D would not be affected, and EX2 would not be added.

The outcome would look like this:

df1
    UID   B     C     D     
0   X14   dog   blue  One
1   X26   cat   blue  Two
2   X99   bat   red   One
3   X54   cat   pink  One

I looked at this solution: "Pandas merge two dataframe and overwrite rows". However, it appears to update only null values, whereas I want an overwrite.

My attempt looked like this:

df = df1.merge(df2.filter(['B', 'C']), on=['B', 'C'], how='left')

For my data this actually doesn't seem to overwrite anything. Please could someone explain why this would not work?

Thanks

ouroboros1 · Accepted Answer · 2022-12-03 12:53:17Z

3

One approach could be as follows:

First, use df.set_index to make column UID your index (inplace).
Next, use df.update with parameter overwrite set to True (also use set_index here for the "other" df: df2). This will overwrite all the columns that the two dfs have in common (i.e. B and C) based on index matches (i.e. now UID).
Finally, restore the standard index using df.reset_index.

df1.set_index('UID', inplace=True)
df1.update(df2.set_index('UID'), overwrite=True)
df1.reset_index(inplace=True)
print(df1)

   UID    B     C    D
0  X14  dog  blue  One
1  X26  cat  blue  Two
2  X99  bat   red  One
3  X54  cat  pink  One
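As for why the merge in the question changes nothing: df2.filter(['B', 'C']) drops UID, and B and C then serve as the join keys, so they are only matched, never overwritten. The following is a minimal sketch, assuming the sample frames from the question; the names attempt, merged and out and the "_new" suffix are illustrative choices, not part of the original post. It reproduces the attempt and then shows one possible merge-based alternative keyed on UID.

```python
import pandas as pd

# Sample frames rebuilt from the question
df1 = pd.DataFrame({
    "UID": ["X14", "X26", "X99", "X54"],
    "B": ["cat", "cat", "cat", "cat"],
    "C": ["red", "blue", "pink", "pink"],
    "D": ["One", "Two", "One", "One"],
})
df2 = pd.DataFrame({
    "UID": ["X14", "X88", "X99", "X29"],
    "B": ["dog", "rat", "bat", "bat"],
    "C": ["blue", "green", "red", "red"],
    "EX2": ["coat", "jacket", "glasses", "shoes"],
})

# The attempt from the question: UID is filtered away, and B and C are
# the join keys, so a left merge hands back df1's own values.
attempt = df1.merge(df2.filter(["B", "C"]), on=["B", "C"], how="left")

# Alternative: join on UID, bring df2's B and C in under a suffix
# (hypothetical "_new"), then prefer them wherever a match exists.
merged = df1.merge(
    df2[["UID", "B", "C"]], on="UID", how="left", suffixes=("", "_new")
)
for col in ["B", "C"]:
    merged[col] = merged[col + "_new"].fillna(merged[col])
out = merged.drop(columns=["B_new", "C_new"])
print(out)
```

Like update, this keeps the df1 value wherever df2 has no matching UID or holds a missing value.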

6 Comments

geds133 Over a year ago

I like the solution, although strangely my values do not get overwritten?

ouroboros1 Over a year ago

Curious. Maybe you could double-check two things: 1) Do the entries in the respective “UID” columns that look like full matches actually match (e.g. maybe one of the two contains a trailing space)? 2) Same for the column names: are they full matches, e.g. not “B” and “B ”?

geds133 Over a year ago

One other question, can I do this without setting any index inplace?

geds133 Over a year ago

Also, no trailing space. It seems like it should work, but it doesn't. .update is inplace, right?

geds133 Over a year ago

I've got it. The issue was how I was assigning the index: I attempted to chain .set_index() into the update itself. Many thanks.
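The exact code behind that last comment was not posted, so the following is only a guess at the pitfall: update modifies the object it is called on and returns None, so chaining it onto set_index updates a temporary frame that is then discarded. The sketch also shows a variant that avoids inplace=True entirely; the names result and updated are illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the question
df1 = pd.DataFrame({
    "UID": ["X14", "X26", "X99", "X54"],
    "B": ["cat", "cat", "cat", "cat"],
    "C": ["red", "blue", "pink", "pink"],
    "D": ["One", "Two", "One", "One"],
})
df2 = pd.DataFrame({
    "UID": ["X14", "X88", "X99", "X29"],
    "B": ["dog", "rat", "bat", "bat"],
    "C": ["blue", "green", "red", "red"],
    "EX2": ["coat", "jacket", "glasses", "shoes"],
})

# Pitfall: set_index returns a new frame, update changes that temporary
# frame in place and returns None, so df1 itself is left untouched.
result = df1.set_index("UID").update(df2.set_index("UID"))
print(result)  # None

# Variant without inplace=True: keep a reference to the re-indexed copy,
# update that copy, then restore the default index.
updated = df1.set_index("UID")
updated.update(df2.set_index("UID"))
updated = updated.reset_index()
print(updated)
```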

Timeless · Accepted Answer · 2022-12-03 12:41:16Z

1

You can approach this by using reindex_like and combine_first.

Try this:

out = (
        df2.set_index("UID")
           .reindex_like(df1.set_index("UID"))
           .combine_first(df1.set_index("UID"))
           .reset_index()
       )

# Output :

print(out)

   UID    B     C    D
0  X14  dog  blue  One
1  X26  cat  blue  Two
2  X99  bat   red  One
3  X54  cat  pink  One
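To see how the chain above gets there, it may help to look at the intermediate frame. This is a sketch assuming the sample frames from the question; the names left and aligned are illustrative. reindex_like keeps only the rows and columns that df1 has, leaving NaN where df2 has nothing to offer, and combine_first then fills those gaps from df1.

```python
import pandas as pd

# Sample frames rebuilt from the question
df1 = pd.DataFrame({
    "UID": ["X14", "X26", "X99", "X54"],
    "B": ["cat", "cat", "cat", "cat"],
    "C": ["red", "blue", "pink", "pink"],
    "D": ["One", "Two", "One", "One"],
})
df2 = pd.DataFrame({
    "UID": ["X14", "X88", "X99", "X29"],
    "B": ["dog", "rat", "bat", "bat"],
    "C": ["blue", "green", "red", "red"],
    "EX2": ["coat", "jacket", "glasses", "shoes"],
})

left = df1.set_index("UID")

# df2 reshaped to df1's rows and columns: X88, X29 and EX2 disappear,
# while X26, X54 and the whole of column D become NaN.
aligned = df2.set_index("UID").reindex_like(left)
print(aligned)

# combine_first keeps aligned's non-null values and falls back to df1
# wherever aligned is NaN.
out = aligned.combine_first(left).reset_index()
print(out)
```

Because the fallback is triggered by NaN, a missing value in df2 for a matching UID would leave the df1 value in place rather than overwrite it.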

1 Comment

Timeless Over a year ago

Thank you Johan, answer undeleted ;)

Vinay · Accepted Answer · 2022-12-03 12:54:20Z

0

Using the update function (note that this also sets the index of df2 in place):

df1.set_index('UID', inplace=True)
df2.set_index('UID', inplace=True)

df1.update(df2)
df1.reset_index(inplace=True)
print(df1)

Output

   UID    B     C    D
0  X14  dog  blue  One
1  X26  cat  blue  Two
2  X99  bat   red  One
3  X54  cat  pink  One

2 Comments

geds133 Over a year ago

I like the solution, although strangely my values do not get overwritten?

Vinay Over a year ago

@geds133 Please recheck if you are running all the lines of code. I have tested it on the same sample data provided in the question.
