0

I am trying to overwrite specific rows and columns from one dataframe with a second dataframe rows and columns. I can't give the actual data but I will use a proxy here.

Here is an example and what I have tried:

df1
    UID   B     C     D     
0   X14   cat   red   One
1   X26   cat   blue  Two
2   X99   cat   pink  One
3   X54   cat   pink  One


df2
   UID    B     C      EX2
0   X14   dog   blue   coat
1   X88   rat   green  jacket
2   X99   bat   red    glasses
3   X29   bat   red    shoes

What I want to do here is overwrite column B and C in df1 with the values in df2 based upon UID. Therefore in this example X88 and X29 from df2 would not appear in df2. Also column D would not be affected and EX2 not

The outcome would looks as such:

df1
    UID   B     C     D     
0   X14   dog   blue  One
1   X26   cat   blue  Two
2   X99   bat   red   One
3   X54   cat   pink  One

I looked at this solution : Pandas merge two dataframe and overwrite rows However this appears to only update null values whereas I want an overwrite.

My attempt looked this like:

df = df1.merge(df2.filter(['B', 'C']), on=['B', 'C'], how='left')

For my data this actually doesn't seem to overwrite anything. Please could someone explain why this would not work?

Thanks

3 Answers 3

3

One approach could be as follows:

  • First, use df.set_index to make column UID your index (inplace).
  • Next, use df.update with parameter overwrite set to True (also use set_index here for the "other" df: df2). This will overwrite all the columns that the two dfs have in common (i.e. B and C) based on index matches (i.e. now UID).
  • Finally, restore the standard index using df.reset_index.
df1.set_index('UID', inplace=True)
df1.update(df2.set_index('UID'), overwrite=True)
df1.reset_index(inplace=True)
print(df1)

   UID    B     C    D
0  X14  dog  blue  One
1  X26  cat  blue  Two
2  X99  bat   red  One
3  X54  cat  pink  One
Sign up to request clarification or add additional context in comments.

6 Comments

I like the solution although strangely my values so not get overwritten?
Curious. Maybe you could double check 2 things. 1) Do the entries in the respective “UID” that look like full matches also actually match; e.g. maybe one of the two contains a trailing space?); 2) Same for the col names: are they full matches, e.g. not “B” and “B ”?
One other question, can I do this without setting any index inplace?
Also no trailing space. Seems like it should work but it's no. .update is inplace right?
I've got it, this is the issue as I was assigning index inplace and attempted to chain .set_index() into the update itself. Many thanks
|
1

You can approach this by using reindex_like and combine_first.

Try this :

out = (
        df2.set_index("UID")
           .reindex_like(df1.set_index("UID"))
           .combine_first(df1.set_index("UID"))
           .reset_index()
       )

# Output :

print(out)

   UID    B     C    D
0  X14  dog  blue  One
1  X26  cat  blue  Two
2  X99  bat   red  One
3  X54  cat  pink  One

1 Comment

Thank you Johan, answer undeleted ;)
0

Using Update function

df1.set_index('UID', inplace=True)
df2.set_index('UID', inplace=True)

df1.update(df2)
df1.reset_index(inplace=True)
print(df1)
Output
   UID    B     C    D
0  X14  dog  blue  One
1  X26  cat  blue  Two
2  X99  bat   red  One
3  X54  cat  pink  One

2 Comments

I like the solution although strangely my values so not get overwritten?
@geds133 Please recheck if you are running all the lines of code. I have tested it on the same sample data provided in the question.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.