
I had a problem and found a solution, but it feels like the wrong way to do it. Maybe there is a more 'canonical' way.

I already got an answer to a very similar problem, but here the two dataframes do not have the same number of rows. Sorry for the "double post", but the first question is still valid, so I think it is better to open a new one.

Problem

I have two dataframes that I would like to merge without creating extra columns and without erasing existing information. Example:

Existing dataframe (df)

   A  A2  B
0  1   4  0
1  2   5  1
2  2   5  1

Dataframe to merge (df2)

   A  A2  B
0  1   4  2
1  3   5  2

I would like to update df with the values from df2 wherever columns 'A' and 'A2' match. The result would be:

   A  A2    B
0  1   4  2 <= Update value ONLY
1  2   5  1
2  2   5  1

Here is my solution, but I don't think it's a very good one.

import pandas as pd

df = pd.DataFrame([[1,4,0],[2,5,1],[2,5,1]],columns=['A','A2','B'])

df2 = pd.DataFrame([[1,4,2],[3,5,2]],columns=['A','A2','B'])

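df = df.merge(df2, on=['A', 'A2'], how='left')
# Rows of df with no match in df2 get NaN in B_y; treat them as 0
df['B_y'] = df['B_y'].fillna(0)
# Adding only gives the right answer here because the matched row has B_x == 0
df['B'] = df['B_x'] + df['B_y']
df = df.drop(['B_x', 'B_y'], axis=1)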
print(df)

I also tried this approach:

rows = (df[['A','A2']] == df2[['A','A2']]).all(axis=1)
df.loc[rows,'B'] = df2.loc[rows,'B']

But it raises this error, because the two dataframes do not have the same number of rows:

ValueError: Can only compare identically-labeled DataFrame objects

Does anyone have a better way to do this? Thanks!

2 Answers

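I think you can use DataFrame.isin to check which rows are the same in both DataFrames. Note that isin with a DataFrame argument compares values at matching index and column labels, so rows are matched by index here. Then use mask to set the matching values of B to NaN, fill them from df2 with combine_first, and finally cast back to int: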

mask = df[['A', 'A2']].isin(df2[['A', 'A2']]).all(axis=1)
print(mask)
0     True
1    False
2    False
dtype: bool

df['B'] = df['B'].mask(mask).combine_first(df2['B']).astype(int)
print(df)
   A  A2  B
0  1   4  2
1  2   5  1
2  2   5  1
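Because the isin approach above lines rows up by index label, a key-based variant may be easier to reason about when matching rows can sit at different positions. The following is only a sketch, not the answerer's method: it assumes each ('A', 'A2') pair appears at most once in df2 (otherwise the left merge would duplicate rows of df), and the '_new' suffix and the names keys, merged and result are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 0], [2, 5, 1], [2, 5, 1]], columns=['A', 'A2', 'B'])
df2 = pd.DataFrame([[1, 4, 2], [3, 5, 2]], columns=['A', 'A2', 'B'])

keys = ['A', 'A2']

# Left merge keeps every row of df; rows without a match get NaN in B_new.
# Assumption: the key pairs in df2 are unique.
merged = df.merge(df2, on=keys, how='left', suffixes=('', '_new'))

# Take the new value where a match exists, otherwise keep the old one
merged['B'] = merged['B_new'].fillna(merged['B']).astype(int)

result = merged.drop(columns='B_new')
print(result)
```

Unlike adding B_x and B_y, this overwrites the old value instead of summing, so it does not depend on the matched rows having B equal to 0.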

1 Comment

Thank you! Not so easy, but I will analyse/learn/use this :D

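With a minor tweak to how the boolean mask is created, you can get it to work. Note that this compares the rows position by position, so it only finds matches that sit at the same position in both dataframes: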

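cols = ['A', 'A2']
n = min(len(df), len(df2))
# Slice both to the same number of rows so they can be compared elementwise
rows = (df[cols].values[:n] == df2[cols].values[:n]).all(axis=1)
# rows is shorter than df, and recent pandas rejects a boolean mask of the
# wrong length in .loc, so assign by integer position instead
pos = rows.nonzero()[0]
df.iloc[pos, df.columns.get_loc('B')] = df2['B'].values[pos]
df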
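The positional comparison above only finds a match when the same key pair sits at the same row position in both dataframes. As a hedged alternative, the sketch below looks up each ('A', 'A2') pair of df in df2 regardless of position. It assumes the key pairs in df2 are unique, and it deliberately reverses the rows of df2 from the question to show that order does not matter; the names lookup, keys and new_b are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 0], [2, 5, 1], [2, 5, 1]], columns=['A', 'A2', 'B'])
# Same update rows as in the question, but in reversed order
df2 = pd.DataFrame([[3, 5, 2], [1, 4, 2]], columns=['A', 'A2', 'B'])

cols = ['A', 'A2']

# Series of new B values indexed by the (A, A2) pair.
# Assumption: each pair occurs only once in df2.
lookup = df2.set_index(cols)['B']

# One key per row of df; duplicates in df are fine here
keys = pd.MultiIndex.from_frame(df[cols])

# NaN where the pair is absent from df2
new_b = lookup.reindex(keys).to_numpy()

# Keep the old value wherever no new one was found
df['B'] = pd.Series(new_b, index=df.index).fillna(df['B']).astype(int)
print(df)
```

Only the row with A == 1 and A2 == 4 is updated, even though that pair is the second row of df2 here.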
