3

I have two dataframes with a different number of columns. Four columns ('A', 'B', 'C', and 'D') appear in both and can hold the same values. I want to add a new column to df1 that takes the value 1 if df2 contains a row with the same values in columns 'A', 'B', 'C', and 'D' as that row of df1. If there is no such row, I want the value to be 0. Columns 'E' and 'F' are not relevant for the comparison.

Is there a pandas function that can do this, or do I have to do this in a loop?

For example:

df1 =
A    B    C    D    E    F
1    1    20   20   3    2
1    1    12   14   1    3
2    1    13   43   4    3
2    2    12   34   1    4

df2 =
A    B    C    D    E    
1    3    12   14   2    
1    1    20   20   4   
2    2    21   31   5    
2    2    12   34   8    

expected output:

df1 =
A    B    C    D    E    F    Target
1    1    20   20   3    2    1
1    1    12   14   1    3    0
2    1    13   43   4    3    0
2    2    12   34   1    4    1

3 Answers

2

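This is fairly simple if the rows of the two DataFrames line up. Comparing two DataFrames with == checks each element against the element at the same position, so row i of df1 is compared only with row i of df2, and both frames need the same index and column labels.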

col_list = ['A', 'B', 'C', 'D']
idx = (df1.loc[:,  col_list] == df2.loc[:,  col_list]).all(axis=1)

df1['new_row'] = idx.astype(int)
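
As a quick check of what the positional comparison returns, here is a minimal sketch that rebuilds the question's sample data (the `positional` name is only for illustration). Since == pairs rows by index, a row of df1 that appears at a different position in df2 is not flagged:

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    'A': [1, 1, 2, 2], 'B': [1, 1, 1, 2],
    'C': [20, 12, 13, 12], 'D': [20, 14, 43, 34],
    'E': [3, 1, 4, 1], 'F': [2, 3, 3, 4],
})
df2 = pd.DataFrame({
    'A': [1, 1, 2, 2], 'B': [3, 1, 2, 2],
    'C': [12, 20, 21, 12], 'D': [14, 20, 31, 34],
    'E': [2, 4, 5, 8],
})

col_list = ['A', 'B', 'C', 'D']
# == compares row i of df1 with row i of df2, not with every row of df2
positional = (df1[col_list] == df2[col_list]).all(axis=1).astype(int)
print(positional.tolist())  # [0, 0, 0, 1]
```

The first row of df1 matches the second row of df2, so the question's expected Target is 1 there, while the positional comparison gives 0. This approach fits only when a row-by-row comparison is what is wanted.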

1

I think you need merge with a left join and the parameter indicator=True. Then compare the _merge column against 'both' with eq (the same as ==), and finally convert the boolean True and False to 1 and 0 with astype:

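import pandas as pd

cols = list('ABCD')
# Left join on the shared columns; _merge is 'both' where df2 has a matching row
df1['Target'] = pd.merge(df1[cols],
                         df2[cols], how='left', indicator=True)['_merge'].eq('both').astype(int)
print(df1)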

   A  B   C   D  E  F  Target
0  1  1  20  20  3  2       1
1  1  1  12  14  1  3       0
2  2  1  13  43  4  3       0
3  2  2  12  34  1  4       1

Detail:

print(pd.merge(df1[cols], df2[cols], how='left', indicator=True))
   A  B   C   D     _merge
0  1  1  20  20       both
1  1  1  12  14  left_only
2  2  1  13  43  left_only
3  2  2  12  34       both
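
For data beyond the sample, two edge cases may be worth guarding against: duplicate key rows in df2 make a left join repeat df1 rows, and the merge result gets a fresh 0..n-1 index that may not line up with df1's own index. A minimal sketch under those assumptions follows; `flag_matches` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    'A': [1, 1, 2, 2], 'B': [1, 1, 1, 2],
    'C': [20, 12, 13, 12], 'D': [20, 14, 43, 34],
    'E': [3, 1, 4, 1], 'F': [2, 3, 3, 4],
})
df2 = pd.DataFrame({
    'A': [1, 1, 2, 2], 'B': [3, 1, 2, 2],
    'C': [12, 20, 21, 12], 'D': [14, 20, 31, 34],
    'E': [2, 4, 5, 8],
})


def flag_matches(left, right, cols):
    """Hypothetical helper: 0/1 array, 1 where left's row (on cols) occurs in right."""
    # Deduplicate the keys so the left join returns exactly one row per left row
    keys = right[cols].drop_duplicates()
    merged = left[cols].merge(keys, on=cols, how='left', indicator=True)
    # to_numpy() assigns by position, independent of left's index labels
    return merged['_merge'].eq('both').astype(int).to_numpy()


df1['Target'] = flag_matches(df1, df2, list('ABCD'))
print(df1['Target'].tolist())  # [1, 0, 0, 1]
```

On the sample data this gives the same Target column as the one-liner above. The extra steps only matter when df2 repeats a key combination or df1 has a non-default index.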

0

You can use logical operators for that. You can have a look at Logic operator for boolean indexing in Pandas or Element-wise logical OR in Pandas for some ideas.

But your specification is not enough for me to sketch a solution, because I do not know how the rows in df1 relate to the rows in df2. Do both dataframes have the same number of rows, and should the new column in each row of df1 indicate whether A, B, C, and D are equal to the values in the same row of df2?
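
If the comparison is indeed meant row by row, the logical-operator idea could be sketched as below. This assumes both frames have the same number of rows and the same index; the `same_row` column name is only illustrative:

```python
import pandas as pd

# Key columns of the sample data from the question
df1 = pd.DataFrame({
    'A': [1, 1, 2, 2], 'B': [1, 1, 1, 2],
    'C': [20, 12, 13, 12], 'D': [20, 14, 43, 34],
})
df2 = pd.DataFrame({
    'A': [1, 1, 2, 2], 'B': [3, 1, 2, 2],
    'C': [12, 20, 21, 12], 'D': [14, 20, 31, 34],
})

# Combine the per-column comparisons with element-wise AND (&);
# each comparison pairs row i of df1 with row i of df2
same_row = (
    (df1['A'] == df2['A'])
    & (df1['B'] == df2['B'])
    & (df1['C'] == df2['C'])
    & (df1['D'] == df2['D'])
)
df1['same_row'] = same_row.astype(int)
print(df1['same_row'].tolist())  # [0, 0, 0, 1]
```

Under this reading only the last row is flagged, which differs from the expected output in the question, so the question probably asks for a match against any row of df2 instead.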
