How to compare two columns to get duplicates in Python

Question

I have a df with two columns, and I need to find and store only the duplicates, that is, the values that appear in both columns.

|-------------------|-------------|
|      col1         |    col2     |
|-------------------|-------------|
|     apple         |  mango      |
|-------------------|-------------|
|     banana        |  grape      |
|-------------------|-------------|
|     pear          |  watermelon |
|-------------------|-------------|
|     cherry        |  banana     |
|-------------------|-------------|
|     mango         |  apple      |
|-------------------|-------------|

The result should be a df containing only col1, like this:

    |----------------|
    |   col1         |   
    |----------------|
    |   apple        |        
    |----------------|
    |   banana       |   
    |----------------|
    |   mango        |        
    |----------------|

I tried something like this, but it doesn't give me the expected results.

df['a_flag'] = df['col2'].isin(df['col1']).astype(int)

df1=df[(df['a_flag']==1)]

Does this answer your question? Find intersection of two columns in Python Pandas -> list of strings
– Tomerikoo, Commented Oct 13, 2020 at 14:54

Quang Hoang · Accepted Answer · 2020-10-13 15:01:07Z

2

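You can use loc to filter the rows and pass the column name in one step. Test col1 against col2, so that the mask marks the col1 values that also occur in col2:

df.loc[df['col1'].isin(df['col2']), ['col1']]

Output:

     col1
0   apple
1  banana
4   mango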
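For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It rebuilds the sample frame from the question and runs the isin test in both directions, because the direction of the test decides which col1 values come back. The variable names `in_both` and `reversed_mask` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": ["apple", "banana", "pear", "cherry", "mango"],
    "col2": ["mango", "grape", "watermelon", "banana", "apple"],
})

# Mask on col1: True where the col1 value also occurs somewhere in col2.
in_both = df.loc[df["col1"].isin(df["col2"]), ["col1"]]

# Reversed mask: True where the col2 value occurs in col1. Selecting col1
# from those rows returns the value sitting next to the match, not the match.
reversed_mask = df.loc[df["col2"].isin(df["col1"]), ["col1"]]

print(in_both["col1"].tolist())        # ['apple', 'banana', 'mango']
print(reversed_mask["col1"].tolist())  # ['apple', 'cherry', 'mango']
```

Only the first form matches the result requested in the question. The second one keeps row 3 because its col2 value (banana) is in col1, and then reports that row's col1 value (cherry).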


1 Comment

Umar.H Over a year ago

This is nice if there is only one column to compare. I guess if there are many, you could melt/stack and then use isin.
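A rough sketch of what that comment suggests, under the assumption that col1 should be checked against every other column. The `col3` column and its values are hypothetical, added only to have more than one column to compare against. This version uses melt to flatten the other columns into a single Series before calling isin.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["apple", "banana", "pear", "cherry", "mango"],
    "col2": ["mango", "grape", "watermelon", "banana", "apple"],
    "col3": ["kiwi", "apple", "pear", "fig", "plum"],  # hypothetical extra column
})

# Flatten every column except col1 into one long Series of candidate values.
# melt() with no arguments produces the columns "variable" and "value".
others = df.drop(columns="col1").melt()["value"]

# Keep the col1 values that occur anywhere in the other columns.
result = df.loc[df["col1"].isin(others), ["col1"]]

print(result["col1"].tolist())  # ['apple', 'banana', 'pear', 'mango']
```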

lea · Accepted Answer · 2020-10-15 20:03:20Z

0

Thank you for the response. I also tried using sets, and that seemed to work too. I'm not sure which approach is more efficient, though.

Here is the code that worked:

lst1=list(df['col1'])
lst2=list(df['col2'])
lst3=list(set(lst1) & set(lst2))
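As a runnable version of the set approach, the sketch below uses the sample data from the question and wraps the intersection back into a DataFrame, since the question asked for a df. The sorting step is an addition here: sets are unordered, so without it the order of the result is not guaranteed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": ["apple", "banana", "pear", "cherry", "mango"],
    "col2": ["mango", "grape", "watermelon", "banana", "apple"],
})

# Values present in both columns; a set also drops repeated values.
common = set(df["col1"]) & set(df["col2"])

# Sort for a stable order, then store the result as a one-column DataFrame.
result = pd.DataFrame({"col1": sorted(common)})

print(result["col1"].tolist())  # ['apple', 'banana', 'mango']
```

Compared with the isin approach, this one does not keep the original row index or row order, and each common value appears only once even if it is repeated in col1.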
