I have a DataFrame (df) with two columns, and I need to find and store only the values that appear in both columns.

|-------------------|-------------|
|      col1         |    col2     |
|-------------------|-------------|
|     apple         |  mango      |
|-------------------|-------------|
|     banana        |  grape      |
|-------------------|-------------|
|     pear          |  watermelon |
|-------------------|-------------|
|     cherry        |  banana     |
|-------------------|-------------|
|     mango         |  apple      |
|-------------------|-------------|

The result should be a df with a single column, col1, like this:

    |----------------|
    |   col1         |   
    |----------------|
    |   apple        |        
    |----------------|
    |   banana       |   
    |----------------|
    |   mango        |        
    |----------------|

I tried something like this, but it doesn't give me the expected result.

df['a_flag'] = df['col2'].isin(df['col1']).astype(int)

df1=df[(df['a_flag']==1)]

2 Answers


You can use loc to also pass the column name:

df.loc[df['col2'].isin(df['col1']), ['col1']]

Output:

     col1
0   apple
3  cherry
4   mango
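
Note that the expression above keeps the col1 value of every row whose col2 value occurs in col1, which is why cherry appears and banana does not. If the goal is the col1 values that also occur somewhere in col2 (apple, banana, mango in the question), one possible variant is to swap the two columns in the isin test. This is a minimal sketch that rebuilds the question's sample data; the name result is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'col1': ['apple', 'banana', 'pear', 'cherry', 'mango'],
    'col2': ['mango', 'grape', 'watermelon', 'banana', 'apple'],
})

# Keep the rows whose col1 value appears anywhere in col2
result = df.loc[df['col1'].isin(df['col2']), ['col1']]
print(result)
```

The original row index is preserved, so a reset_index(drop=True) can be chained on if a fresh 0..n index is wanted.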

1 Comment

This is nice if there is only one column to compare. I guess if there are many, you could melt/stack them and then use isin.
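
The melt idea from the comment could look roughly like the sketch below. It assumes a third column, col3, which is not in the question and is invented here purely for illustration; the names others and multi are also hypothetical.

```python
import pandas as pd

# Hypothetical data: col3 is NOT part of the question, it is added only
# to show the case with more than one column to compare against
df = pd.DataFrame({
    'col1': ['apple', 'banana', 'pear', 'cherry', 'mango'],
    'col2': ['mango', 'grape', 'watermelon', 'banana', 'apple'],
    'col3': ['kiwi', 'pear', 'lime', 'plum', 'fig'],
})

# Stack every column except col1 into one long 'value' column
others = df.drop(columns='col1').melt()['value']

# Keep the col1 values that appear in any of the other columns
multi = df.loc[df['col1'].isin(others), ['col1']]
print(multi)
```

With this made-up data, pear is also kept because it occurs in col3.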

Thank you for the response. However, I also tried using sets, and that seemed to work too. I'm not sure which is more efficient, though.

Here is the code that worked:

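# Collect each column's values, then intersect them as sets
# (the order of the resulting list is not guaranteed)
lst1 = list(df['col1'])
lst2 = list(df['col2'])
lst3 = list(set(lst1) & set(lst2))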
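
Since the question asks for a df rather than a list, the set result can be wrapped back into a one-column DataFrame. The sketch below assumes the question's sample data; the names common and df_common are illustrative, and sorting is used only to get a stable order because sets are unordered.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'col1': ['apple', 'banana', 'pear', 'cherry', 'mango'],
    'col2': ['mango', 'grape', 'watermelon', 'banana', 'apple'],
})

# Set intersection: values present in both columns
common = set(df['col1']) & set(df['col2'])

# sorted() fixes the order before building the one-column DataFrame
df_common = pd.DataFrame({'col1': sorted(common)})
print(df_common)
```

One difference from the isin approach is that the set version drops repeated values and the original row index, while isin keeps every matching row as it appears in df.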
