1

Suppose I have this dataframe (call it df): enter image description here

Here's what I want to do with the dataframe: 1. Select the rows that match with Col1 and Col2, if there are two rows for each id. 2. If there's only one row for the id, then select the row, even if the Col1 and Col2 do not match.

df = df[df['Col1'] == df['Col2']]

This code is wrong, because it doesn't satisfy the requirement 2 above. This is the result I want:enter image description here

I would really appreciate it if someone could explain to me how to accomplish this! Thank you.

10
  • please include the code that generates your dataframe Commented Jul 14, 2017 at 14:10
  • Columns 1 and 2 either match or don't. By requirement 1 and 2, won't we simply return the entire dataframe anyways? Commented Jul 14, 2017 at 14:16
  • dustin, the requirement 1: for the ids with two rows, want to select a row whose col1 and col2 match. requirement 2: for the ideas with only one row, simply select that row. Commented Jul 14, 2017 at 14:18
  • @JunJang to me it isn't clear what you are trying to do. Commented Jul 14, 2017 at 14:21
  • d = {'id':['1', '1', '2', '2', '3', '3', '4', '5', '5', '6', '6', '7'], 'Col1': ['Pizza', 'Pizza', 'Pizza', 'Pizza', 'Ramen', 'Ramen', 'Ramen', 'Pizza', 'Pizza', 'Ramen', 'Ramen', 'Pizza'], 'Col2': ['Pizza', 'Ramen', 'Ramen', 'Pizza', 'Ramen', 'Pizza', 'Pizza', 'Ramen', 'Pizza', 'Pizza', 'Ramen', 'Ramen'], 'Col3': [100, 30, 150, 300, 230, 20, 13, 230, 13, 35, 30, 45]} df = pd.DataFrame(d) Commented Jul 14, 2017 at 14:22

1 Answer 1

2

Assuming there are only unique and duplicated values with length 2 in id column.

Then use duplicated for select all duplicates with ~ for inverse mask - select all unique rows:

m1 = df['Col1'] == df['Col2']
m2 = df['id'].duplicated(keep=False)
df = df[(m1 & m2) | ~m2]
print (df)
     Col1   Col2  Col3 id
0   Pizza  Pizza   100  1
3   Pizza  Pizza   300  2
4   Ramen  Ramen   230  3
6   Ramen  Pizza    13  4
8   Pizza  Pizza    13  5
10  Ramen  Ramen    30  6
11  Pizza  Ramen    45  7
Sign up to request clarification or add additional context in comments.

1 Comment

Glad can help! One question - I use duplicated(keep=False) for selecting all duplicates in id. There is possible 3 or more dupes like id=1,1,1 ?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.