Python Pandas drop row duplicates on a column if no duplicate on other column

Question

I have this df with email headers. I need to eliminate all duplicates where Subject is the same AND Source is different. I have spent hours trying to figure out a solution or find a similar case...

Date	From	Subject	Source
12/06/21	Sender1	Test123	Inbox
12/06/21	Sender2	Confirm	Inbox
12/06/21	Sender1	Test123	Sent
12/06/21	Sender3	Test_on	Inbox
12/06/21	Sender3	Test_on	Inbox

In the table above, the rows with Subject = 'Test123' should be dropped, leaving:

Date	From	Subject	Source
12/06/21	Sender2	Confirm	Inbox
12/06/21	Sender3	Test_on	Inbox
12/06/21	Sender3	Test_on	Inbox

Something like df[df['Subject'].duplicated(keep=False) & ~df['Source'].duplicated(keep=False)]?
– mitoRibo, Commented Dec 6, 2021 at 22:10

Corralien · Accepted Answer · 2021-12-06 22:10:04Z

1

You can use a set to check, for each sender, whether its rows come from more than one source. If they do, drop all of that sender's rows.

>>> df.loc[df.groupby('From')['Source'].transform(lambda x: len(set(x)) == 1)]

       Date     From  Subject Source
1  12/06/21  Sender2  Confirm  Inbox
3  12/06/21  Sender3  Test_on  Inbox
4  12/06/21  Sender3  Test_on  Inbox
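The snippet above groups by From, which gives the expected result on the sample data because each sender there uses a single subject. As a minimal sketch, assuming the filter should follow the question's wording (same Subject, different Source), the same idea can be keyed on Subject instead. The DataFrame is rebuilt from the question's table, and the name sources_per_subject is only illustrative.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Date": ["12/06/21"] * 5,
    "From": ["Sender1", "Sender2", "Sender1", "Sender3", "Sender3"],
    "Subject": ["Test123", "Confirm", "Test123", "Test_on", "Test_on"],
    "Source": ["Inbox", "Inbox", "Sent", "Inbox", "Inbox"],
})

# Number of distinct Source values per Subject, broadcast back to every row.
sources_per_subject = df.groupby("Subject")["Source"].transform("nunique")

# Keep only rows whose Subject is associated with exactly one Source.
result = df[sources_per_subject == 1]
print(result)
```

Using transform("nunique") avoids the Python-level lambda and returns a Series aligned with df, so it can be used directly as a boolean mask after the comparison.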


1 Comment

Dunu Over a year ago

And it works... Thank you Corralien!

Z Li · Answer · 2021-12-06 22:11:23Z

0

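# True for every row whose Subject appears more than once
duplicated_subject = df.duplicated('Subject', keep=False)
# True for every row whose (Subject, Source) pair appears more than once
duplicated_subject_and_source = df.duplicated(['Subject', 'Source'], keep=False)
df[~duplicated_subject | duplicated_subject_and_source]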

Eliminating all duplicates where "Subject is the same AND Source is different"

is equivalent to

keeping rows where "Subject is not duplicated OR Subject is duplicated and Source is the same".
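To make the two masks concrete, here is a small self-contained sketch that wraps the same logic in a helper and runs it on the question's data. The helper name keep_consistent and the second frame, mixed, are hypothetical and only serve the illustration.

```python
import pandas as pd


def keep_consistent(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose Subject is unique or whose (Subject, Source) pair repeats."""
    dup_subject = frame.duplicated("Subject", keep=False)
    dup_pair = frame.duplicated(["Subject", "Source"], keep=False)
    return frame[~dup_subject | dup_pair]


# Sample data copied from the question's table.
df = pd.DataFrame({
    "Date": ["12/06/21"] * 5,
    "From": ["Sender1", "Sender2", "Sender1", "Sender3", "Sender3"],
    "Subject": ["Test123", "Confirm", "Test123", "Test_on", "Test_on"],
    "Source": ["Inbox", "Inbox", "Sent", "Inbox", "Inbox"],
})
result = keep_consistent(df)
print(result)

# Hypothetical edge case: one Subject seen twice in Inbox and once in Sent.
mixed = pd.DataFrame({
    "Subject": ["A", "A", "A"],
    "Source": ["Inbox", "Inbox", "Sent"],
})
mixed_result = keep_consistent(mixed)
print(mixed_result)
```

On the question's data this keeps rows 1, 3 and 4. In the hypothetical mixed frame, the two Inbox rows survive because their (Subject, Source) pair repeats, and only the Sent row is dropped. A filter that removes every row of a Subject with more than one distinct Source would drop all three, so which behavior is wanted depends on how "Source is different" should be read.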

