I have a DataFrame `df` of email headers. I need to drop every row whose Subject also appears in another row with a different Source. I have spent hours trying to work out a solution or find a similar case...
| Date | From | Subject | Source |
|---|---|---|---|
| 12/06/21 | Sender1 | Test123 | Inbox |
| 12/06/21 | Sender2 | Confirm | Inbox |
| 12/06/21 | Sender1 | Test123 | Sent |
| 12/06/21 | Sender3 | Test_on | Inbox |
| 12/06/21 | Sender3 | Test_on | Inbox |
In practice, the rows with Subject = 'Test123' should be dropped from the table above, leaving:
| Date | From | Subject | Source |
|---|---|---|---|
| 12/06/21 | Sender2 | Confirm | Inbox |
| 12/06/21 | Sender3 | Test_on | Inbox |
| 12/06/21 | Sender3 | Test_on | Inbox |
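Maybe `df[df.groupby('Subject')['Source'].transform('nunique') == 1]`? (Combining `df['Subject'].duplicated(keep=False)` with `~df['Source'].duplicated(keep=False)` checks each column over the whole frame, not Source within each Subject, so on this sample it would return only the 'Sent' row.)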
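One way to express "same Subject, different Source" is to count the distinct Source values within each Subject and keep only the rows where that count is 1. Below is a minimal sketch on the sample data from the question; the names `n_sources` and `result` are just illustrative, and it assumes Subject and Source contain no missing values (by default `nunique` does not count NaN).

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    "Date": ["12/06/21"] * 5,
    "From": ["Sender1", "Sender2", "Sender1", "Sender3", "Sender3"],
    "Subject": ["Test123", "Confirm", "Test123", "Test_on", "Test_on"],
    "Source": ["Inbox", "Inbox", "Sent", "Inbox", "Inbox"],
})

# For every row, the number of distinct Source values seen for its Subject
n_sources = df.groupby("Subject")["Source"].transform("nunique")

# Keep only rows whose Subject is tied to a single Source
result = df[n_sources == 1]
print(result)
```

On this sample, both 'Test123' rows are dropped because that Subject appears with 'Inbox' and 'Sent', while the two 'Test_on' rows stay because they share the same Source.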