remove duplicate elements in dataframes python

Question

I'm trying to remove duplicate elements across columns 'p1' and 'p2', i.e. once an element has occurred in column 'p1', it should not reappear in 'p2' or any subsequent column. E.g., for the code below, only 'a b' and 'c d' will remain.

What's the efficient way of doing this?

import pandas as pd
df = pd.DataFrame({'p1':['a','b','a','a','b','d','c'],
                'p2':['b','a','c','d','c','a','d'],
                'value':[1,1,2,3,5,3,5]})
df

jezrael · Accepted Answer · 2016-04-10 07:01:38Z

1

You can first call set_index on the value column, then stack to create a Series, then drop_duplicates, unstack, and finally reset_index:

print(df.set_index('value').stack().drop_duplicates().unstack().reset_index())
   value    p1 p2
0      1     a  b
1      2  None  c
2      3  None  d
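
To see what each step of the chain contributes, the one-liner can be split up as below. This is only a sketch using the question's df; the intermediate variable names (indexed, stacked, first_seen, result) are made up for illustration and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'p1': ['a', 'b', 'a', 'a', 'b', 'd', 'c'],
                   'p2': ['b', 'a', 'c', 'd', 'c', 'a', 'd'],
                   'value': [1, 1, 2, 3, 5, 3, 5]})

# Move 'value' into the index so only p1 and p2 remain as columns.
indexed = df.set_index('value')

# Stack p1/p2 into one long Series, row by row: a, b, b, a, a, c, ...
stacked = indexed.stack()

# Keep only the first occurrence of each element.
first_seen = stacked.drop_duplicates()

# Spread the survivors back into p1/p2 columns; gaps become missing values.
result = first_seen.unstack().reset_index()
print(result)
```

Note that this deduplicates individual elements, not rows, so a surviving row can have a missing p1 or p2.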



7 Comments

ShinjiOno Over a year ago

Hi Jezrael, maybe let me rephrase my question: p1 and p2 have to stay together as a pair. However, whatever has already appeared in any of my p1 or p2, I do not want to reappear in any subsequent p1 or p2.

jezrael Over a year ago

IIUC, you need print(df.drop_duplicates(subset=['p1','p2']))?

jezrael Over a year ago

Maybe I understand now: do you need to remove all rows with None values from the output df?

ShinjiOno Over a year ago

I would like to drop the row if p1 is duplicated in p2. What's the efficient way of doing it?

jezrael Over a year ago

Is the desired output print(pd.DataFrame({'p2': {1: 'a', 4: 'c'}, 'p1': {1: 'b', 4: 'b'}, 'value': {1: 1, 4: 5}}))?
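
For reference, here is one possible reading of what the question asks for, written as a plain loop. It is a sketch, not something proposed in the thread: it assumes a row is dropped when either of its elements already appeared in an earlier row that was kept, which is the interpretation that reproduces the 'a b' and 'c d' result stated in the question. The helper name drop_rows_with_seen_elements is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'p1': ['a', 'b', 'a', 'a', 'b', 'd', 'c'],
                   'p2': ['b', 'a', 'c', 'd', 'c', 'a', 'd'],
                   'value': [1, 1, 2, 3, 5, 3, 5]})


def drop_rows_with_seen_elements(frame, cols):
    """Keep a row only if none of its values in `cols` occurred in an earlier kept row."""
    seen = set()   # elements of the rows kept so far
    keep = []      # one boolean per row, used as a mask
    for values in frame[cols].itertuples(index=False):
        if seen.isdisjoint(values):
            seen.update(values)
            keep.append(True)
        else:
            keep.append(False)
    return frame[keep]


result = drop_rows_with_seen_elements(df, ['p1', 'p2'])
print(result)
```

On the sample df this keeps the rows with index 0 ('a', 'b') and 6 ('c', 'd'). The loop is linear in the number of rows; whether a faster vectorized form exists depends on the exact rule intended.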


hd1 · Accepted Answer · 2016-04-10 07:20:47Z

0

pd.Series(pd.DataFrame({'p1':['a','b','a','a','b','d','c'],'p2':['b','a','c','d','c','a','d'],'value':[1,1,2,3,5,3,5]}).values.ravel()).unique()

I'll post output as soon as I get pandas installed in my virtualenv.



1 Comment

jezrael Over a year ago

But the OP wants to remove duplicates only from p1 and p2. Another problem is the lost index, if df is created by the constructor.
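
A minimal sketch of the first point in this comment, restricting the flattening to p1 and p2 so the value column is left out. It uses pd.unique, which returns values in order of first appearance; the variable name unique_elements is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'p1': ['a', 'b', 'a', 'a', 'b', 'd', 'c'],
                   'p2': ['b', 'a', 'c', 'd', 'c', 'a', 'd'],
                   'value': [1, 1, 2, 3, 5, 3, 5]})

# Flatten only the p1/p2 columns row by row, then take the distinct elements.
unique_elements = pd.unique(df[['p1', 'p2']].values.ravel())
print(unique_elements)
```

The result is a flat array of elements, so the p1/p2 pairing and the original row index are still lost, as the comment points out.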
