How to find duplicates in one column in pandas in python

Question

I have a dataframe in which I want to keep the first occurrence of each duplicated value and remove the remaining duplicates.

For example, in the dataframe below, the title column contains duplicates such as nn nn and mm mm. I want to remove them, keeping only the first occurrence of each.

id title
12 nn nn
11 nn nn
10 nn nn
18 mm mm
19 nn nn
06 mm mm
08 ll ll
09 jj jj
26 ll ll

My output should look as follows:

id title
12 nn nn
18 mm mm
08 ll ll
09 jj jj

I tried the following pandas code:

L = input_data[["id", "title"]]
L_new = L[~L.duplicated()]

However, it does not remove duplicates as I wanted.

I am happy to provide more details if needed.

Alex Fish · Accepted Answer · 2019-07-23 03:24:02Z

1

Try input_data.groupby('title').first().
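
Note that groupby('title').first() moves title into the index, sorts the groups by title (so the original row order is lost), and takes the first non-null value in each column. A minimal sketch of a variant that keeps title as a column and preserves order of first appearance, assuming the question's data is rebuilt by hand as input_data with integer ids (the name first_per_title is only illustrative):

```python
import pandas as pd

# Assumed reconstruction of the question's dataframe (ids as integers).
input_data = pd.DataFrame({
    "id": [12, 11, 10, 18, 19, 6, 8, 9, 26],
    "title": ["nn nn", "nn nn", "nn nn", "mm mm", "nn nn",
              "mm mm", "ll ll", "jj jj", "ll ll"],
})

# sort=False keeps groups in order of first appearance;
# as_index=False keeps title as a regular column instead of the index.
first_per_title = input_data.groupby("title", sort=False, as_index=False).first()

# Restore the original column order for display.
first_per_title = first_per_title[["id", "title"]]
print(first_per_title)
```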

BENY · Accepted Answer · 2019-07-23 03:34:35Z

1

We can use head:

df.groupby('title').head(1)
   id  title
0  12  nn nn
3  18  mm mm
6   8  ll ll
7   9  jj jj
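
The same rows can also be selected without groupby. The attempt in the question likely fails because duplicated() with no arguments compares whole rows, and every id is unique, so no row counts as a duplicate. A sketch restricted to the title column, assuming df holds the question's data with integer ids (deduped, mask, and via_mask are illustrative names):

```python
import pandas as pd

# Assumed reconstruction of the question's dataframe (ids as integers).
df = pd.DataFrame({
    "id": [12, 11, 10, 18, 19, 6, 8, 9, 26],
    "title": ["nn nn", "nn nn", "nn nn", "mm mm", "nn nn",
              "mm mm", "ll ll", "jj jj", "ll ll"],
})

# Keep the first row for each title; later repeats are dropped.
deduped = df.drop_duplicates(subset="title", keep="first")

# Equivalent boolean-mask form, closest to the code in the question:
# duplicated(subset="title") marks every repeat of a title after its first row.
mask = ~df.duplicated(subset="title")
via_mask = df[mask]

print(deduped)
```

Both forms keep the original index labels and row order, which matches the output of groupby('title').head(1) shown above.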
