I have a dataframe in which I want to keep the first occurrence of each duplicate and remove the rest.

For example, consider the dataframe below. The title column contains duplicates such as nn nn and mm mm. I want to remove them, keeping only the first occurrence of each.

id title
12 nn nn
11 nn nn
10 nn nn
18 mm mm
19 nn nn
06 mm mm
08 ll ll
09 jj jj
26 ll ll 

My output should look as follows:

id title
12 nn nn
18 mm mm
08 ll ll
09 jj jj

I tried the following pandas code:

L = input_data[["id", "title"]]
L_new = L[~L.duplicated()]

However, it does not remove duplicates as I wanted.

I am happy to provide more details if needed.

2 Answers

Try input_data.groupby('title').first().
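A minimal sketch of this suggestion, assuming input_data holds the sample from the question (the ids are stored as strings here only to preserve the leading zeros, which is an assumption). By default groupby sorts the groups by key and moves title into the index, so the sketch passes sort=False and as_index=False to keep the order of first appearance and keep title as a regular column.

```python
import pandas as pd

# Sample data from the question; ids as strings is an assumption
input_data = pd.DataFrame({
    "id": ["12", "11", "10", "18", "19", "06", "08", "09", "26"],
    "title": ["nn nn", "nn nn", "nn nn", "mm mm", "nn nn",
              "mm mm", "ll ll", "jj jj", "ll ll"],
})

# sort=False keeps groups in order of first appearance;
# as_index=False keeps title as a column instead of the index
result = input_data.groupby("title", sort=False, as_index=False).first()

# The group key comes first in the result, so restore the original column order
result = result[["id", "title"]]
print(result)
```

Note that first() takes the first non-null value in each column, so with missing data it can differ from simply picking the first row of each group.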

We can use head:

df.groupby('title').head(1)
   id  title
0  12  nn nn
3  18  mm mm
6   8  ll ll
7   9  jj jj
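For comparison, the attempt in the question appears to find no duplicates because duplicated() with no arguments compares entire rows, and every (id, title) pair in the sample is unique. A hedged sketch that restricts the comparison to the title column, with df rebuilt from the sample data (integer ids are assumed, matching the output above):

```python
import pandas as pd

# Sample data from the question; integer ids are an assumption
df = pd.DataFrame({
    "id": [12, 11, 10, 18, 19, 6, 8, 9, 26],
    "title": ["nn nn", "nn nn", "nn nn", "mm mm", "nn nn",
              "mm mm", "ll ll", "jj jj", "ll ll"],
})

# Whole-row comparison flags nothing, because each (id, title) pair is unique
print(df.duplicated().any())

# Compare on title only and keep the first occurrence of each value
deduped = df.drop_duplicates(subset="title", keep="first")

# Equivalent boolean-mask form, closer to the code in the question
same = df[~df.duplicated(subset="title")]
print(deduped)
```

Both forms keep the original row order and index labels, like groupby('title').head(1).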
