I have a pandas data frame like below.

name    type    loc
abc     cew     hyd
abc     cew     mum
bcd     tes     kkr
ced     fge     abe
ced     fge     abe

Now I want to create two data frames: first drop all duplicate rows, then split the result into two data frames.

1st df (contains the rows whose name and type are the same as in another row)

name    type    loc
abc     cew     hyd
abc     cew     mum

2nd df (contains the rows whose name and type combination appears only once)

name    type    loc
bcd     tes     kkr
ced     fge     abe

I am able to drop the duplicates like below

df = df1.drop_duplicates(subset='name', keep='first')

But from here I have not been able to proceed further. Answers with an explanation would be helpful.

1 Answer

First call drop_duplicates on all columns, then use duplicated to build a boolean mask and filter with boolean indexing. The ~ operator inverts the mask:

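# remove rows that are identical in every column
df = df.drop_duplicates()
# True for every row whose (name, type) pair occurs more than once
m = df.duplicated(['name','type'], keep=False)
print(m)
0     True
1     True
2    False
3    False
dtype: bool

# rows sharing name and type with another row
df1 = df[m]
print(df1)
  name type  loc
0  abc  cew  hyd
1  abc  cew  mum

# rows whose name and type pair is unique
df2 = df[~m]
print(df2)
  name type  loc
2  bcd  tes  kkr
3  ced  fge  abe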
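
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt from the sample data in the question, and the variable names (deduped, mask, df_same, df_diff) are only illustrative, not taken from the original code.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["abc", "abc", "bcd", "ced", "ced"],
    "type": ["cew", "cew", "tes", "fge", "fge"],
    "loc":  ["hyd", "mum", "kkr", "abe", "abe"],
})

# Step 1: drop rows that are identical across all columns
deduped = df.drop_duplicates()

# Step 2: keep=False marks every member of a repeated (name, type) group
mask = deduped.duplicated(subset=["name", "type"], keep=False)

# Step 3: split with the mask and its inverse
df_same = deduped[mask]    # (name, type) shared with another row
df_diff = deduped[~mask]   # (name, type) occurs only once

print(df_same)
print(df_diff)
```

The mask is built from deduped and applied to deduped, so its index matches the frame it filters.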

3 Comments

In your answer, if I drop the duplicate row in df1, only one of records 3 and 4 will be left. That record should then be moved to df2.
I am getting the below error: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)
You are right, sorry. Please check the last edit. You need to first drop duplicates into a new DataFrame, and then create the mask and filter.
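
One likely way to hit the error mentioned in the comments is to build the mask on the deduplicated frame and then apply it to the original one, whose index still contains the dropped row. The sketch below reproduces that under this assumption. The names original, deduped and mask are illustrative, and the exception is caught broadly because its import path has differed between pandas versions.

```python
import pandas as pd

original = pd.DataFrame({
    "name": ["abc", "abc", "bcd", "ced", "ced"],
    "type": ["cew", "cew", "tes", "fge", "fge"],
    "loc":  ["hyd", "mum", "kkr", "abe", "abe"],
})

deduped = original.drop_duplicates()  # row label 4 is gone
mask = deduped.duplicated(["name", "type"], keep=False)

# The mask has no entry for label 4, so it cannot be aligned
# with the original frame.
try:
    original.loc[mask]
    failed = False
except Exception as exc:
    failed = True
    print(type(exc).__name__, exc)

# Applying the mask to the frame it was computed from works.
result = deduped.loc[mask]
print(result)
```

Keeping the mask and the filtered frame on the same index avoids the alignment problem.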
