pandas: removing duplicate values in rows with same index in two columns

Question

I have a dataframe as follows:

import numpy as np
import pandas as pd
df = pd.DataFrame({'text':['she is good', 'she is bad'], 'label':['she is good', 'she is good']})

I would like to compare the two columns row by row and, wherever 'text' and 'label' hold the same value in a row, replace the value in the 'label' column with the word 'same'.

Desired output:

          text        label
0  she is good         same
1   she is bad  she is good

So far I have tried the following:

df['label'] = np.where(df.query("text == label"), df['label']== ' ',df['label']==df['label'] )

but it raises an error:

ValueError: Length of values (1) does not match length of index (2)

sophocles · Accepted Answer · 2021-12-06 16:49:09Z

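Your syntax is not correct; have a look at the documentation of numpy.where. It expects a boolean condition with one entry per row, followed by the value to use where the condition is True and the value to use where it is False. df.query("text == label") returns the filtered rows (a DataFrame with one row here), not a boolean mask, which is why the lengths do not match. Instead, check for equality between your two columns and replace the matching values in your label column: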

import numpy as np
df['label'] = np.where(df['text'].eq(df['label']),'same',df['label'])

Printing df afterwards gives:

          text        label
0  she is good         same
1   she is bad  she is good
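
For comparison, the same replacement can be written without numpy. The sketch below is not part of the original answer: it rebuilds the question's dataframe and uses the pandas methods Series.mask and .loc, with the variable names is_dup, via_mask and via_loc chosen only for illustration.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'text': ['she is good', 'she is bad'],
    'label': ['she is good', 'she is good'],
})

# Boolean Series with one entry per row: True where both columns match.
is_dup = df['text'].eq(df['label'])

# Option 1: Series.mask replaces values where the condition is True
# and returns a new Series, leaving df untouched.
via_mask = df['label'].mask(is_dup, 'same')

# Option 2: assign through .loc; done on a copy here so df stays unchanged.
via_loc = df.copy()
via_loc.loc[is_dup, 'label'] = 'same'

print(via_mask.tolist())
print(via_loc['label'].tolist())
```

Both options rely on the same row-aligned boolean mask that np.where uses above; which one to pick is mostly a matter of whether you want a new Series or an in-place update.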

