I have a dataframe as follows:

import numpy as np
import pandas as pd
df = pd.DataFrame({'text':['she is good', 'she is bad'], 'label':['she is good', 'she is good']})

I would like to compare the two columns row-wise: where 'text' and 'label' hold the same value in a row, replace the value in the 'label' column with the word 'same'.

Desired output:

          text        label
0  she is good         same
1   she is bad  she is good

So far I have tried the following:

df['label'] = np.where(df.query("text == label"), df['label'] == ' ', df['label'] == df['label'])

but it raises an error:

ValueError: Length of values (1) does not match length of index (2)

1 Answer

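Your syntax is not correct; have a look at the documentation of numpy.where. It takes a boolean condition, the value to use where the condition is true, and the value to use where it is false. Check for equality between your two columns, and replace the values in your label column: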

import numpy as np

# Where 'text' equals 'label', write 'same'; otherwise keep the existing label.
df['label'] = np.where(df['text'].eq(df['label']), 'same', df['label'])

prints:

          text        label
0  she is good         same
1   she is bad  she is good
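
For reference, here is a self-contained sketch of the same idea that can be run as-is. It rebuilds the question's dataframe, computes the row-wise comparison once, and then shows two equivalent ways to apply it: np.where as above, and Series.mask as a pandas-only alternative. The variable names same_row, via_where and via_mask are only illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'text': ['she is good', 'she is bad'],
    'label': ['she is good', 'she is good'],
})

# Boolean Series: True where both columns hold the same value in a row.
same_row = df['text'].eq(df['label'])

# np.where(condition, value_if_true, value_if_false) returns an array
# with one entry per row.
via_where = np.where(same_row, 'same', df['label'])

# pandas-only alternative: mask replaces values where the condition is True.
via_mask = df['label'].mask(same_row, 'same')

df['label'] = via_mask
print(df)
```

The attempt in the question most likely failed because df.query returns a filtered dataframe (one row here) rather than a boolean condition with one entry per row, so the result could not be aligned with the two-row index.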