I am working with the [UCI adult dataset][1]. I have added a header row to make it easier to work with. I need to recode the last column, named 'etiquette', which takes two values, '<=50K' and '>50K'. I have tried the following:

    num_datos.loc[num_datos.loc[:, "etiquette"] == "<=50K", "etiquette"] = 1
    num_datos.loc[num_datos.loc[:, "etiquette"] == ">50K", "etiquette"] = 0

and the following

    num_datos['etiquette'].replace(['<=50K'], 1)
    num_datos['etiquette'].replace(['>50K'], 0)

However, this seems to do nothing, since if I then execute

    print(num_datos.etiquette[0])

I still get a value of <=50K. Is there a way for me to replace the values of the column in question?

1 Answer

Your second attempt, using replace(), should work once you remove the square brackets around the string. Use this instead:

    num_datos['etiquette'].replace('<=50K', 1)
    num_datos['etiquette'].replace('>50K', 0)

The function is currently being given ['<=50K'] as a one-element list rather than as a plain string. Since you are looking for a single string value, pass the string itself.
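
For illustration, here is a minimal sketch of the call on a toy Series (the three values are made up, not taken from the dataset, and the variable names are hypothetical). It passes a dict so both labels are handled in one call, and it stores the result, since replace() returns a new Series rather than changing the original.

```python
import pandas as pd

# Toy stand-in for the real column; the values are made up for illustration.
etiquette = pd.Series(["<=50K", ">50K", "<=50K"], dtype=object)

# replace() returns a new Series and leaves the original untouched.
# A dict maps both labels in a single call.
encoded = etiquette.replace({"<=50K": 1, ">50K": 0})

print(encoded.tolist())    # recoded values
print(etiquette.tolist())  # original strings, unchanged
```

If the recoded values are not stored somewhere, the column will look exactly as it did before the call.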

Hope this helps!

2 Comments

Hello, and thanks for the answer. I copied your suggested solution, but unfortunately I still have the problem.
Thanks for your feedback. A different question then: did you assign the result back to the column? As in: num_datos['etiquette'] = num_datos['etiquette'].replace('<=50K', 1)
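
A minimal sketch of that assign-back step, on a made-up three-row frame rather than the real data. It also assumes something the question does not confirm: that the labels may carry leading whitespace, which can happen when a file with ", " separators is read without skipinitialspace=True. That would also explain why the exact-match .loc attempt changed nothing. Series.map is used here as an alternative to replace(); it is not what the answer proposed.

```python
import pandas as pd

# Hypothetical miniature of num_datos; note the leading space in each label.
num_datos = pd.DataFrame(
    {"age": [39, 50, 38], "etiquette": [" <=50K", " >50K", " <=50K"]}
)

# Inspect a raw value: repr() makes stray whitespace visible.
print(repr(num_datos["etiquette"].iloc[0]))

# Strip whitespace, map the labels to integers, and assign back to the column.
labels = num_datos["etiquette"].str.strip()
num_datos["etiquette"] = labels.map({"<=50K": 1, ">50K": 0})

print(num_datos["etiquette"].tolist())
```

If the repr() output on the real data shows no extra whitespace, the strip step is harmless and the assignment alone is what makes the change stick.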
