I have a dataframe that looks similar to this.

age gender edu income
15    m     MS   <=50
16    f     BS   >50
17    m     BS   <=50

Since this is a binary problem, I'd like all the <=50K values to be 0 and all the >50K values to be 1. I tried the replace method, but it didn't do anything.

data["income"].replace(["<=50K"], "0", inplace = True)

data["income"].replace(to_replace = ["<=50K"], value = 0, inplace = True)

2 Answers

4

IIUC:

data['income'] = (data.income == '>50').astype(int)

Output:

   age gender edu  income
0   15      m  MS       0
1   16      f  BS       1
2   17      m  BS       0

1 Comment

@uharsha33 Check the contents of the income column. Maybe you have whitespace that needs to be accounted for. (data.income == '>50') is a boolean expression, which astype(int) then converts to integers.
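To follow up on the whitespace suggestion, here is a minimal sketch of how one might inspect and clean the column before comparing. The sample frame is made up to mirror the question, and the stray spaces in it are an assumption, not something confirmed by the asker.

```python
import pandas as pd

# Hypothetical sample mirroring the question, with stray whitespace added
data = pd.DataFrame({
    "age": [15, 16, 17],
    "gender": ["m", "f", "m"],
    "edu": ["MS", "BS", "BS"],
    "income": [" <=50", " >50", "<=50 "],
})

# Show the exact strings stored in the column, including any padding
print(data["income"].unique())

# Strip surrounding whitespace, compare, and cast the booleans to int
data["income"] = (data["income"].str.strip() == ">50").astype(int)
print(data["income"].tolist())  # [0, 1, 0]
```

Printing unique() first is the useful part: it shows whether the stored labels really match the string used in the comparison.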
2

Using map

df['income'] = df['income'].map({'<=50': 0, '>50': 1})
df
Out[328]: 
   age gender edu  income
0   15      m  MS       0
1   16      f  BS       1
2   17      m  BS       0

1 Comment

It's outputting NaNs.
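A likely cause of the NaNs is that map returns NaN for every value that is not a key in the dictionary. The sketch below assumes the real column holds labels such as "<=50K" (as in the question's text) and possibly padded values; both are guesses, so check the actual values first.

```python
import pandas as pd

# Hypothetical column values: "K" suffix and padding are assumptions
df = pd.DataFrame({"income": ["<=50K", ">50K", " <=50K"]})

# Keys that do not match the stored strings produce NaN
mapped = df["income"].map({"<=50": 0, ">50": 1})
print(df.loc[mapped.isna(), "income"].unique())  # the values that failed to map

# Normalise the strings, then map with keys that match them exactly
cleaned = df["income"].str.strip()
df["income"] = cleaned.map({"<=50K": 0, ">50K": 1})
print(df["income"].tolist())  # [0, 1, 0]
```

Listing the values that failed to map shows which keys the dictionary needs.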
