I have a data frame with true/false values stored in string format. Some values are null in the data frame.

I need to encode this data such that TRUE/FALSE/null values are encoded with the same integer in every column.

Input:

col1 col2 col3
True True False
True True True
null null True

I am using:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
df.apply(le.fit_transform)

Output:

2 1 0
2 1 1
1 0 1

But I want the output as:

2 2 0
2 2 2
1 1 2

How do I do this?

1 Answer

What works for me is to stack everything into a one-column DataFrame, so the encoder is fit once on all values, and then unstack the result:

df = df.stack(dropna=False).to_frame().apply(le.fit_transform)[0].unstack()
print (df)
   col1  col2  col3
0     1     1     0
1     1     1     1
2     2     2     1
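
The same idea can also be written without stack/unstack: fit a single LabelEncoder on the pooled values of all columns, then transform the flattened data and reshape it. This is only a minimal sketch. It assumes the nulls are the string 'null' (as in the sample input), and the sample frame and the name `encoded` are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample data from the question; nulls are assumed to be the string 'null'.
df = pd.DataFrame({
    "col1": ["True", "True", "null"],
    "col2": ["True", "True", "null"],
    "col3": ["False", "True", "True"],
})

# Fit ONE encoder on every value in the frame, so each distinct value
# gets the same integer no matter which column it appears in.
le = LabelEncoder()
flat = df.to_numpy().ravel()
le.fit(flat)

# Transform the flattened values and restore the original shape and labels.
encoded = pd.DataFrame(
    le.transform(flat).reshape(df.shape),
    index=df.index,
    columns=df.columns,
)
print(encoded)
```

LabelEncoder numbers the classes in sorted order, so here 'False', 'True' and 'null' become 0, 1 and 2. Because nothing is hard-coded, the same approach should carry over to columns with many distinct values.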

Another idea is to use DataFrame.replace with the string 'True' instead of the boolean True, because the question states:

I have a data frame with true/false values stored in string format.

If the nulls are missing values (NaN):

import numpy as np
df = df.replace({'True':2, 'False':1, np.nan:0})

If the nulls are the string 'null':

df = df.replace({'True':2, 'False':1, 'null':0})

print (df)
   col1  col2  col3
0     2     2     1
1     2     2     2
2     0     0     2
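
When there are too many distinct values to type the dictionary by hand, the mapping could be generated from the data instead. The sketch below is one possible way to do that, not part of the original answer. It assumes every cell is a string (nulls stored as 'null'), since sorting would fail on a mix of strings and NaN. The names `values`, `mapping` and `encoded` are illustrative.

```python
import pandas as pd

# Sample data from the question; nulls are assumed to be the string 'null'.
df = pd.DataFrame({
    "col1": ["True", "True", "null"],
    "col2": ["True", "True", "null"],
    "col3": ["False", "True", "True"],
})

# Collect every distinct value across all columns and sort them,
# so the numbering is deterministic.
values = sorted(pd.unique(df.to_numpy().ravel()))

# Build one shared value -> integer dictionary.
mapping = {value: code for code, value in enumerate(values)}

# Apply the same dictionary to every column.
encoded = df.apply(lambda col: col.map(mapping))
print(mapping)
print(encoded)
```

The integers follow sorted order here ('False' is 0, 'True' is 1, 'null' is 2). If specific codes are required, the dictionary can be edited before it is applied.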

2 Comments

@jazrael df.replace works well for boolean values, since there are only 3 possible values. But suppose each column could hold 100 different values. How would I encode the data in that case, so that the same value gets the same integer in every column?
@AkshayBharadwaj - Then use the first solution.
