Encoding the same values in different columns with the same integer in Python

Question

I have a data frame with true/false values stored in string format. Some values are null in the data frame.

I need to encode this data so that True, False and null values are each encoded with the same integer in every column.

Input:

col1 col2 col3
True True False
True True True
null null True

I am using:

from sklearn import preprocessing

le = preprocessing.LabelEncoder()
df.apply(le.fit_transform)  # refits the encoder on each column separately

Output:

2 1 0
2 1 1
1 0 1

But I want the output as:

2 2 0
2 2 2
1 1 2

How do I do this?
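The codes differ because `df.apply(le.fit_transform)` refits the encoder on every column, so each column's codes depend only on the values that column happens to contain. A minimal pandas-only sketch of the same effect, using `pd.factorize(..., sort=True)` as a stand-in for `LabelEncoder` (both number the sorted distinct values 0, 1, 2, ...) and assuming the nulls are the literal string `'null'`:

```python
import pandas as pd

# Sample data from the question; assumes nulls are the string 'null'
df = pd.DataFrame({
    "col1": ["True", "True", "null"],
    "col2": ["True", "True", "null"],
    "col3": ["False", "True", "True"],
})

# Encode each column on its own, like df.apply(le.fit_transform):
# every column gets its own sorted list of classes
per_column = df.apply(
    lambda s: pd.Series(pd.factorize(s, sort=True)[0], index=s.index)
)
print(per_column)
```

Here `'True'` becomes 0 in `col1` but 1 in `col3`, because `col1` never contains `'False'`.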

jezrael · Accepted Answer · 2020-02-24 08:46:32Z


What works for me is to stack all columns into a one-column DataFrame, encode it with a single encoder, and unstack it back:

df = df.stack(dropna=False).to_frame().apply(le.fit_transform)[0].unstack()
print (df)
   col1  col2  col3
0     1     1     0
1     1     1     1
2     2     2     1
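The same stack-then-encode idea can be sketched without scikit-learn. This version swaps in `pd.factorize(..., sort=True)` for `LabelEncoder` and flattens with `to_numpy().ravel()` instead of `stack()`; it assumes the nulls are NaN and fills them with a placeholder string `'null'` (a hypothetical choice) so they get a code of their own:

```python
import numpy as np
import pandas as pd

# Sample data from the question, with real missing values
df = pd.DataFrame({
    "col1": ["True", "True", np.nan],
    "col2": ["True", "True", np.nan],
    "col3": ["False", "True", "True"],
})

# Flatten every cell into one 1-D array so a single mapping is learned
flat = df.fillna("null").to_numpy().ravel()
codes, classes = pd.factorize(flat, sort=True)

# Reshape the codes back to the original table layout
encoded = pd.DataFrame(codes.reshape(df.shape), index=df.index, columns=df.columns)
print(dict(enumerate(classes)))
print(encoded)
```

Because the encoder sees every cell at once, each value gets one code across all columns, matching the output above (`False` → 0, `True` → 1, null → 2).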

Another idea is to use DataFrame.replace with the string 'True' instead of the boolean True, because the question says:

I have a data frame with true/false values stored in string format.

If the nulls are missing values (NaN):

import numpy as np

df = df.replace({'True':2, 'False':1, np.nan:0})

If the nulls are the literal string 'null':

df = df.replace({'True':2, 'False':1, 'null':0})

print (df)
   col1  col2  col3
0     2     2     1
1     2     2     2
2     0     0     2
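Note that `replace` leaves any value that is not a key in the dict unchanged, so an unexpected spelling such as `'true'` would pass through silently. A minimal sketch of a stricter variant, assuming NaN nulls: fill them with a placeholder, map every column through one shared dict with `Series.map` (which turns unmapped values into NaN), and raise if anything was left without a code:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["True", "True", np.nan],
    "col2": ["True", "True", np.nan],
    "col3": ["False", "True", "True"],
})

# One shared mapping for every column; 'null' is a placeholder for NaN
mapping = {"True": 2, "False": 1, "null": 0}

# Series.map turns any value missing from the dict into NaN
encoded = df.fillna("null").apply(lambda s: s.map(mapping))

# Fail loudly if some value had no code, then use a plain integer dtype
if encoded.isna().any().any():
    raise ValueError("found values with no entry in mapping")
encoded = encoded.astype(int)
print(encoded)
```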


answered Feb 24, 2020 at 8:40


2 Comments

Akshay Bharadwaj Over a year ago

@jezrael df.replace works well for boolean values since there are only 3 possible values. Let's say each column had 100 different possible values. How do I encode in such situations, where I need the same integer encoding for the same value?

jezrael Over a year ago

@AkshayBharadwaj - Then use the first solution.
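With many possible values per column, the key is still to learn one set of classes from all columns before encoding any of them. A minimal sketch of that with a shared `pd.CategoricalDtype`, on a hypothetical frame of colour names (it assumes no missing values; fill them first, since `sorted` cannot compare NaN with strings):

```python
import pandas as pd

# Hypothetical frame where columns can hold many different values
df = pd.DataFrame({
    "col1": ["red", "green", "blue"],
    "col2": ["blue", "red", "violet"],
    "col3": ["green", "green", "red"],
})

# Learn one sorted list of categories from every column at once
categories = sorted(pd.unique(df.to_numpy().ravel()))
shared = pd.CategoricalDtype(categories=categories)

# Encode each column against the shared categories,
# so equal values get equal codes in every column
encoded = df.apply(lambda s: s.astype(shared).cat.codes)
print(categories)
print(encoded)
```

Any value not in `categories` gets the code -1, so reusing the same `categories` list for new data keeps the codes stable.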
