sklearn OneHotEncoder broken? ValueError: could not convert string to float

Question

I took this example from the sklearn OneHotEncoder documentation page:

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X = [['Male', 1], ['Female', 3], ['Female', 2]]
enc.fit(X)
enc.categories_

enc.transform([['Female', 1], ['Male', 4]]).toarray()

enc.inverse_transform([[0, 1, 1, 0, 0], [0, 0, 0, 1, 0]])

enc.get_feature_names()  # removed in scikit-learn 1.2; newer versions use enc.get_feature_names_out()

I get:

ValueError: could not convert string to float: 'Male'.

When I replace "Male" and "Female" with numeric strings, X = [['5', 1], ['4', 3], ['4', 2]], I get:

AttributeError: 'OneHotEncoder' object has no attribute 'categories_'

My sklearn version is 0.19.1. Can someone reproduce this?

You are using an older version in which OneHotEncoder could not directly turn strings into one-hot encoded features, and it seems you are following the documentation for the latest one. You will need to use LabelEncoder first, or else upgrade scikit-learn and then use OneHotEncoder directly.
– Vivek Kumar, Commented Nov 26, 2018 at 9:05
Try using LabelEncoder first and then apply one-hot encoding to its result.
– gripep, Commented Nov 26, 2018 at 9:05
@VivekKumar you are right. After upgrading to 0.20.1 it works. I didn't expect that they had changed the OneHotEncoder interface. But thank you very much.
– nick, Commented Nov 26, 2018 at 9:21
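For anyone who has to stay on 0.19.x, the LabelEncoder-first workaround suggested in the comments might look roughly like the sketch below. It assumes only the first column holds strings, and the variable names (gender_codes, X_int, X_onehot) are illustrative. It should also run on newer releases, although there OneHotEncoder can take the strings directly.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

X = [['Male', 1], ['Female', 3], ['Female', 2]]

# Step 1: map the string column to integer codes.
# LabelEncoder sorts the labels, so Female -> 0 and Male -> 1.
le = LabelEncoder()
gender_codes = le.fit_transform([row[0] for row in X])

# Step 2: rebuild an all-integer feature matrix (string codes + numeric column).
X_int = np.column_stack([gender_codes, [row[1] for row in X]])

# Step 3: one-hot encode the integer matrix; older OneHotEncoder versions accept this.
enc = OneHotEncoder(handle_unknown='ignore')
X_onehot = enc.fit_transform(X_int).toarray()
print(X_onehot)
```

The first two output columns correspond to Female/Male and the remaining three to the values 1, 2 and 3 of the second feature.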

nick · Accepted Answer · 2018-11-26 09:23:40Z


As Vivek Kumar stated, 0.19.1 is too old: OneHotEncoder only accepts string categories (and exposes categories_) starting with 0.20. Upgrading to version 0.20.1 solved the problem.
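After upgrading, the documentation example should run with the strings passed in directly. A minimal sketch, assuming scikit-learn 0.20 or later; the hasattr fallback is an addition (not from the docs example) so the same snippet works both before and after get_feature_names() was replaced by get_feature_names_out().

```python
import sklearn
from sklearn.preprocessing import OneHotEncoder

# String categories require scikit-learn 0.20 or later.
print(sklearn.__version__)

X = [['Male', 1], ['Female', 3], ['Female', 2]]
enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(X)
print(enc.categories_)  # one array of learned categories per input column

# Unknown values (here the 4) are encoded as all zeros because of handle_unknown='ignore'.
X_new = enc.transform([['Female', 1], ['Male', 4]]).toarray()
print(X_new)

# get_feature_names_out() exists from 1.0; get_feature_names() was removed in 1.2.
if hasattr(enc, 'get_feature_names_out'):
    names = enc.get_feature_names_out()
else:
    names = enc.get_feature_names()
print(names)
```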

answered Nov 26, 2018 at 9:23
