I took this example from the sklearn OneHotEncoder documentation page:

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X = [['Male', 1], ['Female', 3], ['Female', 2]]
enc.fit(X)
enc.categories_

enc.transform([['Female', 1], ['Male', 4]]).toarray()

enc.inverse_transform([[0, 1, 1, 0, 0], [0, 0, 0, 1, 0]])

enc.get_feature_names()

I get:

ValueError: could not convert string to float: 'Male'.

When I replace "Male" and "Female" with numbers: X = [['5', 1], ['4', 3], ['4', 2]]

I get:

AttributeError: 'OneHotEncoder' object has no attribute 'categories_'

My sklearn version is 0.19.1. Can someone reproduce this?

  • You are using an older version in which OneHotEncoder could not directly turn strings into one-hot encoded features, and it seems you are following the tutorial for the latest version. You will need to use LabelEncoder first, or else upgrade scikit-learn and then use OneHotEncoder. Commented Nov 26, 2018 at 9:05
  • Try to use LabelEncoder first and then apply one-hot encoding to its result. Commented Nov 26, 2018 at 9:05
  • @VivekKumar you are right. After upgrading to 0.20.1 it works. I didn't expect them to change the OneHotEncoder interface. Thank you very much. Commented Nov 26, 2018 at 9:21
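
For anyone who cannot upgrade, the LabelEncoder route suggested in the comments could look roughly like the sketch below. It is only an illustration under some assumptions: the variable names (gender_encoder, X_int, one_hot) are made up here, and the data is the X from the question. The idea is to turn the string column into integer codes first, so that OneHotEncoder only ever sees integers.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

X = [['Male', 1], ['Female', 3], ['Female', 2]]

# Step 1: map the string column to integer codes.
# LabelEncoder sorts the classes, so 'Female' -> 0 and 'Male' -> 1.
gender_encoder = LabelEncoder()
gender_codes = gender_encoder.fit_transform([row[0] for row in X])

# Step 2: rebuild an all-integer matrix from the codes and the numeric column.
X_int = np.column_stack([gender_codes, [row[1] for row in X]])

# Step 3: one-hot encode the integer matrix.
enc = OneHotEncoder()
one_hot = enc.fit_transform(X_int).toarray()

print(gender_encoder.classes_)
print(one_hot)
```

Keep gender_encoder around: it is needed to map new string values to the same integer codes before calling enc.transform, and to translate codes back to labels with inverse_transform.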

1 Answer

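As Vivek Kumar stated, 0.19.1 is too old: OneHotEncoder only accepts string categories and exposes the categories_ attribute starting with 0.20. Upgrading to version 0.20.1 solved the problem.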
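
A quick way to check whether an installation is affected is to look at the installed version before running the documentation example. The following is a minimal sketch, not an official recipe: the supports_strings flag and the regex-based version parsing are my own assumptions, and the data is the example from the question.

```python
import re

import sklearn
from sklearn.preprocessing import OneHotEncoder

# Extract only the major and minor components, e.g. "0.20.1" -> (0, 20).
match = re.match(r'(\d+)\.(\d+)', sklearn.__version__)
major, minor = int(match.group(1)), int(match.group(2))

# String categories and the categories_ attribute arrived in 0.20.
supports_strings = (major, minor) >= (0, 20)
print(sklearn.__version__, supports_strings)

if supports_strings:
    enc = OneHotEncoder(handle_unknown='ignore')
    enc.fit([['Male', 1], ['Female', 3], ['Female', 2]])
    print(enc.categories_)
    # The unseen value 4 in the second row is ignored (all zeros for that feature).
    encoded = enc.transform([['Female', 1], ['Male', 4]]).toarray()
    print(encoded)
else:
    print('Upgrade scikit-learn (for example with pip install -U scikit-learn) '
          'or encode the string column with LabelEncoder first.')
```

If the flag comes out False, the two errors from the question (the ValueError on strings and the missing categories_ attribute) are expected behaviour for that release rather than a bug in your code.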
