I have a fairly large dataframe with both numerical and categorical values. I'm trying to encode the categorical values but am getting the above error.

Here's a simple version of the code:

from collections import defaultdict
d = defaultdict(LabelEncoder)
# Encoding the variable
fit = df[catgoricalValues].apply(lambda x: d[x.name].fit_transform(df[catgoricalValues]))

I'm using the approach described here, except instead of applying it on the entire dataframe, I specified the columns to encode.

I get this error:

ValueError: bad input shape (490546, 11)

1 Answer

Update

It looks like you are trying to apply a LabelEncoder to multiple columns at once, but it only accepts one-dimensional input. You can apply a single LabelEncoder to each column in turn:

from sklearn.preprocessing import LabelEncoder

encoded = df[categoricalVal].apply(LabelEncoder().fit_transform)

It is better to use a new encoder for each column, because a single encoder is refit on every column and only remembers the classes of the last one. The link above should provide you with the solution.
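As a rough sketch of the one-encoder-per-column idea, the snippet below keeps each fitted LabelEncoder in a defaultdict keyed by column name, so the encoding can be reversed afterwards. The sample dataframe and the names `categorical_cols` and `encoders` are made up for illustration and are not from the question.

```python
from collections import defaultdict

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data; column names are assumptions for illustration.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "L", "M"],
    "price": [1.0, 2.5, 3.0, 4.2],
})
categorical_cols = ["color", "size"]

# One LabelEncoder per column, created on first access by column name.
encoders = defaultdict(LabelEncoder)

# apply() passes one column (a 1-D Series) at a time, which is the
# shape LabelEncoder expects. Note that we pass `col`, not the whole frame.
encoded = df[categorical_cols].apply(
    lambda col: encoders[col.name].fit_transform(col)
)

# Reverse the encoding later with the same fitted encoders.
decoded = encoded.apply(
    lambda col: encoders[col.name].inverse_transform(col)
)
```

If a column mixes strings with numbers or missing values, casting it with `col.astype(str)` before fitting may be needed; that is an assumption about the data, not something the question confirms.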

3 Comments

I received the same error - ValueError: bad input shape (490546, 11)
Thank you for the update. Is there a better way to do what I'm trying to do? I just have a list of categorical values and I want to encode them all (then reverse the encoding later). When I ran your code I got this error - TypeError: argument must be a string or number
Okay, can you provide a small example of your dataset?
