I want to one-hot encode several columns. I have tried several approaches, including a plain OneHotEncoder, ColumnTransformer, make_column_transformer, Pipeline, and get_dummies, but each attempt fails with a different error.

x = dataset.iloc[:, :11].values
y = dataset.iloc[:, 11].values


""" data encoding """

from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline


# oe = OrdinalEncoder()
# x = oe.fit_transform(x)

non_cat = ["Make", "Model", "Vehicle", "Transmission", "Fuel"]

onehot_cat = ColumnTransformer([
    ("categorical", OrdinalEncoder(), non_cat),
    ("onehot_categorical", OneHotEncoder(), non_cat)],
    remainder= "passthrough")
x = onehot_cat.fit_transform(x)

error:

[['ACURA' 'ILX' 'COMPACT' ... 6.7 8.5 33]
['ACURA' 'ILX' 'COMPACT' ... 7.7 9.6 29]
['ACURA' 'ILX HYBRID' 'COMPACT' ... 5.8 5.9 48]
...
['VOLVO' 'XC60 T6 AWD' 'SUV - SMALL' ... 8.6 10.3 27]
['VOLVO' 'XC90 T5 AWD' 'SUV - STANDARD' ... 8.3 9.9 29]
['VOLVO' 'XC90 T6 AWD' 'SUV - STANDARD' ... 8.7 10.7 26]]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\Anaconda3\lib\site-packages\sklearn\utils\__init__.py in _get_column_indices(X, key)
424         try:
--> 425             all_columns = X.columns
426         except AttributeError:

AttributeError: 'numpy.ndarray' object has no attribute 'columns'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-4-4008371c305f> in <module>
 24     ("onehot_categorical", OneHotEncoder(), non_cat)],
 25     remainder= "passthrough")
 ---> 26 x = onehot_cat.fit_transform(x)
 27 
 28 print('OneHotEncode = ', x.shape)

~\Anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in fit_transform(self, X, y)
527         self._validate_transformers()
528         self._validate_column_callables(X)
--> 529         self._validate_remainder(X)
530 
531         result = self._fit_transform(X, y, _fit_transform_one)

~\Anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in _validate_remainder(self, X)
325         cols = []
326         for columns in self._columns:
--> 327             cols.extend(_get_column_indices(X, columns))
328 
329         remaining_idx = sorted(set(range(self._n_features)) - set(cols))

~\Anaconda3\lib\site-packages\sklearn\utils\__init__.py in _get_column_indices(X, key)
425             all_columns = X.columns
426         except AttributeError:
--> 427             raise ValueError("Specifying the columns using strings is only "
428                              "supported for pandas DataFrames")
429         if isinstance(key, str):

ValueError: Specifying the columns using strings is only supported for pandas DataFrames

  • Using .values on a DataFrame object returns a NumPy array, not a DataFrame. Commented Dec 10, 2020 at 9:32
  • You are more likely to get an answer if you post reproducible code. As it stands, dataset (at least) is not defined. Commented Dec 10, 2020 at 11:02
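
To illustrate the first comment, here is a minimal sketch of what .values does to column names. The two-row frame and its "Engine" column are invented for the example and are not the asker's data.

```python
import pandas as pd

# Invented two-row frame standing in for the asker's dataset.
df = pd.DataFrame({"Make": ["ACURA", "VOLVO"], "Engine": [2.4, 2.0]})

as_array = df.iloc[:, :1].values  # NumPy array: column names are lost
as_frame = df.iloc[:, :1]         # still a DataFrame: column names are kept

print(type(as_array).__name__)       # ndarray
print(list(as_frame.columns))        # ['Make']
print(hasattr(as_array, "columns"))  # False
```

Selecting columns by string name, as ColumnTransformer does in the question, needs the .columns attribute that only the DataFrame has.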

1 Answer

I got a similar error when trying to make a prediction with a model. It expected a DataFrame, but I was passing a NumPy array instead. So I changed the call from:

prediction = monitor_model.predict(s_df.to_numpy())

to:

prediction = monitor_model.predict(s_df)
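
The same idea likely applies to the code in the question: keep x as a DataFrame instead of calling .values, so that ColumnTransformer can resolve the string column names. Below is a minimal sketch under that assumption. The small dataset, the "Engine" and "CO2" columns, and the fuel codes are invented for illustration. It also uses only OneHotEncoder, whereas the question applies both OrdinalEncoder and OneHotEncoder to the same columns.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the asker's dataset; columns and values are invented.
dataset = pd.DataFrame({
    "Make": ["ACURA", "ACURA", "VOLVO", "VOLVO"],
    "Model": ["ILX", "ILX HYBRID", "XC60 T6 AWD", "XC90 T5 AWD"],
    "Fuel": ["Z", "Z", "X", "X"],
    "Engine": [2.4, 1.5, 2.0, 2.0],
    "CO2": [196, 136, 230, 226],
})

cat_cols = ["Make", "Model", "Fuel"]

# No .values here: X stays a DataFrame, so string column names can be resolved.
X = dataset.drop(columns="CO2")
y = dataset["CO2"].to_numpy()

encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), cat_cols)],
    remainder="passthrough",  # numeric columns are passed through unchanged
)
X_encoded = encoder.fit_transform(X)

print(X_encoded.shape)
```

If the data must remain a NumPy array, ColumnTransformer also accepts integer column positions (for example [0, 1, 2]) in place of string names.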