SKLearn Pipeline w/ ColumnTransformer: 'numpy.ndarray' object has no attribute 'lower'

Question

I'm trying to use SKLearn 0.20.2 to make a pipeline while using the new ColumnTransformer feature. My problem is that I keep getting the error:

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

I have one column of free text called text. All of my other columns are numerical. I'm trying to use CountVectorizer in my pipeline, and I think that's where the trouble is. I would much appreciate a hand with this.

from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# mapped to column names from dataframe
numeric_features = ['hasDate', 'iterationCount', 'hasItemNumber', 'isEpic']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median'))
])

# mapped to column names from dataframe
text_features = ['text']
text_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('vect', CountVectorizer())
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('text', text_transformer, text_features)
    ]
)

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', MultinomialNB())
                     ])

# features is the dataframe holding the columns above; labels is the target
x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.33)
clf.fit(x_train, y_train)

SimpleImputer is not intended for text. Try text_transformer = Pipeline([('vect', CountVectorizer())]) and see what happens.
– Sergey Bushmanov, Commented Feb 5, 2019 at 9:02
Thanks @SergeyBushmanov! Now I have a new error: ValueError: all the input array dimensions except for the concatenation axis must match exactly. I'll update my snippet to remove the imputer.
– bill-lamin, Commented Feb 5, 2019 at 14:45
@SergeyBushmanov actually, I went ahead and put the problematic code back and left an answer with your instructions, because it did indeed fix the initial error.
– bill-lamin, Commented Feb 5, 2019 at 17:51
As far as your latest error is concerned, can you track down, e.g. with the fit_transform method, which step, preprocessor or clf, produces the error?
– Sergey Bushmanov, Commented Feb 5, 2019 at 18:24
@SergeyBushmanov I marked this question as answered since you did give me a solution to the error. I started a new question with details of the new error here: stackoverflow.com/questions/54541490/…
– bill-lamin, Commented Feb 5, 2019 at 21:37

bill-lamin · Accepted Answer · 2019-02-05 17:48:08Z

@SergeyBushmanov helped me diagnose the error in my title: it was caused by running SimpleImputer on the text column before CountVectorizer.
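
For anyone wanting to see the mechanism, here is a minimal sketch that reproduces the error outside the pipeline. The two sample strings are made up for illustration. As far as I can tell, SimpleImputer hands back a 2-D array, so CountVectorizer iterates over rows that are themselves arrays and fails when it tries to lowercase one.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer

# Hypothetical text column shaped the way ColumnTransformer passes ['text']: (n_samples, 1)
docs = np.array([['fix login bug'], ['add export feature']], dtype=object)

# The imputer returns a 2-D array, one single-element row per document
imputed = SimpleImputer(strategy='most_frequent').fit_transform(docs)

message = None
try:
    # Each "document" is now a 1-element ndarray rather than a str
    CountVectorizer().fit_transform(imputed)
except AttributeError as exc:
    message = str(exc)

print(imputed.shape)
print(message)
```

The message captured here should be the same AttributeError as in the title.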

I have a further error that I'll write a new question for.
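
In case it helps others who hit the same follow-up, below is a sketch of a variant that fits without either error. It assumes the shape mismatch comes from selecting the text column with a list (['text']), which passes a 2-D frame to CountVectorizer; selecting it with the plain string 'text' passes a 1-D column instead. The toy dataframe and labels are invented for illustration and only reuse the column names from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data; only the column names come from the question
features = pd.DataFrame({
    'hasDate': [1, 0, 1, 0],
    'iterationCount': [3, None, 1, 2],
    'hasItemNumber': [0, 1, 1, 0],
    'isEpic': [0, 0, 1, 1],
    'text': ['fix login bug', 'add export feature',
             'fix export bug', 'plan login epic'],
})
labels = [0, 1, 0, 1]

numeric_features = ['hasDate', 'iterationCount', 'hasItemNumber', 'isEpic']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median'))
])

preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    # A string (not a list) selects a 1-D column of str documents;
    # no imputer is placed in front of the vectorizer
    ('text', CountVectorizer(), 'text'),
])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', MultinomialNB())])
clf.fit(features, labels)
predictions = clf.predict(features)
```

If the text column can contain missing values, one option is to fill them on the dataframe first, for example with features['text'].fillna(''), rather than imputing inside the text branch.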

