
I'm trying to use scikit-learn 0.20.2 to build a pipeline with the new ColumnTransformer feature. My problem is that I keep getting this error:

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

I have a column of text blobs called text, and all of my other columns are numeric. I'm using CountVectorizer in my pipeline, and I think that's where the trouble is. I would much appreciate a hand with this.

from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
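from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split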

# mapped to column names from dataframe
numeric_features = ['hasDate', 'iterationCount', 'hasItemNumber', 'isEpic']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median'))
])

# mapped to column names from dataframe
text_features = ['text']
text_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('vect', CountVectorizer())
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('text', text_transformer, text_features),
    ]
)

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', MultinomialNB())
                     ])

# features: DataFrame containing the columns above; labels: the target values
x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.33)
clf.fit(x_train,y_train)
  • SimpleImputer is not intended for text. Try text_transformer = Pipeline([('vect', CountVectorizer())]) and see what happens. Commented Feb 5, 2019 at 9:02
  • Thanks @SergeyBushmanov! Now I have a new error: ValueError: all the input array dimensions except for the concatenation axis must match exactly. I'll update my snippet to remove the imputer. Commented Feb 5, 2019 at 14:45
  • @SergeyBushmanov actually, I went ahead and put the problematic code back and left an answer with your instructions, because they did indeed fix the initial error. Commented Feb 5, 2019 at 17:51
  • As for your latest error, can you track down (e.g. with the fit_transform method) which step, preprocessor or clf, produces it? Commented Feb 5, 2019 at 18:24
  • @SergeyBushmanov I marked this question as answered since you did give me a solution to the error. I started a new question with details of the new error here: stackoverflow.com/questions/54541490/… Commented Feb 5, 2019 at 21:37
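A likely source of the follow-up ValueError mentioned in the comments is how the text column is selected. Per the ColumnTransformer documentation, a list of columns hands the transformer a 2D frame, while a scalar column name hands it a 1D Series. The sketch below uses a hypothetical three-row DataFrame `df` to show what CountVectorizer does in each case; it is an illustration, not the asker's data.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical frame with a single text column, named as in the question.
df = pd.DataFrame({"text": ["fix login bug", "add export button", "login page crash"]})

# Selecting with a list gives a 2D DataFrame. Iterating over a DataFrame yields
# its column labels, so CountVectorizer sees a single "document": the string "text".
X_list = CountVectorizer().fit_transform(df[["text"]])
print(X_list.shape)    # (1, 1)

# Selecting with a scalar gives a 1D Series: one document per row.
X_scalar = CountVectorizer().fit_transform(df["text"])
print(X_scalar.shape)  # (3, 8)
```

With the list selection, the text block has one row while the numeric block has one row per sample, so stacking them side by side fails with a dimension-mismatch error.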

1 Answer

@SergeyBushmanov helped me diagnose the error in my title: it was caused by running SimpleImputer on text.
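The mechanism can be reproduced in isolation. This is a minimal sketch with a hypothetical object array standing in for the list-selected text column: SimpleImputer always returns a 2D array, so each row CountVectorizer iterates over is a length-1 ndarray rather than a str, and lowercasing it fails.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer

# Hypothetical text column shaped (n_samples, 1), like a list-selected column.
docs_2d = np.array(
    [["fix login bug"], ["add export button"], ["login page crash"]], dtype=object
)

# SimpleImputer keeps the 2D shape: the output is still (3, 1).
imputed = SimpleImputer(strategy="most_frequent").fit_transform(docs_2d)
print(imputed.shape)  # (3, 1)

# Each "document" is now a length-1 ndarray, so CountVectorizer's lowercasing fails.
error_message = None
try:
    CountVectorizer().fit_transform(imputed)
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Flattening back to 1D restores one string per document.
X = CountVectorizer().fit_transform(imputed.ravel())
print(X.shape)  # (3, 8)
```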

I have a further error that I'll write a new question for.
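For reference, here is a sketch of a pipeline that avoids both errors, under stated assumptions: the toy `features` DataFrame and `labels` array below are hypothetical, the text column is passed as the scalar `"text"` so CountVectorizer receives one string per row, and the imputer is applied only to the numeric columns. It is not necessarily what the follow-up question settled on.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data reusing the question's column names.
features = pd.DataFrame({
    "hasDate": [1, 0, 1, 0, np.nan, 1],
    "iterationCount": [3, 1, np.nan, 2, 5, 4],
    "hasItemNumber": [0, 1, 1, 0, 1, 0],
    "isEpic": [0, 0, 1, 1, 0, 1],
    "text": ["fix login bug", "add export button", "epic: billing rewrite",
             "epic: new onboarding", "login page crash", "billing epic follow-up"],
})
labels = np.array([0, 0, 1, 1, 0, 1])

numeric_features = ["hasDate", "iterationCount", "hasItemNumber", "isEpic"]
numeric_transformer = Pipeline(steps=[("imputer", SimpleImputer(strategy="median"))])

# Scalar column name: ColumnTransformer passes a 1D Series of strings.
text_feature = "text"
text_transformer = Pipeline(steps=[("vect", CountVectorizer())])

preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, numeric_features),
    ("text", text_transformer, text_feature),
])

clf = Pipeline(steps=[("preprocessor", preprocessor), ("classifier", MultinomialNB())])
clf.fit(features, labels)
predictions = clf.predict(features)

# Both blocks now have one row per sample, so they stack cleanly.
transformed = clf.named_steps["preprocessor"].transform(features)
vocab = clf.named_steps["preprocessor"].named_transformers_["text"].named_steps["vect"].vocabulary_
print(transformed.shape, predictions)
```

MultinomialNB expects non-negative features, which holds here because the numeric columns are flags and counts and the median imputer preserves that.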
