I get the error below when I call nb_pipeline.fit_transform(X_train, y_train):

AttributeError: 'numpy.ndarray' object has no attribute 'fit'

The individual transformers in the pipeline work fine, but when I combine them in the pipeline I get the error.


X, y = training_data.drop('Response', axis=1), training_data['Response']
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)

X_train = np.array(X_train)
X_test = np.array(X_test)
y_train = np.array(y_train)
y_test = np.array(y_test)

class preprocess(TransformerMixin, BaseEstimator):

    def __init__():
        self.X = None

    def fit(self, X, y=None):
        self.X = X
        self.PI2 = 'Product_Info_2'
        self.PI2_categories = list(training_data[self.PI2].unique())
        return self

    def transform(self, X, y=None):
        Xt = X.copy()
        Xt = pd.concat([Xt, pd.get_dummies(Xt[self.PI2])], axis=1).drop(self.PI2, axis=1)
        Xt.drop('Id', axis=1, inplace=True)
        Xt.fillna(value=0, inplace=True)
        return np.array(Xt)


class apply_NB(TransformerMixin, BaseEstimator):

    def __init__(self):
        self.gridCV = None
        self.params = {"var_smoothing": [x*10**(-9) for x in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 
                                                              0.8, 0.9, 1, 1.5, 2, 2.5, 3, 3.5, 
                                                              4, 4.5, 5]]}
        self.best_params = None

    def fit(self, X, y):

        self.gridCV = GridSearchCV(GaussianNB(), self.params, verbose=10, n_jobs=-1)
        self.gridCV.fit(X, y)
        self.best_params = self.gridCV.best_params_
        return self

    def transform(self, X, y=None):
        Xt = self.gridCV.predict(X)
        return Xt

nb_pipeline = Pipeline([('preprocess', preprocess),
                        ('fit_NB', apply_NB())])

nb_pipeline.fit_transform(X_train, y_train)

When I try the final line I just get:

AttributeError: 'numpy.ndarray' object has no attribute 'fit'

1 Answer

You forgot self in the __init__ signature of preprocess:

class preprocess(TransformerMixin, BaseEstimator):

    def __init__(self):
        self.X = None

Then you also need to instantiate preprocess in the pipeline, the same way apply_NB already is. Pipeline steps have to be instances, not classes:

nb_pipeline = Pipeline([('preprocess', preprocess()),
                        ('fit_NB', apply_NB())])
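
As for why the traceback names numpy.ndarray rather than preprocess: as far as I can tell, the pipeline calls fit_transform on each step, and when the step is the class itself rather than an instance, the data gets bound to self. Here is a dependency-free sketch of that mechanism. FitTransformMixin and Preprocess are hypothetical stand-ins, not scikit-learn's actual code, and plain lists stand in for the arrays, so the message says 'list' where yours says 'numpy.ndarray'. Other scikit-learn versions may fail differently.

```python
class FitTransformMixin:
    """Hypothetical stand-in for a mixin that provides fit_transform."""

    def fit_transform(self, X, y=None):
        # Calls fit on whatever object ended up bound to `self`.
        return self.fit(X, y).transform(X)


class Preprocess(FitTransformMixin):
    """Hypothetical transformer that just doubles every value."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [value * 2 for value in X]


data, labels = [1, 2, 3], [0, 1, 0]

# Correct: called on an instance, so `self` is the Preprocess object.
ok = Preprocess().fit_transform(data, labels)

# Incorrect: called on the class. `data` is bound to `self` and `labels`
# to X, so the mixin tries data.fit(...), which a list does not have.
try:
    Preprocess.fit_transform(data, labels)
    message = None
except AttributeError as exc:
    message = str(exc)

print(ok)
print(message)
```

So the error is really saying "the thing I was given as self is your data", which is a hint that a class was passed where an instance was expected.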

Seems to work for me after making these changes!
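
For reference, here is a minimal self-contained sketch of a pipeline whose steps are all instances. FillMissing is a hypothetical stand-in for preprocess, the data is random, and GaussianNB is used directly as the final step instead of the apply_NB wrapper, purely to keep the example short. It is not a drop-in replacement for the code in the question.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline


class FillMissing(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for preprocess: replaces NaN with 0."""

    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained.
        return self

    def transform(self, X):
        return np.nan_to_num(np.asarray(X, dtype=float))


# Made-up data: 40 rows, 3 features, one missing value, two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
X[0, 0] = np.nan
y = np.array([0, 1] * 20)

# Note the parentheses: every step is an instance, not a class.
pipe = Pipeline([("preprocess", FillMissing()), ("nb", GaussianNB())])
pipe.fit(X, y)
predictions = pipe.predict(X)
```

With an estimator as the last step you call fit and then predict on the pipeline, rather than fit_transform.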
