Scikit-learn TransformerMixin : 'numpy.ndarray' object has no attribute 'fit'

Question

I want to build a sklearn Pipeline (itself part of a larger Pipeline) which:

1. encodes categorical columns (OneHotEncoder)
2. reduces dimensionality (TruncatedSVD)
3. adds the numerical columns (without transformation)
4. aggregates lines (pandas groupby)

I used this pipeline example:

and this example for a custom TransformerMixin:

I get an error at step 4 (no error if I comment out step 4):

AttributeError                            Traceback (most recent call last)
in ()
----> 1 X_train_transformed = pipe.fit_transform(X_train)
....
AttributeError: 'numpy.ndarray' object has no attribute 'fit'

My code :

import pandas as pd  # needed by Aggregator.transform (pd.DataFrame, groupby)

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.decomposition import TruncatedSVD
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer

# does nothing, but is here to collect numerical columns
class nothing(BaseEstimator, TransformerMixin):

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class Aggregator(BaseEstimator, TransformerMixin):

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = pd.DataFrame(X)
        X = X.rename(columns = {0 :'InvoiceNo', 1 : 'amount', 2:'Quantity', 
                                3:'UnitPrice',4:'CustomerID' })
        X['InvoiceNo'] =  X['InvoiceNo'].astype('int')
        X['Quantity'] = X['Quantity'].astype('float64')
        X['UnitPrice'] = X['UnitPrice'].astype('float64')
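        aggregations = dict()
        # SVD component columns keep integer labels 5 .. n-1; range() excludes
        # its stop value, so stop at X.shape[1] to include the last component
        for col in range(5, X.shape[1]):
            aggregations[col] = 'max'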

        aggregations.update({ 'CustomerID' : 'first',
                            'amount' : "sum",'Quantity' : 'mean', 'UnitPrice' : 'mean'})

        # aggregating all basket lines
        result = X.groupby('InvoiceNo').agg(aggregations)

        # add number of lines in the basket
        result['lines_nb'] = X.groupby('InvoiceNo').size()
        return result

# module-level setup (must not be indented inside Aggregator.transform)
numeric_features = ['InvoiceNo', 'amount', 'Quantity', 'UnitPrice',
                    'CustomerID']
numeric_transformer = Pipeline(steps=[('nothing', nothing())])

categorical_features = ['StockCode', 'Country']

preprocessor = ColumnTransformer(
    [
        # 'num' transformer does nothing, but is here to
        # collect numerical columns
        ('num', numeric_transformer, numeric_features),
        ('cat', Pipeline([
            ('onehot', OneHotEncoder(handle_unknown='ignore')),
            ('best', TruncatedSVD(n_components=100)),
        ]), categorical_features)
    ]
)

# edited following Artem's solution; the original code had:
# aggregator = ('agg', Aggregator())

pipe = Pipeline(steps=[
    ('preprocessor', preprocessor),
    # edited following Artem's solution; the original step was:
    # ('aggregator', aggregator),
    ('aggregator', Aggregator())
])

X_train_transformed = pipe.fit_transform(X_train)

Could you please add a reproducible example for your issue using sample data?
– Venkatachalam, Commented Jan 25, 2019 at 6:57
Did you try to cut the problem down? If you return X in Aggregator.transform(), do you still get an error? If not, then the problem does not come from the pipeline.
– Pierre S., Commented Jan 25, 2019 at 7:21
It looks like an element of your pipeline should return an estimator but returned a numpy.ndarray instead. You may want to try running Aggregator.transform() by itself to see if it returns the expected result.
– Pierre S., Commented Jan 25, 2019 at 7:35
What do you refer to as step 4? Also, I can see at least one problem: in your pipe instantiation, aggregator is a tuple, while it should be a class instance, I think; i.e. try ('aggregator', Aggregator())
– Artem Trunov, Commented Jan 25, 2019 at 10:26
@AI_Learning yes, this is good advice. Next time I'll make sure to add a reproducible example.
– Brigitte Maillère, Commented Jan 25, 2019 at 20:52

Artem Trunov · Accepted Answer · 2019-01-25 20:25:11Z

Pipeline steps take the form ('name', estimator), where estimator is an instance such as Aggregator(), but the original code had essentially:

aggregator = ('agg', Aggregator())

pipe = Pipeline(steps=[
                      ('preprocessor', preprocessor),
                      ('aggregator', aggregator),
])

which made the step ('aggregator', ('agg', Aggregator())): a tuple sits where Pipeline expects the estimator itself.
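
To make the nesting problem concrete, here is a minimal sketch of a check that could be run on a steps list before passing it to Pipeline. The helper `find_malformed_steps` and the class `DummyTransformer` are hypothetical names introduced only for illustration; they are not part of scikit-learn, and the check assumes that "estimator-like" simply means "has a fit method".

```python
# Hypothetical helper, not part of scikit-learn: flags steps whose second
# element is not estimator-like (assumed here to mean "exposes .fit").


class DummyTransformer:
    """Stand-in for a real transformer such as Aggregator (illustration only)."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


def find_malformed_steps(steps):
    """Return the names of steps that are not (str, estimator-like) pairs."""
    bad = []
    for step in steps:
        # Each step should be a 2-tuple: (name, estimator).
        if not (isinstance(step, tuple) and len(step) == 2):
            bad.append(repr(step))
            continue
        name, estimator = step
        # A nested tuple has no .fit attribute, so it is reported here.
        if not isinstance(name, str) or not hasattr(estimator, "fit"):
            bad.append(str(name))
    return bad


# The mistake from the question: a (name, estimator) tuple used as the estimator.
aggregator = ("agg", DummyTransformer())
nested_steps = [("preprocessor", DummyTransformer()), ("aggregator", aggregator)]

# The corrected form: the estimator instance is passed directly.
fixed_steps = [("preprocessor", DummyTransformer()), ("aggregator", DummyTransformer())]

print(find_malformed_steps(nested_steps))  # ['aggregator']
print(find_malformed_steps(fixed_steps))   # []
```

A check like this only inspects the shape of the steps list; it says nothing about whether each transformer returns sensible data.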

Brigitte Maillère Over a year ago

Thanks, I have edited my code as below and the whole pipeline now runs.

pipe = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('aggregator', Aggregator()),
])
