
I tried to transform column 'X' using the values in column 'y' (this is a toy example, just to show y being used in the transformation) before the data is fitted by the final LinearRegression estimator. But why isn't df['y'] passed to MyTransformer?

import numpy as np
import pandas as pd
from sklearn.base import TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

class MyTransformer(TransformerMixin):
    def __init__(self):
        pass
    def fit(self, X, y=None):
        return self
    def transform(self, X, y=None):
        print(y)  # prints None when reached through Pipeline.fit
        return X + np.sum(y)

df = pd.DataFrame(np.array([[2, 3], [1, 5], [1, 1], [5, 6], [1, 2]]), columns=['X', 'y'])
pip = Pipeline([('my_transformer', MyTransformer()),
                ('sqrt', FunctionTransformer(np.sqrt, validate=False)),
                ('lr', LinearRegression())])
pip.fit(df[['X']], df['y'])

Running this script raises an error at the line return X + np.sum(y); it looks like y is None.

2 Answers


TransformerMixin's fit_transform method doesn't pass y on to transform. What I've done before is implement my own fit_transform. This isn't your code, but here's an example I wrote recently:

from sklearn.preprocessing import LabelEncoder

class MultiColumnLabelEncoder:
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        # X is assumed to be a 2-D NumPy array; each column is encoded separately
        data = X.copy()
        for i in range(data.shape[1]):
            data[:, i] = LabelEncoder().fit_transform(data[:, i])
        return data
    def fit_transform(self, X, y=None):
        # Accept y so that Pipeline can call fit_transform(X, y)
        return self.fit(X, y).transform(X)

There are other ways, too. For example, you could make y an attribute of the transformer and access it in the transform method.
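A minimal sketch of one way to read that suggestion, assuming the toy data and pipeline from the question: fit stores a value derived from y on the instance (y_sum_ is a hypothetical attribute name), and transform reads it back. Because TransformerMixin's default fit_transform already calls fit(X, y), no override is needed, and later calls to transform(X) during predict see the same stored value.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


class MyTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        # Pipeline.fit reaches this through TransformerMixin.fit_transform(X, y),
        # so y is available here; keep what transform needs (hypothetical name y_sum_)
        self.y_sum_ = np.sum(y)
        return self

    def transform(self, X):
        # Uses the value learned in fit, so it also works when predict() calls transform(X)
        return X + self.y_sum_


df = pd.DataFrame(np.array([[2, 3], [1, 5], [1, 1], [5, 6], [1, 2]]), columns=['X', 'y'])
pip = Pipeline([('my_transformer', MyTransformer()),
                ('sqrt', FunctionTransformer(np.sqrt, validate=False)),
                ('lr', LinearRegression())])
pip.fit(df[['X']], df['y'])
preds = pip.predict(df[['X']])
```

The design choice here is that anything derived from y is treated as fitted state, which is how scikit-learn steps normally carry information from training to prediction.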

Edit: I should note that you can also pass y on to your own version of transform:

def fit_transform(self, X, y=None):
    return self.fit(X, y).transform(X, y)
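Putting that edit together with the transformer from the question gives roughly the following sketch (same toy data and pipeline). Pipeline.fit calls fit_transform(X, y) on each intermediate step, so overriding fit_transform is enough for transform to receive y during fitting. One limitation worth knowing: Pipeline.predict and Pipeline.transform only call transform(X), so a transform that truly needs y still cannot run at prediction time; this version raises a clear error in that case rather than failing inside np.sum.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


class MyTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        if y is None:
            # Pipeline.predict/transform never pass y, so this toy step only works during fit
            raise ValueError("MyTransformer.transform needs y")
        return X + np.sum(y)

    def fit_transform(self, X, y=None):
        # Override the mixin's version, which would call transform(X) without y
        return self.fit(X, y).transform(X, y)


df = pd.DataFrame(np.array([[2, 3], [1, 5], [1, 1], [5, 6], [1, 2]]), columns=['X', 'y'])
pip = Pipeline([('my_transformer', MyTransformer()),
                ('sqrt', FunctionTransformer(np.sqrt, validate=False)),
                ('lr', LinearRegression())])
pip.fit(df[['X']], df['y'])  # transform now receives y, so this no longer fails

shifted = MyTransformer().fit_transform(df[['X']], df['y'])
print(shifted['X'].tolist())  # [19, 18, 18, 22, 18]
```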

2 Comments

So basically just use Python's duck typing to bypass TransformerMixin, and everything should be fine, right?
You can also extend TransformerMixin, but you don't have to. If you do, you inherit its fit_transform method, so the key is to override that method. Here's the source for it in sklearn, btw: github.com/scikit-learn/scikit-learn/blob/…

When Pipeline fits this step, TransformerMixin.fit_transform executes the following statement, so you can see that transform receives only the X parameter:

self.fit(X, y, **fit_params).transform(X)
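A quick way to confirm this, as a sketch (Probe and its received_y_ attribute are hypothetical names used only for illustration): call the inherited fit_transform with a y and record what transform was actually given.

```python
import numpy as np
from sklearn.base import TransformerMixin


class Probe(TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        # Record whatever y the caller forwarded (for demonstration only)
        self.received_y_ = y
        return X


probe = Probe()
probe.fit_transform(np.ones((3, 1)), np.arange(3))
print(probe.received_y_)  # None: the inherited fit_transform calls transform(X)
```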
