I have a series of steps (functions) that I need to run on a raw dataset to prepare it for modeling. I want to chain all the cleaning steps one after another, with each step implemented as a function. It is similar to sklearn's Pipeline, except that my steps have no fit or transform methods. Something like this (pseudocode):
xx = [2,3,4]
from sklearn.pipeline import Pipeline
pipeline = Pipeline([
('double', double(xx)),
('triple', triple(xx))
])
predicted = pipeline.fit(xx).predict(xx)
I tried using functools.reduce with lambda functions:
from functools import reduce
xx = 4
pipeline = [lambda x: x * 3, lambda x: x + 1, lambda x: x / 2]
val = reduce(lambda x, f: f(x), pipeline, xx)
print(val)
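To make that reusable, I also tried wrapping the reduce call in a small helper that composes named cleaning functions into a single callable (a minimal sketch; the step names `triple`, `increment`, and `halve` are just placeholders for my real cleaning steps):

```python
from functools import reduce

def make_pipeline(*funcs):
    """Compose functions left to right into a single callable."""
    def run(data):
        # Feed the data through each step, passing the result along.
        return reduce(lambda acc, f: f(acc), funcs, data)
    return run

# Placeholder cleaning steps -- in practice these would take a DataFrame.
def triple(x):
    return x * 3

def increment(x):
    return x + 1

def halve(x):
    return x / 2

clean = make_pipeline(triple, increment, halve)
print(clean(4))  # 6.5, same result as the reduce one-liner above

# The same pipeline can then be applied to multiple datasets:
for dataset in [4, 10]:
    print(clean(dataset))
```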
Is there a better way of accomplishing this, i.e. making the code modular and automating the run across multiple datasets? At the moment I work in a Jupyter notebook, and I want to be able to add or modify individual functions without impacting the others. Please suggest.