
I have a series of steps (functions) that I need to run on a raw dataset to prepare it for modeling. I want to chain all the cleaning steps one after the other, with each step as a function. It is similar to sklearn's Pipeline, except that I don't have any fit or transform functions.

from sklearn.pipeline import Pipeline

xx = [2, 3, 4]

# illustrative only: double and triple stand in for my cleaning steps,
# not real transformers with fit/transform
pipeline = Pipeline([
    ('double', double(xx)),
    ('triple', triple(xx))
])

predicted = pipeline.fit(xx).predict(xx)
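
For comparison, sklearn can wrap plain callables with FunctionTransformer, so a Pipeline does not strictly need custom fit/transform methods. A minimal sketch of what I mean (the doubling and tripling steps are just placeholders):

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# each plain function is wrapped so it gains fit/transform for free
pipeline = Pipeline([
    ('double', FunctionTransformer(lambda x: x * 2)),
    ('triple', FunctionTransformer(lambda x: x * 3)),
])

xx = np.array([2, 3, 4])
print(pipeline.fit_transform(xx))  # [12 18 24]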

I tried using reduce from functools with lambda functions:

from functools import reduce
xx = 4
pipeline = [lambda x: x * 3, lambda x: x + 1, lambda x: x / 2]
val = reduce(lambda x, f: f(x), pipeline, xx)
print(val)  # ((4 * 3) + 1) / 2 = 6.5

Is there a better way of accomplishing this, keeping the code modular and making it easy to run for multiple datasets? As of now I work in a Jupyter notebook, and I can always add new functions or modify existing ones without impacting the others. Please suggest.

2 Comments

  • Nothing is really bad in your approach, except that I would operate on named functions and not keep them in a mutable list. Commented Nov 12, 2019 at 10:37
  • I agree, and I do use named functions; the example above is just to illustrate. I would like to pass functions into a pipeline without providing the input parameters to each function while creating the pipeline. What is the best way to achieve that? Commented Nov 13, 2019 at 4:23

1 Answer


You could use plain functions to achieve this; it is less fancy, but powerful nevertheless.

Let's say you have a couple of preprocessing steps: preprocessing_step1, preprocessing_step2, and so on. You can define a function called pipeline that feeds the return value of each step into the next function. A code snippet follows:

def preprocessing_step1(rawdata):
    # do something here
    processed_data = rawdata  # placeholder so the skeleton runs
    return processed_data

def preprocessing_step2(rawdata):
    # do something here
    processed_data = rawdata  # placeholder so the skeleton runs
    return processed_data

def preprocessing_step3(rawdata):
    # do something here
    processed_data = rawdata  # placeholder so the skeleton runs
    return processed_data

def pipeline(rawdata):
    # run the steps sequentially, feeding each output into the next step
    data = preprocessing_step1(rawdata)
    data = preprocessing_step2(data)
    processed_data = preprocessing_step3(data)

    return processed_data

If you find this helpful, I could show you how to iterate over all of your datasets using a generator function in Python.
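
A minimal sketch of that idea, assuming the datasets are CSV files read with pandas (the glob pattern and the file format are assumptions, not from the original question):

import glob
import pandas as pd

def iter_datasets(pattern):
    # lazily yield one raw dataset at a time instead of loading them all
    for path in glob.glob(pattern):
        yield path, pd.read_csv(path)

# run the same pipeline over every dataset
for path, raw in iter_datasets('data/*.csv'):  # placeholder pattern
    processed = pipeline(raw)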


1 Comment

Sounds good. I'm still looking for another way of creating the pipeline if possible; with the suggested pipeline function we need to feed the output of one step as the input to the next by hand. A simple and elegant solution nevertheless, thanks!
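
One way to avoid threading each output by hand is to build the pipeline from a list of named functions with functools.reduce, combining the reduce idea from the question with the named steps from the answer. A sketch (make_pipeline here is a hypothetical helper, not a library function):

from functools import reduce

def make_pipeline(*steps):
    # return a single callable that threads data through each step in order
    def run(data):
        return reduce(lambda d, step: step(d), steps, data)
    return run

# reuses the named steps defined in the answer above
pipeline = make_pipeline(preprocessing_step1,
                         preprocessing_step2,
                         preprocessing_step3)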
