Python class design for data processing

Question

I am a Python beginner and I am currently learning OOP. I want to create a class that loads data from a database and then performs transformations and calculations on this data. However, I am not sure how the class should be designed. The class would have several methods, and each method's input is the output of the previous method. I don't find this approach very elegant, and I feel it is not in the spirit of OOP.

Is this a good way to design such a process using OOP? Or is it even advantageous to create a class to process the data in such a case?

See below what I tried to do:

class Data:
    def __init__(self, arg):
        # attributes necessary for loading the data
        self.arg = arg

    def load_data(self):
        """Load the data from somewhere and reshape it so that method1 can use it."""
        data = self.arg  # placeholder: replace with the real loading/reshaping code
        return data

    def method1(self, data):
        """Do some transformation on the data."""
        data1 = data  # placeholder transformation
        return data1

    def method2(self, data1):
        """Do some transformation on data1."""
        data2 = data1  # placeholder transformation
        return data2

    def method3(self, data2):
        """Do some transformation on data2."""
        data3 = data2  # placeholder transformation
        return data3

    def run(self):
        # each step takes the previous step's output as its input
        data = self.load_data()
        data1 = self.method1(data)
        data2 = self.method2(data1)
        data3 = self.method3(data2)
        return data3
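For comparison, here is a minimal sketch of a different class design in which the object keeps the current result in `self.data` and each step returns `self`, so the steps can be chained. The class name `DataPipeline`, its methods, and the assumption that the data is a plain list of numbers are all hypothetical stand-ins for the real loading and transformation code.

```python
class DataPipeline:
    """Hypothetical variant: each step updates self.data and returns self."""

    def __init__(self, source):
        # `source` stands in for whatever is needed to load the data
        self.source = source
        self.data = None

    def load(self):
        # Placeholder loader: copy the source into a list
        self.data = list(self.source)
        return self

    def drop_negative(self):
        # Example transformation: keep only non-negative values
        self.data = [x for x in self.data if x >= 0]
        return self

    def scale(self, factor):
        # Example transformation: multiply every value by `factor`
        self.data = [x * factor for x in self.data]
        return self

    def total(self):
        # Example calculation: the final result of the pipeline
        return sum(self.data)


result = DataPipeline([3, -1, 4, -5]).load().drop_negative().scale(10).total()
print(result)  # 70
```

Whether this is better than passing values between methods is a judgment call: chaining keeps the call site short, but each step now mutates the object's state instead of returning a fresh value.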

Thanks in advance for your help.

It depends on what you are trying to achieve. Different programming philosophies suggest writing code in different ways. Your code is perfectly acceptable under the UNIX philosophy (each function does one thing and does it well, and the output of one program is expected to become the input of another). At the end of the day, I think this is really a matter of personal preference. en.wikipedia.org/wiki/Unix_philosophy
– Timothy Wong, Commented Nov 8, 2020 at 15:12
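Following the UNIX-style idea in this comment, the same process can be sketched without a class at all, as small functions whose outputs feed into one another. This is an illustration only: `load_data`, `drop_negative`, `double`, and `run_pipeline` are hypothetical names, and the loader simply copies an in-memory list instead of querying a database.

```python
from functools import reduce


def load_data(source):
    # Placeholder loader: in practice this would query the database
    return list(source)


def drop_negative(data):
    # Keep only non-negative values
    return [x for x in data if x >= 0]


def double(data):
    # Multiply every value by two
    return [x * 2 for x in data]


def run_pipeline(source, *steps):
    # Feed the output of each step into the next one, like a UNIX pipe
    return reduce(lambda data, step: step(data), steps, load_data(source))


print(run_pipeline([3, -1, 4], drop_negative, double))  # [6, 8]
```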

rocketsfallonrocketfalls · Accepted Answer · 2020-11-08 15:23:20Z

There are various technologies (Python APIs such as Pandas and PySpark) that provide data processing operations which are very heavily optimized, using sophisticated techniques underneath (compiled code, vectorization, etc.), and which can give much faster results than plain Python code (sometimes tens of thousands of times faster).
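As a minimal illustration of the vectorization point, assuming pandas is installed (the `price` and `qty` columns are made-up example data), the same per-row calculation can be written either as a Python loop or as a single column-wise operation:

```python
import pandas as pd

df = pd.DataFrame({"price": [2.5, 4.0, 10.0], "qty": [4, 3, 1]})

# Plain-Python approach: iterate over the rows one by one
totals_loop = [p * q for p, q in zip(df["price"], df["qty"])]

# Vectorized approach: one column-wise operation that pandas runs in compiled code
df["total"] = df["price"] * df["qty"]

print(df["total"].tolist())  # [10.0, 12.0, 10.0]
```

Both produce the same values; the vectorized form avoids a Python-level loop, which is where the speed-up comes from on large frames.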

Even though I can see your point and it makes sense, you will probably never need a pure Python class like this. Think of it this way: other people have already built what you describe and done a really good job, so you don't need to worry about it.

However, you can still define your own functions and classes on top of these APIs that perform certain transformations or manipulations on the dataset while exploiting their optimized processing speed. Such functions would probably do very specific jobs, since the general-purpose manipulations are already provided by the APIs mentioned above.
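One way to write such project-specific steps, sketched here assuming pandas, is as plain functions that take and return a `DataFrame`, chained with `DataFrame.pipe`. This mirrors the `method1` → `method2` → `method3` structure from the question. The functions `drop_missing_prices` and `add_total` and the sample data are hypothetical.

```python
import pandas as pd


def drop_missing_prices(df):
    # Hypothetical project-specific step: remove rows without a price
    return df.dropna(subset=["price"])


def add_total(df, tax_rate=0.0):
    # Hypothetical step: vectorized total including an optional tax rate
    return df.assign(total=df["price"] * df["qty"] * (1 + tax_rate))


raw = pd.DataFrame({"price": [2.0, None, 5.0], "qty": [3, 1, 2]})

# Each .pipe call passes the previous result into the next function
result = (
    raw
    .pipe(drop_missing_prices)
    .pipe(add_total, tax_rate=0.5)
)
print(result["total"].tolist())  # [9.0, 15.0]
```

Because each function returns a new frame, `raw` is left unchanged and every step can be tested on its own.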

To the people who are more experienced or knowledgeable than me: please let me know if I have given any misinformation so that I can improve too.
