I am a Python beginner currently learning OOP. I want to create a class that loads data from a database and then performs transformations and calculations on it. However, I am not sure how the class should be designed. The class would have several methods, and each method's input is the output of the previous one. I find this approach inelegant, and I feel it is not in the spirit of OOP.

Is this a good way to design such a process using OOP? Or is it even advantageous in such a case to create a class to process the data?

See below what I tried to do:

class Data:
    def __init__(self, arg):
        # Attributes necessary for loading the data.
        self.arg = arg

    def load_data(self):
        """Load the data from somewhere and reshape it so that method1 can use it."""
        data = ...  # placeholder for the actual loading code
        return data

    def method1(self, data):
        """Do some transformation on data."""
        data1 = data  # placeholder for the actual transformation
        return data1

    def method2(self, data1):
        """Do some transformation on data1."""
        data2 = data1  # placeholder for the actual transformation
        return data2

    def method3(self, data2):
        """Do some transformation on data2."""
        data3 = data2  # placeholder for the actual transformation
        return data3

    def run(self):
        # Each step consumes the output of the previous one.
        data = self.load_data()
        data1 = self.method1(data)
        data2 = self.method2(data1)
        data3 = self.method3(data2)
        return data3

Thanks in advance for your help.

  • It depends on what you are trying to achieve. Different programming philosophies suggest writing code in different ways. Your code is perfectly acceptable under the UNIX philosophy (each function does one thing and does it well, and the output of one program is expected to become the input of another). At the end of the day, I think this is really a matter of personal preference. en.wikipedia.org/wiki/Unix_philosophy Commented Nov 8, 2020 at 15:12
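The style described in this comment, where each function does one thing and its output feeds the next, can also be sketched with plain functions instead of a class. This is only an illustration: `load_data`, `drop_missing`, `to_floats`, `total`, `run_pipeline` and the sample values are hypothetical stand-ins, not the asker's real steps.

```python
from functools import reduce


def load_data():
    # Hypothetical stand-in for a database query.
    return ["1.5", None, "2.5", "4"]


def drop_missing(rows):
    # Remove entries that have no value.
    return [r for r in rows if r is not None]


def to_floats(rows):
    # Convert the remaining string entries to floats.
    return [float(r) for r in rows]


def total(values):
    # Reduce the cleaned values to a single number.
    return sum(values)


def run_pipeline(data, steps):
    # Feed the output of each step into the next one, in order.
    return reduce(lambda acc, step: step(acc), steps, data)


result = run_pipeline(load_data(), [drop_missing, to_floats, total])
print(result)
```

Keeping the steps in a list makes it easy to reorder them, drop one, or test each function on its own, which is arguably the main benefit over a chain of methods that only ever call each other.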

1 Answer

Several Python libraries (pandas, PySpark, etc.) provide data-processing operations that are heavily optimized under the hood, for example through compiled code and vectorization. They can be dramatically faster than equivalent plain-Python code, sometimes by orders of magnitude.
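As a rough illustration of what vectorization means here, the sketch below computes the same result twice with pandas: once with an explicit Python loop and once with a single vectorized expression. The column of prices and the 1.2 factor are made-up values, and no timing is claimed; the point is only that the vectorized form hands the whole operation to the library.

```python
import pandas as pd

# Made-up sample data for illustration.
prices = pd.Series([10.0, 20.0, 30.0])

# Plain-Python style: the interpreter handles one element at a time.
looped = [p * 1.2 for p in prices]

# Vectorized style: one expression applied to the whole Series at once.
vectorized = prices * 1.2

print(looped)
print(vectorized.tolist())
```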

Even though I can see your point and it makes sense, you will probably rarely need a pure-Python class like this. Think of it this way: other people have already built what you describe and did a really good job, so you don't need to worry about it anymore.

However, you can still define your own functions and classes on top of these APIs to perform specific transformations or manipulations on a dataset while benefiting from their optimized processing speed. Such functions would typically do very specific jobs, since the general-purpose manipulations are already provided by the APIs mentioned above.
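One possible way to do this with pandas is to write each custom step as a small function that takes and returns a `DataFrame`, then chain the steps with `DataFrame.pipe`. The sketch below assumes a hypothetical table with `product`, `price` and `quantity` columns; the function names and the data are invented for illustration.

```python
import pandas as pd


def drop_missing(df):
    # Keep only rows where both numeric columns are present.
    return df.dropna(subset=["price", "quantity"])


def add_revenue(df):
    # Vectorized column arithmetic; returns a new DataFrame.
    return df.assign(revenue=df["price"] * df["quantity"])


def revenue_by_product(df):
    # Aggregate revenue per product using the built-in groupby.
    return df.groupby("product", as_index=False)["revenue"].sum()


# Hypothetical data standing in for what would be loaded from a database.
raw = pd.DataFrame(
    {
        "product": ["a", "b", "a", "b"],
        "price": [2.0, 3.0, 2.0, None],
        "quantity": [1, 2, 3, 4],
    }
)

result = raw.pipe(drop_missing).pipe(add_revenue).pipe(revenue_by_product)
print(result)
```

Each step stays small and specific, while the heavy lifting (`dropna`, column arithmetic, `groupby`) is left to the library.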

To people more experienced or knowledgeable than me: please let me know if I gave any misinformation so that I can improve too.
