I'm currently working on some heavy data analytics projects and am trying to create a Python wrapper class to streamline the mundane preprocessing steps: cleaning data, partitioning it into test / validation sets, standardizing it, and so on. The goal is to transform raw data into processed matrices that machine learning algorithms can consume directly for training and testing. Ideally, I'm working towards the point where I can write:

data = DataModel(AbstractDataModel)
processed_data = data.execute_pipeline(**kwargs)

In many cases I'll start off with self.df, which is a pandas DataFrame held by my instance. A method such as standardize_data() will then ultimately return a standardized DataFrame called self.std_df.

My IDE has been complaining heavily about me initializing variables outside of __init__. So to try to soothe PyCharm, I've been using the following code inside my constructor:

class AbstractDataModel(ABC):

    @abstractmethod
    def __init__(self, input_path, ...,  **kwargs):

        self.df_train, self.df_test, self.train_ID, self.test_ID, self.primary_key, ... (many more variables) = None, None, None, None, None, ...

Later on, these properties are assigned their real values. I'll admit that I'm coming from heavy-duty Java Spring projects, so I'm still used to verbosely declaring variables. Is there a more Pythonic way of declaring my instance properties here? I know I must be violating DRY with all the None values.

I've searched SO and came across a similar question, but the answer provided there is more about setting instance variables through argv, so it isn't a direct solution in my context.

  • @Alexander I've attempted this solution. However, PyCharm complains that it needs more values to unpack, and at runtime I get a TypeError: 'NoneType' object is not iterable error. Commented Aug 6, 2017 at 20:53
  • @Alexander, that would result in an error (NoneType not iterable); maybe you meant var1 = var2 = ... = varN = None. Commented Aug 6, 2017 at 20:53
  • My mistake, self.var1 = self.var2 = ... = self.var_n = None. Commented Aug 6, 2017 at 20:54
  • As an aside, when you have lots of variables inside a class instance, you may want to alphabetize them and declare each individually. I find that it helps me to maintain code. Commented Aug 6, 2017 at 20:58
  • @Alexander that's a great idea. Commented Aug 6, 2017 at 20:59

1 Answer

Use chained assignment:

self.df_train = self.df_test = self.train_ID = self.test_ID = self.primary_key = ... = None
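
A minimal runnable sketch of the chained form, assuming a cut-down class (the name `DataModelSketch` and the shortened attribute list are illustrative, not the asker's full class):

```python
class DataModelSketch:
    """Hypothetical stand-in for the asker's model class."""

    def __init__(self, input_path):
        self.input_path = input_path
        # One statement binds every listed attribute to the same None object.
        self.df_train = self.df_test = self.train_ID = self.test_ID = self.primary_key = None


model = DataModelSketch("data.csv")

# Collect the names of attributes that are still unset.
unset = [name for name, value in vars(model).items() if value is None]
print(unset)
```

Chaining is safe here because None is immutable. Chaining a mutable value such as `[]` would make every attribute refer to one shared list.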

Or set up abstract properties that default to None, so you don't have to set them.
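
One possible reading of this second suggestion is to declare class-level attributes that default to None, so each instance reads None until a method assigns a real value. A sketch under that assumption (the names `AbstractDataModelSketch`, `CsvDataModel`, and `load` are hypothetical):

```python
from abc import ABC, abstractmethod


class AbstractDataModelSketch(ABC):
    # Class-level defaults: every instance reads None until it assigns its own value.
    df_train = None
    df_test = None
    primary_key = None

    @abstractmethod
    def load(self):
        """Subclasses populate the attributes here (hypothetical method)."""


class CsvDataModel(AbstractDataModelSketch):
    def load(self):
        # Assigning on self creates an instance attribute that shadows the class default.
        self.primary_key = "id"


model = CsvDataModel()
before = model.primary_key   # read from the class default
model.load()
after = model.primary_key    # read from the instance attribute
```

The class default itself is untouched by `load()`, so other instances still start from None.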

3 Comments

This is such a good idea, and so simple! Definitely Pythonic. Speaking of the abstract properties that default to None, could you clarify how that is different from what I'm doing in __init__, given that my constructor is an abstract method and all my properties are set to None within it?
None is a singleton with its own location in memory (id(None)). Any name bound to None refers to that same object. Both methods are equivalent in their functionality, although one is much simpler.
Got it. So self.df_train is self.df_test will evaluate to True until I assign values to them.
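
The identity point in the last two comments can be checked directly. A small sketch, using a hypothetical `Holder` class rather than the asker's model:

```python
class Holder:
    """Hypothetical container used only for this demonstration."""

    def __init__(self):
        self.df_train = self.df_test = None


h = Holder()
same_before = h.df_train is h.df_test   # both attributes refer to the None singleton

h.df_train = [1, 2, 3]                  # rebinding one attribute leaves the other alone
same_after = h.df_train is h.df_test
```

Here `same_before` is True and `same_after` is False: assigning a new value rebinds only `df_train`, while `df_test` stays None.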
