
I have a data engineering program that is grabbing some data off of Federal government websites and transforming that data. I'm a bit confused on whether I need to use the 'self' keyword or if it's a better practice to not use a class at all. This is how it's currently organized:

class GetGovtData():

    def get_data_1(arg1=0, arg2=1):
        df = conduct_some_operations
        return df

    def get_data_2(arg1=4, arg2=5):
        df = conduct_some_operations_two
        return df

I'm mostly using a class here for organization purposes. For instance, there might be a dozen different methods from one class that I need to use. I find it more aesthetically pleasing / easier to type out this:

from data.get_govt_data import GetGovtData

df1 = GetGovtData.get_data_1()
df2 = GetGovtData.get_data_2()

Rather than:

from data import get_govt_data

df1 = get_govt_data.get_data_1()
df2 = get_govt_data.get_data_2()

Which just has a boatload of underscores. So I'm just curious if this would be considered bad code to use a class like this, without bothering with 'self'? Or should I just eliminate the classes and use a bunch of functions in my files instead?

7
  • 2
    Lots of syntax errors here. Are those supposed to be methods? They don't appear to be inside a class. Were they meant to be indented? Also, Class is not a Python keyword. Did you mean class? If those two functions are in fact meant to be class methods, or static methods, then use the appropriate decorator to flag them as such. A class method wants to know the class, but doesn't require an instance of the class. A static method is just a function defined inside the class. Commented Oct 15, 2020 at 6:46
  • 1
    You can still achieve the same without using a class, and here you shouldn't be using a class. Rename your file to GetGovtData and import it with from data import GetGovtData (but without the class), and the way you call the functions will still hold. Commented Oct 15, 2020 at 6:46
  • 1
    No, you absolutely should not use classes this way. And note, I don't think anyone has insulted your intelligence. What has been pointed out is that you aren't using classes the way they are meant to be used. And it would be helpful if you used a good tutorial, e.g. the official tutorial because StackOverflow cannot substitute for that. If you are using classes like this, you should at least make the methods staticmethods. My advice: take comments at face value, they are usually meant as advice, and not as gratuitous insults. Commented Oct 15, 2020 at 7:06
  • 1
    Thanks Ragnar! I am also quite new to SO, but feel the same way as you. Commented Oct 15, 2020 at 7:06
  • 1
    @RagnarLothbrok using snake_case for file names is a convention, and honestly, if that is your main concern for not using a module (which would be the intended use-case, to organize your code, especially functions and classes) then simply breaking that convention and using CapitalCase would probably be the least bad solution Commented Oct 15, 2020 at 7:08

3 Answers

6

If you develop functions within a Python class, you have two ways of defining a function: one with self as the first parameter and one without self.

So, what is the difference between the two?

Function with self

The first one is a method, which is able to access content within the created object. This allows you to access the internal state of an individual object, e.g., a counter of some sort. These are the methods you usually write when using object-oriented programming. A short intro can be found here [External Link]. These methods require you to create new instances of the given class.
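
As a minimal sketch of a method that relies on self to keep per-object state (the class name DownloadCounter and its attribute are hypothetical, chosen only for illustration):

```python
class DownloadCounter:
    """Hypothetical example: each instance keeps its own count."""

    def __init__(self):
        # State stored on the instance, reachable through self.
        self.count = 0

    def record(self):
        # self refers to the specific object the method was called on.
        self.count += 1
        return self.count


first = DownloadCounter()
second = DownloadCounter()
first.record()
first.record()
second.record()
print(first.count, second.count)  # 2 1
```

Each object carries its own count, which is the kind of internal state that self gives you access to.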

Function without self

Functions without self can be called without initialising an instance of the class. This is why you can call them directly on the imported class.
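
A small sketch of what this looks like in Python 3, reusing the class name from the question. The function body is a stand-in that just returns its arguments, so it is an assumption and not the real data retrieval:

```python
class GetGovtData:
    # No self parameter: this is a plain function stored on the class.
    def get_data_1(arg1=0, arg2=1):
        # Stand-in body: return the arguments so we can see what was passed.
        return (arg1, arg2)


# Calling through the class passes no implicit first argument.
from_class = GetGovtData.get_data_1()
print(from_class)  # (0, 1)

# Calling through an instance silently passes the instance as arg1,
# which is almost certainly not what was intended.
instance = GetGovtData()
from_instance = instance.get_data_1()
print(from_instance[0] is instance)  # True
```

So the call on the class works, but the same function misbehaves as soon as someone calls it on an instance.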

Alternative solution

This is based on the comment of Tom K. Instead of using self, you can also use the decorator @staticmethod to indicate the role of the method within your class. Some more info can be found here [External link].
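
A minimal sketch of the decorator in use, again with the class name from the question and a stand-in body (an assumption, not the real logic):

```python
class GetGovtData:
    @staticmethod
    def get_data_1(arg1=0, arg2=1):
        # Stand-in body: return the arguments so we can see what was passed.
        return (arg1, arg2)


# With @staticmethod, no implicit first argument is passed either way.
from_class = GetGovtData.get_data_1()
from_instance = GetGovtData().get_data_1()
print(from_class, from_instance)  # (0, 1) (0, 1)
```

The decorator makes the intent explicit and keeps the function safe to call on both the class and its instances.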

Final thought

To answer your initial question: You do not need to use self. In your case you do not need self, because you do not share the internal state of an object. Nevertheless, if you are using classes, you should think about an object-oriented design.


3 Comments

Thanks Tom for the input. I included it in my answer.
Thanks! This is a great answer. So essentially, what I was trying to do was create static methods and I could've just used the @staticmethod decorator? I think I'll probably just change the design to avoid classes in this instance (as I don't think it's necessary here), but this is very useful.
@RagnarLothbrok The @staticmethod decorator is for methods that do not require a class instance, or even a class. They're just functions that have been placed inside a class. You can invoke them through either the class name or a class instance and the decorator will make sure there is no self argument. There are also class methods, indicated by the decorator @classmethod. In this case, the first argument is a class rather than a class instance, and is usually called cls. This may be the class itself, or it may be any class that uses it as a base class (directly or indirectly).
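
To illustrate the class method case described in the last comment, here is a short sketch. The names GovtSource, CensusSource, label and describe are hypothetical and exist only for this example:

```python
class GovtSource:
    label = "generic"

    @classmethod
    def describe(cls):
        # cls is the class the method was called on, which may be a subclass.
        return f"{cls.__name__}: {cls.label}"


class CensusSource(GovtSource):
    label = "census"


base_description = GovtSource.describe()
sub_description = CensusSource.describe()
print(base_description)  # GovtSource: generic
print(sub_description)   # CensusSource: census
```

The same method sees a different cls depending on which class it is invoked through.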
2

I suppose you have a file called data/get_govt_data.py that contains your first code block. You can just rename that file to data/GetGovtData.py, remove the class line and not bother with classes at all, if you like. Then you can do

from data import GetGovtData

df1 = GetGovtData.get_data_1()

Depending on your setup you may need to create an empty file data/__init__.py for Python to see data as a package.

EDIT: Regarding the file naming, Python does not impose tight restrictions here. Note however that many projects follow naming conventions (for example, snake_case for modules and functions and CapitalCase for classes) to distinguish function, class and module names. Using CapitalCase for a module may lead others to assume for a second that it's a class. You may choose not to follow this convention if you do not want to use classes in your project.

2 Comments

This is what I wanted to do, but my issue is this: aren't there issues with naming .py files using camelCase or CapitalCase? stackoverflow.com/a/42127721/8309944
The case of .py files does not cause issues. It's just convention. See edit.
1

To answer the question in the title first: The exact string 'self' is a convention (that I can see no valid reason to ignore BTW), but the first argument of an ordinary method defined in a class is always going to be a reference to the instance it was called on.

Whether you should use a class or flat functions depends on if the functions have shared state. From your scenario it sounds like they may have a common base URL, authentication data, database names, etc. Maybe you even need to establish a connection first? All those would be best held in the class and then used in the functions.
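
As a rough sketch of what holding shared state in the class could look like: the class GovtDataClient, its parameters, the dataset names and the URL are all hypothetical placeholders, and the methods only build request URLs instead of fetching anything.

```python
class GovtDataClient:
    """Hypothetical client that keeps shared settings on the instance."""

    def __init__(self, base_url, api_key):
        # Shared state used by every data-retrieval method.
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def build_url(self, dataset):
        # Both values come from the instance, so callers never repeat them.
        return f"{self.base_url}/{dataset}?api_key={self.api_key}"

    def get_data_1(self):
        # Placeholder: a real method would request this URL and transform it.
        return self.build_url("dataset-one")

    def get_data_2(self):
        return self.build_url("dataset-two")


client = GovtDataClient("https://api.example.com/", "SECRET")
print(client.get_data_1())  # https://api.example.com/dataset-one?api_key=SECRET
```

Here self earns its place: the base URL and credentials are set once and reused by every method.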
