
I have more than 4000 lines of code that analyze, manipulate, compare, and plot two huge .csv files. For readability and future publication, I'd like to restructure the code into object-oriented classes. I read the files into pd.DataFrames, for example:

my_data1 = pd.DataFrame(np.random.randn(100, 9), columns=list('123456789'))
my_data2 = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

I have functions that compare various aspects of the two datasets, and functions that use only one dataset at a time. I want to convert this structure into a dataclass with methods for each DataFrame.

I can't manipulate these DataFrames through my class methods. I keep getting NameError: name 'self' is not defined. Here's my dataclass structure:

from dataclasses import dataclass

import numpy as np
import pandas as pd

@dataclass
class Data:
    ser = pd.DataFrame

    # def __post_init__(self):
    #     self.ser = self.clean()

    def clean(self, ser):
        acceptcols = np.where(ser.loc[0, :] == '2')[0]
        data = ser.iloc[:, np.insert(acceptcols, 0, 0)]
        data = ser.drop(0)
        data = ser.rename(columns={'': 'Time(s)'})
        data = ser.astype(float)
        data = ser.reset_index(drop=True)
        data.columns = [column.replace('1', '')
                        for column in ser.columns]

        return data


my_data1 = pd.DataFrame(np.random.randn(100, 9), columns=list('123456789'))
my_data2 = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

# Attempt 1
new_data1 = Data.clean(my_data1) # Parameter "ser" unfilled 
# Attempt 2
new_data1 = Data.clean(ser=my_data1) # Parameter "self" unfilled 
# Attempt 3
new_data1 = Data.clean(self, my_data1) # Unresolved reference "self"

I have tried various signatures for clean (self plus other parameters), but I think I just don't understand classes or class structure well enough. Documentation on classes and dataclasses always uses very rudimentary examples, and copying and pasting a template hasn't helped. What am I missing?

1 Answer


You can first create an instance x of the class Data and then call clean() on it:

x = Data()

# Attempt 1
new_data1 = x.clean(my_data1)  # self is bound to x; ser receives my_data1
# Attempt 2
new_data1 = x.clean(ser=my_data1)  # same call, with ser passed by keyword
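Calling a method on an instance passes that instance as self automatically, so x.clean(my_data1) is the same call as Data.clean(x, my_data1). Attempt 3 in the question failed only because no variable named self exists at module level. The minimal sketch below demonstrates this with a simplified, hypothetical clean body (it only drops row 0). It also shows @staticmethod, an alternative when a method never uses self, which would let a call shaped like Attempt 1 work.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass
class Data:
    # Simplified stand-in for the real clean(): only drops row 0.
    def clean(self, ser):
        return ser.drop(0).reset_index(drop=True)

    # A static method receives no implicit self, so it can be
    # called on the class itself: Data.clean_static(df).
    @staticmethod
    def clean_static(ser):
        return ser.drop(0).reset_index(drop=True)


my_data1 = pd.DataFrame(np.random.randn(100, 9), columns=list('123456789'))

x = Data()
via_instance = x.clean(my_data1)          # self is bound to x automatically
via_class = Data.clean(x, my_data1)       # the same call, with self passed explicitly
via_static = Data.clean_static(my_data1)  # no instance needed

print(via_instance.equals(via_class), via_instance.equals(via_static))  # True True
```

clean_static is a hypothetical name used only for this illustration.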

If I were you, I would not use a class this way. Instead, I would just define the following function

def clean(ser):
    acceptcols = np.where(ser.loc[0, :] == '2')[0]
    data = ser.iloc[:, np.insert(acceptcols, 0, 0)]
    data = ser.drop(0)
    data = ser.rename(columns={'': 'Time(s)'})
    data = ser.astype(float)
    data = ser.reset_index(drop=True)
    data.columns = [column.replace('1', '')
                    for column in ser.columns]

    return data

and call it directly.
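For comparison, if you do keep a class, the DataFrame should live on the instance. With @dataclass, only annotated names become fields, so ser = pd.DataFrame in the question merely sets a class attribute and Data() accepts no arguments. A minimal sketch, assuming you want each instance to hold one cleaned dataset; clean() here is a simplified placeholder and describe_columns() is a hypothetical per-dataset method.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass
class Data:
    # An annotation (ser: pd.DataFrame) makes ser a dataclass field;
    # "ser = pd.DataFrame" would only set a class attribute.
    ser: pd.DataFrame

    def __post_init__(self):
        # Runs after the generated __init__, so self.ser already exists.
        self.ser = self.clean()

    def clean(self):
        # Hypothetical, simplified cleaning step: drop row 0.
        return self.ser.drop(0).reset_index(drop=True)

    def describe_columns(self):
        # Hypothetical example of a method that works on self.ser.
        return list(self.ser.columns)


d1 = Data(pd.DataFrame(np.random.randn(100, 9), columns=list('123456789')))
d2 = Data(pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD')))
print(len(d1.ser), d2.describe_columns())  # 99 ['A', 'B', 'C', 'D']
```

This mirrors the commented-out __post_init__ in the question: each dataset is cleaned once at construction, and every later method reads self.ser instead of taking the frame as an argument.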

Also, in your clean(), each step operates on ser, the original input, rather than on the result of the previous step. Isn't that a problem?
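Following the fix suggested in the comments (chain each step on data rather than ser, except in the first two statements), a corrected version might look like this. The small raw frame is invented for illustration: it assumes the CSV was read with string values, an empty-named time column, and a first row whose entries mark the columns to keep with '2'.

```python
import numpy as np
import pandas as pd


def clean(ser):
    # Positions of the columns whose first-row entry is '2'.
    acceptcols = np.where(ser.loc[0, :] == '2')[0]
    # Keep column 0 (time) plus the accepted columns.
    data = ser.iloc[:, np.insert(acceptcols, 0, 0)]
    # From here on, each step builds on the previous result.
    data = data.drop(0)
    data = data.rename(columns={'': 'Time(s)'})
    data = data.astype(float)
    data = data.reset_index(drop=True)
    data.columns = [column.replace('1', '') for column in data.columns]
    return data


# Invented sample mimicking the assumed CSV layout (all values are strings).
raw = pd.DataFrame({
    '': ['', '0.0', '0.1'],
    'A1': ['2', '1.5', '1.6'],
    'B1': ['3', '2.5', '2.6'],
    'C1': ['2', '3.5', '3.6'],
})

cleaned = clean(raw)
print(cleaned)
```

Here column B1 is dropped because its first-row entry is '3', and the remaining names lose their '1'. Note that replace('1', '') removes every '1' in a name, so a column like 'A10' would become 'A0'.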


7 Comments

The last modification is (or should be) applied to the input, ser; its goal is to remove the character '1' from every column name. At first it was just a function like the one you included above, but the dataset got very complex and I wanted to try putting the functions in classes to make everything more readable. Why do you think I shouldn't use classes like this?
#1 Even if you just define a function clean, you can call it as many times as you want, so the code stays readable. A class is better only when it represents an object that has many methods and many attributes. In your case, a function is sufficient.
#2 You have six modifications in clean, but only the last two will appear in your final result. If you want all of them applied, change every ser into data EXCEPT in the first two statements (the ones computing acceptcols and the iloc selection).
If I were you, I would still not use a class. A class represents an object, whereas here you only want to group your functions; that is the difference. I would choose one of two approaches: #1 define everything as plain functions and call them wherever needed; or #2 put the functions specific to my_data1 in one .py file, those specific to my_data2 in a second, and the shared functions in a third .py file.
Finally, I would define a wrapper function for my_data1 that calls all 20 of its functions, and a wrapper function for my_data2 that calls all 15 of its functions.
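The plain-functions-plus-wrappers layout described in the last two comments could be sketched as below, compressed into one file for brevity. Every function name and processing step here is hypothetical; in a real project the three groups would go into separate .py files and be imported.

```python
import numpy as np
import pandas as pd


# --- shared helpers (could live in a shared module) ---
def drop_first_row(df):
    """Hypothetical shared step: drop row 0 and renumber the index."""
    return df.drop(0).reset_index(drop=True)


def compare_means(df_a, df_b):
    """Hypothetical comparison: overall mean of each dataset."""
    return df_a.to_numpy().mean(), df_b.to_numpy().mean()


# --- steps specific to my_data1 (could live in their own module) ---
def scale_data1(df):
    """Hypothetical my_data1-only step: double every value."""
    return df * 2


# --- steps specific to my_data2 ---
def center_data2(df):
    """Hypothetical my_data2-only step: subtract each column's mean."""
    return df - df.mean()


# --- one wrapper per dataset that runs its steps in order ---
def process_data1(df):
    return scale_data1(drop_first_row(df))


def process_data2(df):
    return center_data2(drop_first_row(df))


my_data1 = pd.DataFrame(np.random.randn(100, 9), columns=list('123456789'))
my_data2 = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

out1 = process_data1(my_data1)
out2 = process_data2(my_data2)
print(compare_means(out1, out2))
```

Each wrapper documents the processing order for its dataset in one place, while the comparison functions take both frames explicitly, which is the grouping the comments argue for without needing a class.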