How do I use instance attributes with pandas dataframe?

Question

I use pandas DataFrames to access bulk data from an Oracle database, which works fine. One method that I implemented in a class derived from DataFrame replaces foreign keys with the data the foreign key points to. This works very nicely.

Because the number of rows in the DataFrame is quite high (e.g. 100,000,000), I thought it would be nice to test whether the foreign keys have already been replaced before doing so again and again. So I added an attribute to the derived class that is set to True after the data has been replaced; the replace method can check this status and return early if it is already True.

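class DerivedDataFrame(pandas.DataFrame):
    def __init__(self, *args, **kwargs):
        # let DataFrame initialise itself before adding the flag
        super(DerivedDataFrame, self).__init__(*args, **kwargs)
        self.resolve_status = False

    def resolve_data(self, ....):
        if not self.resolve_status:
            # resolve data
            self.resolve_status = True
        ...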

The problem arises when I use e.g. merge(), which does not offer an in-place mode; instead it creates a new instance from the merged data and returns it. The data itself is fine, but on the new instance self.resolve_status is False again, because the new instance gets the value set in __init__() and doesn't keep the attribute value it had before merging.

Is there any way out? What is a working solution to keep this (and other) attributes across methods that do not allow in-place operation?

Have you considered writing your own method inside your class which does what you need? That should be straightforward, no?
– John Zwinck, Commented Jun 3, 2016 at 13:05
@John Zwinck: Not sure what you mean. Of course I have methods where I can (and most likely have to) copy the attributes over to the new instance, but somehow I hoped that pandas would allow that in a better way. It seems it does not. The open question would be: why doesn't merge support working in place?
– mstuebner, Commented Jun 6, 2016 at 8:56

piRSquared · Accepted Answer · 2016-06-03 14:50:10Z

1

Track the new object in a variable. Update your object's __dict__ attribute with the new object's __dict__ attribute.

The __dict__ attribute is a dictionary of the instance's attributes (methods live on the class, not in the instance's __dict__). This is how I've done something similar in the past.

    def resolve_data(self):
        if not self.resolve_status:
            new = self.copy()  # just an example, replace with your own
            # take over the new object's state in place
            self.__dict__.update(new.__dict__)
            # set the flag last so the update cannot overwrite it
            self.resolve_status = True
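
To see what `__dict__.update` actually carries over, here is a pandas-free sketch. `Box` and `tripled` are hypothetical names invented for this illustration; `tripled` stands in for any method that, like merge(), returns a new instance instead of working in place. The flag is set after the update here, because the update also copies the new instance's own (False) flag.

```python
class Box:
    """Hypothetical stand-in for a DataFrame subclass; not part of pandas."""

    def __init__(self, values):
        self.values = values
        self.resolve_status = False

    def tripled(self):
        # Like merge(): returns a brand-new instance, never works in place.
        return Box([v * 3 for v in self.values])

    def resolve_data(self):
        if not self.resolve_status:
            new = self.tripled()
            # Adopt the new instance's state in place.
            self.__dict__.update(new.__dict__)
            # The update copied new.resolve_status (False), so set it last.
            self.resolve_status = True


box = Box([1, 2, 4])
print(sorted(box.__dict__))  # instance attributes only, no methods
box.resolve_data()
print(box.values, box.resolve_status)
```

Callers keep their reference to `box`, which is the point of updating `self` rather than returning `new`.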

Demonstration

import pandas as pd


class SubDF(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        super(SubDF, self).__init__(*args, **kwargs)
        self.resolve_status = False

    def resolve_data(self):
        if not self.resolve_status:
            new = self.copy() * 3
            # take over the new object's state in place
            self.__dict__.update(new.__dict__)
            # set the flag last so the update cannot overwrite it
            self.resolve_status = True


sdf = SubDF([1, 2, 4])

print(sdf.resolve_status)
print()
print(sdf)
print()
sdf.resolve_data()
print(sdf.resolve_status)
print()
print(sdf)

False

   0
0  1
1  2
2  4

True

    0
0   3
1   6
2  12


1 Comment

mstuebner Over a year ago

That means just copying the attributes over to the new data frame instance, if I understood correctly? That's what I'm doing, but I hoped for a better solution. But then "self.__dict__.update(new.__dict__)" should be "new.__dict__.update(self.__dict__)", right?
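
The direction raised in this comment, carrying the old instance's attribute over to the new one, could look roughly like the sketch below. It relies on the `_metadata` and `_constructor` hooks that pandas documents for subclassing; `TrackedFrame` and the sample columns are hypothetical. Whether merge() itself propagates `_metadata` depends on the pandas version, so this sketch does not rely on it and copies the flag explicitly in an overridden merge().

```python
import pandas as pd


class TrackedFrame(pd.DataFrame):
    """Hypothetical subclass name, used only for this sketch."""

    # Attribute names pandas should carry over to results it finalizes.
    _metadata = ["resolve_status"]
    # Class-level default, so every new instance starts unresolved.
    resolve_status = False

    @property
    def _constructor(self):
        # Ask pandas to build TrackedFrame results, not plain DataFrames.
        return TrackedFrame

    def merge(self, *args, **kwargs):
        # merge() cannot work in place, so copy the flag onto its result.
        result = super().merge(*args, **kwargs)
        result.resolve_status = self.resolve_status
        return result


left = TrackedFrame({"key": [1, 2], "a": ["x", "y"]})
left.resolve_status = True
right = pd.DataFrame({"key": [1, 2], "b": [10, 20]})

copied = left.copy()                  # flag carried over via _metadata
merged = left.merge(right, on="key")  # flag carried over by the override
print(copied.resolve_status, merged.resolve_status)
```

The explicit override is the part that matches what the comment describes; `_metadata` only reduces how many methods need such an override.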
