
I use pandas DataFrames to access large amounts of data from an Oracle database, which works fine. One method that I implemented in a class derived from DataFrame replaces foreign keys with the data they point to. This works very well.

Because the number of rows in the dataframe is quite high (e.g. 100,000,000), I thought it would be nice to test whether the foreign keys have already been replaced before doing so again and again. So I added an attribute to the derived class that is set to True after the data has been replaced, so that the replace method can check this status and return early if it is already True.

class DerivedDataFrame(pandas.DataFrame):
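    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # let pandas initialise the frame first
        self.resolve_status = False

    def resolve_data(self, ....):
        if not self.resolve_status:
            # resolve data here (replace foreign keys with the referenced data)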
            self.resolve_status = True
        ...

The problem arises when I use, for example, merge(), which cannot work in place; instead it creates a new instance from the merged data and returns it. The data itself is fine, but in the new instance self.resolve_status is False again, because the new instance gets the value set in __init__() and does not keep the value the attribute had before merging.

Is there any way out? What is a working solution for keeping this (and other) attributes across methods that do not support in-place operation?

  • Have you considered writing your own method inside your class which does what you need? That should be straightforward, no? Commented Jun 3, 2016 at 13:05
  • @John Zwinck: Not sure what you mean. Of course I have methods where I can (and most likely have to) copy the attributes over to the new instance, but somehow I hoped that pandas would allow that in a better way. It seems it does not. The open question would be: why doesn't merge support working in place? Commented Jun 6, 2016 at 8:56

1 Answer


Track the new object in a variable, then update your object's __dict__ attribute with the new object's __dict__ attribute.

The __dict__ attribute is a dictionary of the instance's attributes (methods live on the class, not in the instance's __dict__). This is how I've done something similar in the past.

    def resolve_data(self):
        if not self.resolve_status:
            self.resolve_status = True
            new = self.copy()  # just an example, replace with your own
            self.__dict__.update(new.__dict__)

Demonstration

import pandas as pd


class SubDF(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        super(SubDF, self).__init__(*args, **kwargs)
        self.resolve_status = False

    def resolve_data(self):
        if not self.resolve_status:
            self.resolve_status = True
            new = self.copy() * 3
            self.__dict__.update(new.__dict__)


sdf = SubDF([1, 2, 4])

print(sdf.resolve_status)
print()
print(sdf)
print()
sdf.resolve_data()
print(sdf.resolve_status)
print()
print(sdf)

False

   0
0  1
1  2
2  4

True

    0
0   3
1   6
2  12
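
For the merge() case from the question, one possible approach is to override merge() in the subclass and copy the flag onto the result explicitly. This is only a sketch under assumptions: `TrackedFrame` is a hypothetical class name, and the column names are invented for the example. pandas also documents `_metadata` and `_constructor` hooks for subclasses, but whether custom attributes propagate through merge() may depend on the pandas version, so the explicit copy is used here.

```python
import pandas as pd


class TrackedFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass that keeps a flag across merge()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.resolve_status = False

    def merge(self, right, *args, **kwargs):
        # Let pandas do the actual merge, then wrap the result in our class.
        merged = TrackedFrame(super().merge(right, *args, **kwargs))
        # Copy the custom attribute from the original onto the new instance.
        merged.resolve_status = self.resolve_status
        return merged


left = TrackedFrame({"key": [1, 2], "a": ["x", "y"]})
left.resolve_status = True  # pretend the foreign keys were already resolved
right = pd.DataFrame({"key": [1, 2], "b": [10, 20]})

merged = left.merge(right, on="key")
print(type(merged).__name__, merged.resolve_status)
```

The same wrapper pattern would have to be repeated for every other method that returns a new frame, which is the price of copying attributes by hand.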

1 Comment

So that means just copying the attributes over to the new data frame instance, if I understood correctly? That's what I'm doing, but I hoped for a better solution. But then "self.__dict__.update(new.__dict__)" should be "new.__dict__.update(self.__dict__)", right?
