I use pandas DataFrames to load large amounts of data from an Oracle database, and that works fine. One method I implemented in a class derived from DataFrame replaces foreign keys with the data they point to. This works very well.
Because the DataFrame has a very large number of rows (e.g. 100,000,000), I wanted to avoid replacing the foreign keys again and again once they had already been replaced. So I added an attribute to the derived class that is set to True after the data has been replaced; the replace method checks this flag and returns early if it is already True.
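```python
import pandas


class DerivedDataFrame(pandas.DataFrame):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # let pandas build the frame first
        self.resolve_status = False

    def resolve_data(self, *args, **kwargs):  # actual parameters elided
        if not self.resolve_status:
            # ... replace the foreign keys with the referenced data ...
            self.resolve_status = True

    ...
```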
The problem appears when I use, e.g., `merge()`, which cannot operate in place: it creates a new instance from the merged data and returns it. The data itself is fine, but `resolve_status` is False again, because the new instance gets the value set in `__init__()` and does not keep the value the original instance had before merging.
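A minimal reproduction, assuming a simplified stand-in for the class above (the two small frames and the body of `resolve_data` are made up for illustration). Note that unless the subclass overrides `_constructor`, `merge()` returns a plain `pandas.DataFrame`, because `DataFrame._constructor` returns `DataFrame`; the flag is then missing altogether rather than reset. The situation described above (a new subclass instance with the flag back at False) is what you typically get once `_constructor` is overridden to return the subclass.

```python
import pandas as pd


class DerivedDataFrame(pd.DataFrame):
    """Simplified stand-in for the subclass above (no pandas subclassing hooks)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.resolve_status = False

    def resolve_data(self):
        if not self.resolve_status:
            # Hypothetical stand-in for the expensive foreign-key replacement.
            self.resolve_status = True


orders = DerivedDataFrame({"order_id": [1, 2], "customer_id": [10, 20]})
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["A", "B"]})

orders.resolve_data()
merged = orders.merge(customers, on="customer_id")

print(orders.resolve_status)                    # True
print(type(merged).__name__)                    # DataFrame
print(getattr(merged, "resolve_status", None))  # None
```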
Is there a way around this? What is a working solution for keeping this attribute (and others) across methods that do not operate in place?
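One approach that may help, sketched under assumptions rather than as a definitive fix: pandas documents two subclassing hooks, `_metadata` (attribute names that pandas copies onto results through `__finalize__`, e.g. for `copy()`, slicing and many other methods) and `_constructor` (so that results are built as the subclass). Because `merge()` does not appear to forward `_metadata` on its own, the sketch also overrides the `merge` method to copy the attributes from the left-hand frame. The frames and the body of `resolve_data` are hypothetical placeholders.

```python
import pandas as pd


class DerivedDataFrame(pd.DataFrame):
    # Names listed here are copied to results by pandas' __finalize__
    # mechanism (copy(), slicing, and many other methods).
    _metadata = ["resolve_status"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.resolve_status = False

    @property
    def _constructor(self):
        # Make pandas build results as DerivedDataFrame instead of DataFrame.
        return DerivedDataFrame

    def merge(self, *args, **kwargs):
        # Assumption: merge() does not copy _metadata itself, so copy it
        # from the left-hand frame (self) onto the result.
        result = super().merge(*args, **kwargs)
        for name in self._metadata:
            setattr(result, name, getattr(self, name, None))
        return result

    def resolve_data(self):
        if not self.resolve_status:
            # Hypothetical stand-in for the expensive foreign-key replacement.
            self.resolve_status = True


orders = DerivedDataFrame({"order_id": [1, 2], "customer_id": [10, 20]})
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["A", "B"]})

orders.resolve_data()
merged = orders.merge(customers, on="customer_id")

print(type(merged).__name__, merged.resolve_status)  # DerivedDataFrame True
print(orders.copy().resolve_status)                  # True
```

Calling the module-level `pandas.merge(left, right)` bypasses the method override. Also, whether a merged frame should still count as resolved (for instance when the right-hand frame brings its own unresolved foreign keys) is a decision the class has to make; copying the flag from the left frame is just one choice.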