I've written a Python application in a broadly functional style, using frozen dataclasses as the inputs and outputs of functions. These dataclasses typically hold a DataFrame and perhaps one other attribute, for example:

from dataclasses import dataclass
import pandas as pd

@dataclass(frozen=True)
class TimeSeries:
    log: pd.DataFrame
    sourceName: str

I now have more possible data objects, which follow an 'is-a' inheritance structure. So perhaps a TimeSeries has a DataFrame with only the columns Time and A, an ExtendedTimeSeries has one with these columns plus a B column, and so on. I now have 4 different TimeSeries which, in an OO paradigm, would fall into a hierarchy.

What is the best structure for this?

I could use (OO-style) composition rather than inheritance, and have the ExtendedTimeSeries data structure contain a TimeSeries object and a standalone Temperature series, but that seems neither efficient (the two have to be merged before any DataFrame operation) nor safe (the rows could be mismatched).
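To make the merge and mismatch concern concrete, here is a minimal sketch of what recombining the two parts could look like. It assumes both frames are keyed by a Time column; `combine` and the sample data are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical sample data: a base log and a separately stored extra series.
base = pd.DataFrame({"Time": [0, 1, 2], "A": [1.0, 2.0, 3.0]})
extra = pd.DataFrame({"Time": [0, 1, 2], "B": [10.0, 20.0, 30.0]})


def combine(base: pd.DataFrame, extra: pd.DataFrame) -> pd.DataFrame:
    """Merge the two parts on Time and fail loudly if any row has no partner."""
    merged = base.merge(
        extra,
        on="Time",
        how="outer",
        validate="one_to_one",  # raises if Time is duplicated on either side
        indicator=True,         # adds a "_merge" column describing each row's origin
    )
    unmatched = merged[merged["_merge"] != "both"]
    if not unmatched.empty:
        raise ValueError(f"{len(unmatched)} row(s) have no matching Time value")
    return merged.drop(columns="_merge")


combined = combine(base, extra)
```

The check has to run every time the parts are recombined, which is the cost being weighed against keeping a single DataFrame.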

Without the DataFrames this compositional approach would seem to work ok. Any good design tips?

I could have a series of dataclasses inheriting from each other, but they would have exactly the same fields (in the example above, log and sourceName), and I'm not sure whether that is possible or sensible.
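On the "is it possible" part: a frozen dataclass can subclass another frozen dataclass without adding any fields. The sketch below is one way that might look, assuming the Time, A and B column names from the example above; `REQUIRED_COLUMNS` and `mean_a` are hypothetical names, not part of the original code.

```python
from dataclasses import dataclass
from typing import ClassVar

import pandas as pd


@dataclass(frozen=True)
class TimeSeries:
    # Hypothetical class-level constant; ClassVar keeps it out of the dataclass fields.
    REQUIRED_COLUMNS: ClassVar[frozenset] = frozenset({"Time", "A"})

    log: pd.DataFrame
    sourceName: str

    def __post_init__(self):
        # Validate the frame once, at construction time.
        missing = self.REQUIRED_COLUMNS - set(self.log.columns)
        if missing:
            raise ValueError(
                f"{type(self).__name__} is missing columns: {sorted(missing)}"
            )


@dataclass(frozen=True)
class ExtendedTimeSeries(TimeSeries):
    # Same fields as the parent; only the column requirement grows.
    REQUIRED_COLUMNS: ClassVar[frozenset] = TimeSeries.REQUIRED_COLUMNS | {"B"}


def mean_a(ts: TimeSeries) -> float:
    """Works for TimeSeries and any subclass, since all of them have column A."""
    return float(ts.log["A"].mean())
```

Functions annotated with the base type then accept every more specific series, while the classes themselves stay plain data holders.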

  • Your title is about FP, your description is about OOP - which one are you interested in? Commented Nov 3, 2022 at 6:56
  • (Edited for clarity) Good point. I'm trying to write my code in a functional way, but trying to choose the best structure in this case where the data objects are subsets of each other - (which in OO would be done with inheritance) Commented Nov 4, 2022 at 21:20

1 Answer

In this scenario I would discriminate between the cases with a src_type attribute that identifies the type of data. The src_type can be determined automatically in a __post_init__ method (which has to bypass the frozen status to set it) and then used in the functional evaluation.

from dataclasses import dataclass, field
from enum import Enum

import pandas as pd


# predefined source types for easier discrimination
class SrcType(Enum):
    STANDARD = 0
    EXTENDED = 1


@dataclass(frozen=True)
class TimeSeries:
    log: pd.DataFrame
    src_name: str
    # derived in __post_init__, so it is not a constructor argument
    src_type: SrcType = field(init=False)

    def __post_init__(self):
        # criteria for the various source types
        if 'B' in self.log.columns:
            src_type = SrcType.EXTENDED
        else:
            src_type = SrcType.STANDARD
        # a frozen dataclass blocks normal assignment, so set the field via object.__setattr__
        object.__setattr__(self, 'src_type', src_type)


series = TimeSeries(pd.DataFrame(), "my_src")
print(series.src_type)  # prints SrcType.STANDARD
series = TimeSeries(pd.DataFrame({'B': [0]}), "my_src")
print(series.src_type)  # prints SrcType.EXTENDED
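As a rough illustration of "used in the functional evaluation", a pure function could branch on src_type as sketched below. The class is restated so the block runs on its own; `summarise` and the A and B column names are assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum

import pandas as pd


class SrcType(Enum):
    STANDARD = 0
    EXTENDED = 1


@dataclass(frozen=True)
class TimeSeries:
    log: pd.DataFrame
    src_name: str
    src_type: SrcType = field(init=False)  # derived, not passed in

    def __post_init__(self):
        kind = SrcType.EXTENDED if "B" in self.log.columns else SrcType.STANDARD
        object.__setattr__(self, "src_type", kind)


def summarise(series: TimeSeries) -> dict:
    """Hypothetical pure function: the result depends only on its input."""
    summary = {
        "source": series.src_name,
        "mean_A": float(series.log["A"].mean()),
    }
    # Only extended sources carry column B.
    if series.src_type is SrcType.EXTENDED:
        summary["mean_B"] = float(series.log["B"].mean())
    return summary
```

The same function handles both kinds of series, and the extra work for the extended case is confined to one branch.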

1 Comment

Thanks Christian, yes, that seems like a very sensible approach. I'm thinking of using the order of the enums so I can have functions that work on subsets of them, like methods in inheritance. Thanks again, really helpful. This community is amazing.
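The ordering idea from this comment could be sketched with IntEnum, whose members compare like integers. This assumes the types form a single chain in which each level contains all columns of the one before it; `supports` is a hypothetical helper name.

```python
from enum import IntEnum


class SrcType(IntEnum):
    # Ordered from least to most columns; a higher value includes everything below it.
    STANDARD = 0
    EXTENDED = 1


def supports(actual: SrcType, required: SrcType) -> bool:
    """True if a series of kind `actual` has at least the columns of `required`."""
    return actual >= required


ok = supports(SrcType.EXTENDED, SrcType.STANDARD)
not_ok = supports(SrcType.STANDARD, SrcType.EXTENDED)
```

A linear order cannot express a hierarchy that branches, so this only fits if the four series types really do nest one inside the next.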
