
In Pandas, I've been using custom objects as column labels because they provide rich, flexible functionality (information and methods specific to each column). For example, you can set a custom fmt_fn to format each column (note this is just an example; my actual column label objects are more complex):

In [99]: from datetime import timedelta; import numpy as np; import pandas as pd

In [100]: class Col:
     ...:     def __init__(self, name, fmt_fn):
     ...:         self.name = name
     ...:         self.fmt_fn = fmt_fn
     ...:     def __str__(self):
     ...:         return self.name
     ...:     

In [101]: sec_col = Col('time', lambda val: str(timedelta(seconds=val)).split('.')[0])

In [102]: dollar_col = Col('money', lambda val: '${:.2f}'.format(val))

In [103]: foo = pd.DataFrame(np.random.random((3, 2)) * 1000, columns = [sec_col, dollar_col])

In [104]: print(foo)  # ugly
         time       money
0  773.181402  720.997051
1   33.779925  317.957813
2  590.750129  416.293245

In [105]: print(foo.to_string(formatters = [col.fmt_fn for col in foo.columns]))  # pretty
     time   money
0 0:12:53 $721.00
1 0:00:33 $317.96
2 0:09:50 $416.29

Okay, so I've been happily doing this for a while, but I recently came across one part of Pandas that doesn't support it: the to_hdf/read_hdf methods fail on DataFrames with custom column labels. This is not a dealbreaker for me. I can use pickle instead of HDF5 at some loss of efficiency.
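A minimal sketch of the pickle fallback, assuming Col is defined at module level and its fmt_fn values are named module-level functions. The standard pickle module cannot serialize lambdas, so the lambda-based labels in the example above would make the round-trip fail. Because Col keeps Python's default identity-based __eq__/__hash__, the unpickled labels are also new objects, so the original sec_col no longer matches a column in the loaded frame. The helper names fmt_seconds and fmt_dollars are hypothetical.

```python
import pickle
from datetime import timedelta

import numpy as np
import pandas as pd


class Col:
    """Column label carrying its own formatter (mirrors the Col class above)."""

    def __init__(self, name, fmt_fn):
        self.name = name
        self.fmt_fn = fmt_fn

    def __str__(self):
        return self.name


# Named module-level functions are pickled by reference; lambdas are not picklable.
def fmt_seconds(val):
    return str(timedelta(seconds=val)).split('.')[0]


def fmt_dollars(val):
    return '${:.2f}'.format(val)


sec_col = Col('time', fmt_seconds)
dollar_col = Col('money', fmt_dollars)
df = pd.DataFrame(np.random.random((3, 2)) * 1000, columns=[sec_col, dollar_col])

# Round-trip in memory; df.to_pickle / pd.read_pickle do the same via a file.
restored = pickle.loads(pickle.dumps(df))

# The labels come back as new Col objects with the same attributes ...
print([str(c) for c in restored.columns])
# ... so identity-based lookups with the old objects no longer match.
print(any(c is sec_col for c in restored.columns))

# Use the restored label objects themselves for formatting and lookups.
print(restored.to_string(formatters=[c.fmt_fn for c in restored.columns]))
```

In practice this means code that keeps module-level references such as sec_col should re-fetch the labels from restored.columns after loading.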

But the bigger question is: does Pandas in general support custom objects as column labels? In other words, should I continue to use Pandas this way, or will it break in other parts of Pandas (besides HDF5) in the future and cause me pain?

PS. As a side note, I'd also welcome hearing how you handle column-specific information such as fmt_fn in the example above, if you don't use custom objects as column labels.

  • Interesting question, as I've never seen objects passed as columns in a DataFrame. I would recommend against this usage. If you need the flexibility, you can keep a dictionary of column names and underlying objects. Commented Sep 1, 2015 at 21:18
  • It would be bad design (IMO) to maintain a separate data structure per DataFrame that's parallel to foo.columns rather than simply put the column-specific data into foo.columns. I would only do so if necessary, i.e. if Pandas really does not support custom objects as column labels. Hence I posted this question. Commented Sep 1, 2015 at 22:44
  • The columns of a dataframe are just an Index. It appears that the only requirement is that the objects are hashable. pandas.pydata.org/pandas-docs/stable/generated/… Commented Sep 1, 2015 at 22:50
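The dictionary alternative suggested in the first comment can be sketched as follows, assuming plain string column labels and a hypothetical ColumnInfo metadata class kept in a dict next to the DataFrame. String lookups like df['time'] keep working, and to_string accepts a dict of formatters keyed by column name. The trade-off raised in the second comment is that this dict has to be kept in sync by hand when columns are renamed or dropped.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable

import pandas as pd


@dataclass
class ColumnInfo:
    """Hypothetical per-column metadata kept outside the DataFrame."""
    fmt_fn: Callable[[float], str]
    description: str = ''


df = pd.DataFrame({'time': [773.181402, 33.779925], 'money': [720.997051, 317.957813]})

# Parallel structure: plain string labels in df, rich metadata in a dict.
col_info = {
    'time': ColumnInfo(lambda v: str(timedelta(seconds=v)).split('.')[0], 'elapsed seconds'),
    'money': ColumnInfo('${:.2f}'.format, 'amount in dollars'),
}

print(df['time'])  # ordinary string lookup still works

# to_string accepts a dict of one-argument formatters keyed by column name.
out = df.to_string(formatters={name: info.fmt_fn for name, info in col_info.items()})
print(out)
```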
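The hashability point in the last comment can be illustrated with a stripped-down version of the question's Col class (fmt_fn omitted for brevity). It defines neither __eq__ nor __hash__, so it inherits Python's identity-based defaults. Column lookup goes through the Index's hash table, so the label object itself works as a key, while a new object that merely prints the same does not.

```python
import pandas as pd


class Col:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


sec_col = Col('time')
df = pd.DataFrame([[1.0], [2.0]], columns=[sec_col])

# The inherited object.__hash__ is identity based, so the label is hashable.
print(hash(sec_col) == hash(sec_col))

# Lookup with the very same object succeeds.
print(df[sec_col].tolist())

# A different Col with the same name is a different key under identity equality.
print(Col('time') in df.columns)
```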



Fine-grained control over DataFrame formatting isn't really possible right now. E.g., see here or here for some discussion of possibilities. I'm sure a well-thought-out API (and PR!) would be well received.

In terms of using custom objects as columns, the two biggest issues are probably serialization and indexing semantics (e.g., you can no longer do df['time']).
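The indexing difference can be sketched as follows, again assuming the question's identity-hashed Col labels (fmt_fn omitted). A string lookup raises KeyError because 'time' is only how the label prints, not the label itself. The helper col_by_name is hypothetical and shows one way to recover the label object by name when needed.

```python
import pandas as pd


class Col:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


sec_col, dollar_col = Col('time'), Col('money')
df = pd.DataFrame([[773.18, 721.00], [33.78, 317.96]], columns=[sec_col, dollar_col])

try:
    df['time']
except KeyError:
    print("df['time'] raises KeyError")  # 'time' is only the label's printed form


def col_by_name(frame, name):
    """Hypothetical helper: return the unique column label whose .name matches."""
    matches = [c for c in frame.columns if getattr(c, 'name', None) == name]
    if len(matches) != 1:
        raise KeyError(name)
    return matches[0]


print(df[col_by_name(df, 'time')].tolist())
```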

One possible work-around would be to wrap your DataFrame in some kind of pretty-print structure, like this:

In [174]: class PrettyDF(object):
     ...:     def __init__(self, data, formatters):
     ...:         self.data = data
     ...:         self.formatters = formatters
     ...:     def __str__(self):
     ...:         return self.data.to_string(formatters=self.formatters)
     ...:     def __repr__(self):
     ...:         return self.__str__()


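In [171]: df = pd.DataFrame(np.random.random((3, 2)) * 1000, columns=['time', 'money'])  # assumed setup: plain string labels

In [172]: foo = PrettyDF(df,
     ...:                formatters={'money': '${:.2f}'.format,
     ...:                            'time': lambda val: str(timedelta(seconds=val)).split('.')[0]})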


In [178]: foo
Out[178]: 
     time   money
0 0:13:17 $399.29
1 0:08:48 $122.44
2 0:07:42 $491.72

In [180]: foo.data['time']
Out[180]: 
0    797.699511
1    528.155876
2    462.999224
Name: time, dtype: float64

3 Comments

As I noted in my question post, the fmt_fn is just an example of column-specific data. My actual column label objects are much more complex, providing much richer functionality than output formatting.
As far as the "two biggest issues" you listed: (a) serialization is the one issue I did run into, and it prompted me to write this question -- hopefully I can handle it with pickle. (b) "can no longer do df['time']" would not be an issue in my book, because 'time' is not the column label object (merely its printed representation) -- the correct code is df[sec_col], and that works as expected. Given your comments and Alexander's comment above, my current conclusion is that it's safe to continue using custom objects as column labels. Thanks!
I tried the same thing with __str__, but it does not work for MultiIndexed DataFrames. Do you have a solution for that as well? stackoverflow.com/questions/49563981/…

It's been five years since this was posted, so I hope this is still helpful to someone. I've managed to build an object that holds metadata for a pandas DataFrame column but can still be accessed as a regular column (or so it seems to me). The code below is just the part of the whole class that involves this.

__repr__ presents the object's name when the DataFrame is printed, instead of the default object representation.

__eq__ compares the requested name with the object's name, and __hash__ is also used in this process: column names need to be hashable because column lookup works similarly to a dictionary.

That's probably not a Pythonic way of describing it, but it seems to me that's how it works.

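    class ColumnDescriptor:
        def __init__(self, name, **kwargs):
            self.name = name
            # Store any extra keyword arguments as attributes (the column's metadata).
            for key, value in kwargs.items():
                setattr(self, key, value)
    
        def __repr__(self): return self.name  # shown when the DataFrame is printed
        def __str__(self): return self.name
        def __eq__(self, other): return self.name == other  # lets df['name'] match this label
        def __hash__(self): return hash(self.name)  # must agree with __eq__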
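A short usage sketch of ColumnDescriptor, repeated here with a plain for loop in __init__ so the block runs on its own. Because __eq__ compares against the plain name and __hash__ hashes the name, the descriptor and its string land in the same hash bucket and compare equal, so df['time'] finds the column while the label object still carries extra attributes. The unit keyword is a hypothetical piece of metadata chosen for illustration.

```python
import pandas as pd


class ColumnDescriptor:
    def __init__(self, name, **kwargs):
        self.name = name
        for key, value in kwargs.items():  # extra keywords become metadata attributes
            setattr(self, key, value)

    def __repr__(self):
        return self.name

    def __str__(self):
        return self.name

    def __eq__(self, other):
        return self.name == other

    def __hash__(self):
        return hash(self.name)


time_col = ColumnDescriptor('time', unit='seconds')
df = pd.DataFrame([[1.5], [2.5]], columns=[time_col])

print(df['time'].tolist())  # lookup by plain string works
print(df.columns[0].unit)   # metadata lives on the label object
```

The design choice here is the opposite of the question's Col: equality by name instead of identity, which restores string indexing at the cost of treating any two descriptors with the same name as the same column.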
