When a DataFrame has MultiIndex columns, it does not seem possible to select a single column and keep the MultiIndex. Instead, the result has a plain single-level Index:

import pandas as pd
import numpy as np

dates = np.asarray(pd.date_range('1/1/2000', periods=8))
_metaInfo = pd.MultiIndex.from_tuples([('AA', '[m]'), ('BB', '[m]'), ('CC', '[s]'), ('DD', '[s]')], names=['parameter','unit'])

df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=_metaInfo)
print(df.get('AA').columns)
# Index([[m]], dtype=object)

The 'parameter' level is missing from the result. Is this a bug, and is there a workaround?

  • Do you mean to say it doesn't have a name attribute (of 'AA')? Commented Nov 26, 2012 at 23:14
  • No, you lose a level of the MultiIndex (in this case the name). Commented Nov 27, 2012 at 10:17

1 Answer

I have struggled with this as well. The opposite problem, adding an extra level to a single-level Index so that it matches a MultiIndex, also keeps me busy.
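For that opposite direction, one possible approach is to rebuild the columns with pd.MultiIndex.from_arrays. This is only a sketch: the frame `flat` and the `units` list are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical single-level frame; the values are illustrative only.
flat = pd.DataFrame({'AA': [1.0, 2.0], 'BB': [3.0, 4.0]})

# Hypothetical unit labels, one per existing column.
units = ['[m]', '[m]']

# Pair each existing column label with its unit to form a two-level index.
flat.columns = pd.MultiIndex.from_arrays(
    [flat.columns, units], names=['parameter', 'unit'])

print(flat.columns.tolist())
```

After this, the column layout matches the 'parameter'/'unit' layout used in the question, so the two frames can be combined without mixing index types.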

I sometimes use this to keep the index intact:

print(df.T[[('AA', '[m]') == col for col in df.columns]].T)

parameter         AA
unit             [m]
2000-01-01  0.972434
2000-01-02 -0.581852
2000-01-03 -0.784172
2000-01-04 -0.843441
2000-01-05 -1.030200
2000-01-06 -0.864225
2000-01-07 -0.530056
2000-01-08 -0.651367

But that's not the most flexible solution when your index is more complex. In this example it works.
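Two other ways to keep both levels may be worth trying, assuming a pandas version where xs accepts drop_level. This is a sketch against the frame from the question, not a definitive answer; the variable names `by_tuple` and `by_xs` are mine.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2000', periods=8)
_metaInfo = pd.MultiIndex.from_tuples(
    [('AA', '[m]'), ('BB', '[m]'), ('CC', '[s]'), ('DD', '[s]')],
    names=['parameter', 'unit'])
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=_metaInfo)

# Selecting with a list of full column tuples returns a DataFrame
# whose columns still carry both levels.
by_tuple = df[[('AA', '[m]')]]

# xs normally drops the level it selects on; drop_level=False keeps it.
by_xs = df.xs('AA', axis=1, level='parameter', drop_level=False)

print(by_tuple.columns.nlevels, by_xs.columns.nlevels)
```

The xs form has the advantage that you only need to know the 'parameter' value, not the full tuple, so it should also cope with an index where 'AA' appears under several units.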

2 Comments

There seems to be an inconsistency between MultiIndex rows and MultiIndex columns. Using

dates = np.asarray(pd.date_range('1/1/2000', periods=4))
_metaInfo = pd.MultiIndex.from_tuples([('AA', '[m]'), ('BB', '[m]'), ('CC', '[s]'), ('DD', '[s]')], names=['parameter','unit'])
df = pd.DataFrame(np.random.randn(4, 4), index=_metaInfo, columns=dates)

Using the transpose allows you to select rows as df["AA":"AA"], which returns a DataFrame with a MultiIndex (no information is lost). However, df.xs("AA", axis=1) returns a DataFrame with a single-level Index (thus losing information). In addition, when I define a DataFrame with a single-level Index and columns AA and BB, df[df["AA"] > 0] gives me all rows of columns AA and BB where the element in AA is greater than 0.0. If I do the same with a MultiIndex-column DataFrame, I get a crash.
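On the boolean filter: with MultiIndex columns, df["AA"] is a DataFrame rather than a Series, so the comparison produces a 2D mask instead of a row mask. Selecting the column by its full tuple gives a Series, and the row filter then behaves like the single-level case. A sketch with fixed, made-up values so the result is reproducible:

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [('AA', '[m]'), ('BB', '[m]'), ('CC', '[s]'), ('DD', '[s]')],
    names=['parameter', 'unit'])

# Illustrative values only: column AA holds -4, 0, 4, 8.
values = np.arange(-4.0, 12.0).reshape(4, 4)
df = pd.DataFrame(values,
                  index=pd.date_range('1/1/2000', periods=4),
                  columns=cols)

# The full tuple selects exactly one column, so this is a boolean Series.
mask = df[('AA', '[m]')] > 0

# Row filter: keeps every column, for the rows where AA is positive.
filtered = df[mask]
print(filtered)
```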
