I'm starting with a dictionary like this:

dict = {(100000550L, u'ActivityA'): {'bar__sum': 14.0, 'foo__sum': 12.0},
        (100001799L, u'ActivityB'): {'bar__sum': 7.0, 'foo__sum': 3.0}}

When converted to a DataFrame and transposed, this puts the (id, activitytype) tuples in the index as row labels:

df = DataFrame(dict).transpose()

                        bar__sum  foo__sum
(100000550, ActivityA)        14        12
(100001799, ActivityB)         7         3

How can I convert the tuples in the index to a MultiIndex, i.e., so that the end result looks like this instead:

                        bar__sum  foo__sum
id        act_type
100000550 ActivityA        14        12
100001799 ActivityB         7         3

What's the best way to do this? Is there some option on DataFrame creation that I'm missing? Or should it happen via a list comprehension, which feels inefficient to me?

1 Answer

If you want to convert the index of your existing DataFrame:

>>> df.index = pd.MultiIndex.from_tuples(df.index)
>>> df
                     bar__sum  foo__sum
100000550 ActivityA        14        12
100001799 ActivityB         7         3

>>> df.index.names = ['id', 'act_type']
>>> df
                     bar__sum  foo__sum
id        act_type                     
100000550 ActivityA        14        12
100001799 ActivityB         7         3
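As a self-contained sketch of the same conversion on Python 3: the dictionary below is the question's data without the Python 2 `L` and `u` markers, and the names `d` and `flat_index` are chosen here for illustration. It assumes a reasonably recent pandas, where a plain list of tuples would otherwise be turned into a MultiIndex automatically, so `tupleize_cols=False` is used to reproduce the flat tuple index from the question.

```python
import pandas as pd

# The question's data, written for Python 3.
d = {
    (100000550, "ActivityA"): {"bar__sum": 14.0, "foo__sum": 12.0},
    (100001799, "ActivityB"): {"bar__sum": 7.0, "foo__sum": 3.0},
}

# Build a flat index whose labels are tuples, as in the question.
flat_index = pd.Index(list(d.keys()), tupleize_cols=False)
df = pd.DataFrame(list(d.values()), index=flat_index)

# Convert the tuple labels to a two-level MultiIndex and name the levels.
df.index = pd.MultiIndex.from_tuples(df.index, names=["id", "act_type"])
print(df)
```

Passing `names=` to `from_tuples` does in one call what the separate `df.index.names = [...]` assignment does above.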

You can also create the DataFrame directly from the dictionary (here d is your dict; don't name your variable dict, since that shadows Python's built-in dict type):

>>> pd.DataFrame(d.values(), index=pd.MultiIndex.from_tuples(d.keys(), names=['id', 'act_type']))
                     bar__sum  foo__sum
id        act_type                     
100001799 ActivityB         7         3
100000550 ActivityA        14        12

Note that values() and keys() return their items in the same order as long as the dictionary isn't modified between the two calls, so each row lines up with its index label.
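For Python 3, where keys() and values() return views rather than lists, the direct construction might look like the sketch below. Wrapping both views in list() is a conservative choice made here, and the variable names are illustrative only.

```python
import pandas as pd

# The question's data, written for Python 3.
d = {
    (100000550, "ActivityA"): {"bar__sum": 14.0, "foo__sum": 12.0},
    (100001799, "ActivityB"): {"bar__sum": 7.0, "foo__sum": 3.0},
}

# keys() and values() iterate in matching order while d is unmodified,
# so pairing them up reproduces items().
pairs_match = list(zip(d.keys(), d.values())) == list(d.items())

# Build the MultiIndex from the tuple keys and the rows from the inner dicts.
index = pd.MultiIndex.from_tuples(list(d.keys()), names=["id", "act_type"])
df = pd.DataFrame(list(d.values()), index=index)
print(df)
```

On Python 3.7 and later, dictionaries preserve insertion order, so the rows come out in the order the keys were written rather than in the arbitrary order shown in the Python 2 output above.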

2 Comments

Nice trick passing only d.values() as the argument! I was trying to figure out something to get access to the post-sorted index after passing d, but this way you don't need it at all.
Using Python 3.6 and pandas 0.23.1, d.values() isn't an acceptable data type for creating the DataFrame. Casting d.values() to a list fixes the issue: pd.DataFrame(list(d.values()), index=pd.MultiIndex.from_tuples(d.keys(), names=['id', 'act_type'])) should do the trick.