How to create a DataFrame with a different number of levels in MultiIndex columns with pandas?

Question

I create a dataframe this way, with 3 levels of columns:

import pandas as pd

segment = 's1'

tuples = [(segment, 'mark', 'base'),
          (segment, 'mark', 'quot'),
          (segment, 'mark', 'symb'),
          (segment, 'mark', 'wall'),
          (segment, 'mark', 'type'),
          (segment, 'mark', 'deri'),
          (segment, 'mark', 'marg'),

          (segment, 'type', 'inst'),
          (segment, 'type', 'tran'),
          (segment, 'type', 'prio'),

          (segment, 'trade', 'quantity')
          ]

columns = pd.MultiIndex.from_tuples(tuples, names=["level_1", "level_2", 'level_3'])
df = pd.DataFrame(columns=columns)

I can add a column using only one level, but pandas raises an error when I add a new column using two levels. What is the reason for this, and how can I do it?

# Put value in cells
fill_df()

# Increment indexes
df.index = (i for i in range(len(df)))

for index, row in df.iterrows():
    df.loc[index, 'route'] = something  # OK
    df.loc[index, ('route', 'best')] = something  # KeyError: 'route'

Comment (ThePyGuy, Jun 18, 2021 at 19:06): df.loc[index, 'route')] = something is not OK. It is a SyntaxError: a closing parenthesis without an opening one.

Henry Ecker · Accepted Answer · 2021-06-18 19:20:43Z

Some built-in functionality is obscuring what this operation actually does:

df.loc[0, 'route'] = 10

level_1   s1                                                       route
level_2 mark                               type              trade      
level_3 base quot symb wall type deri marg inst tran prio quantity      
0        NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN      NaN  10.0

This is not actually an "acceptable" assignment. It works only because pandas implements a special case for it.

Specifically, here in indexing.py:

def convert_from_missing_indexer_tuple(indexer, axes):
    """
    Create a filtered indexer that doesn't have any missing indexers.
    """

    def get_indexer(_i, _idx):
        return axes[_i].get_loc(_idx["key"]) if isinstance(_idx, dict) else _idx

    return tuple(get_indexer(_i, _idx) for _i, _idx in enumerate(indexer))

This turns the missing indexers {'key': 0} and {'key': 'route'} into (0, slice(11, 12, None)).

Take a look at the output of df.columns:

MultiIndex([(   's1',  'mark',     'base'),
            (   's1',  'mark',     'quot'),
            (   's1',  'mark',     'symb'),
            (   's1',  'mark',     'wall'),
            (   's1',  'mark',     'type'),
            (   's1',  'mark',     'deri'),
            (   's1',  'mark',     'marg'),
            (   's1',  'type',     'inst'),
            (   's1',  'type',     'tran'),
            (   's1',  'type',     'prio'),
            (   's1', 'trade', 'quantity'),
            ('route',      '',         '')],
           names=['level_1', 'level_2', 'level_3'])
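The padding with empty strings can also be seen on the index alone. The following is a minimal sketch, assuming a shortened two-column version of the columns above; it uses the public MultiIndex.insert method, which pads a scalar label with empty strings for the lower levels. The names extended and last_key are illustrative only.

```python
import pandas as pd

# Shortened stand-in for the question's columns (assumption: two columns are enough to show the effect).
columns = pd.MultiIndex.from_tuples(
    [("s1", "mark", "base"), ("s1", "trade", "quantity")],
    names=["level_1", "level_2", "level_3"],
)

# Inserting a bare label: pandas fills the missing lower levels with ''.
extended = columns.insert(len(columns), "route")
last_key = extended[-1]
print(last_key)
```

A tuple passed to insert is not padded this way, so a partial tuple such as ('route', 'best') has to be completed by hand.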

The explicit syntax is:

df.loc[0, ('route', '', '')] = 10

level_1   s1                                                       route
level_2 mark                               type              trade      
level_3 base quot symb wall type deri marg inst tran prio quantity      
0        NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN      NaN  10.0

Adding a single level is such a common operation that there is a built-in alignment check for a single level.

When assigning to more than one level, the explicit syntax with a full-length tuple is necessary:

df.loc[0, ('route', 'best', '')] = 10

level_1   s1                                                       route
level_2 mark                               type              trade  best
level_3 base quot symb wall type deri marg inst tran prio quantity      
0        NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN      NaN  10.0
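If partial keys come up often, the padding can be wrapped in a small function. This is only a sketch under assumptions: pad_key is a hypothetical helper, not part of pandas, and the frame uses a shortened version of the columns with a two-row index so that plain column assignment can be used instead of .loc enlargement.

```python
import pandas as pd

# Shortened stand-in for the question's frame (assumption: two columns, two rows).
columns = pd.MultiIndex.from_tuples(
    [("s1", "mark", "base"), ("s1", "trade", "quantity")],
    names=["level_1", "level_2", "level_3"],
)
df = pd.DataFrame(index=range(2), columns=columns)


def pad_key(key, nlevels, fill=""):
    """Hypothetical helper: pad a partial column key with `fill` up to `nlevels` entries."""
    key = key if isinstance(key, tuple) else (key,)
    return key + (fill,) * (nlevels - len(key))


# ('route', 'best') becomes ('route', 'best', ''), matching the three column levels.
full_key = pad_key(("route", "best"), df.columns.nlevels)
df[full_key] = 10
print(df.columns.tolist())
```

The empty string is used as the filler because that is what pandas itself uses for the single-label case; any other placeholder would work as long as the tuple has one entry per level.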
