I create a dataframe this way, with 3 levels of columns:

import pandas as pd

segment = 's1'

tuples = [(segment, 'mark', 'base'),
          (segment, 'mark', 'quot'),
          (segment, 'mark', 'symb'),
          (segment, 'mark', 'wall'),
          (segment, 'mark', 'type'),
          (segment, 'mark', 'deri'),
          (segment, 'mark', 'marg'),

          (segment, 'type', 'inst'),
          (segment, 'type', 'tran'),
          (segment, 'type', 'prio'),

          (segment, 'trade', 'quantity')
          ]

columns = pd.MultiIndex.from_tuples(tuples, names=["level_1", "level_2", 'level_3'])
df = pd.DataFrame(columns=columns)

I can add a column with only one level, but pandas raises an error when I add a new column with two levels. What is the reason for this, and how can I do it?

# Put value in cells
fill_df()

# Reset the index to 0..n-1
df.index = range(len(df))

for index, row in df.iterrows():
    df.loc[index, 'route'] = something  # OK
    df.loc[index, ('route', 'best')] = something  # KeyError: 'route'
Comments:
  • df.loc[index, 'route')] = something is not OK: it is a SyntaxError, a closing parenthesis without an opening one. (Commented Jun 18, 2021 at 19:06)
  • Sorry, it was a typo. (Commented Jun 18, 2021 at 19:11)
  • @anky s1 is a string. (Commented Jun 18, 2021 at 19:18)

1 Answer

Some built-in convenience behavior obscures what this operation actually does:

df.loc[0, 'route'] = 10
level_1   s1                                                       route
level_2 mark                               type              trade      
level_3 base quot symb wall type deri marg inst tran prio quantity      
0        NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN      NaN  10.0

This is not actually an "acceptable" assignment on its own; it works only because pandas implements a special case for it.

Specifically, here in pandas' indexing.py:

def convert_from_missing_indexer_tuple(indexer, axes):
    """
    Create a filtered indexer that doesn't have any missing indexers.
    """

    def get_indexer(_i, _idx):
        return axes[_i].get_loc(_idx["key"]) if isinstance(_idx, dict) else _idx

    return tuple(get_indexer(_i, _idx) for _i, _idx in enumerate(indexer))

This turns the indexers {'key': 0} and {'key': 'route'} into (0, slice(11, 12, None)).

Take a look at the output of df.columns:

MultiIndex([(   's1',  'mark',     'base'),
            (   's1',  'mark',     'quot'),
            (   's1',  'mark',     'symb'),
            (   's1',  'mark',     'wall'),
            (   's1',  'mark',     'type'),
            (   's1',  'mark',     'deri'),
            (   's1',  'mark',     'marg'),
            (   's1',  'type',     'inst'),
            (   's1',  'type',     'tran'),
            (   's1',  'type',     'prio'),
            (   's1', 'trade', 'quantity'),
            ('route',      '',         '')],
           names=['level_1', 'level_2', 'level_3'])

The explicit syntax is:

df.loc[0, ('route', '', '')] = 10
level_1   s1                                                       route
level_2 mark                               type              trade      
level_3 base quot symb wall type deri marg inst tran prio quantity      
0        NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN      NaN  10.0

Adding a column by naming only a single level is such a common operation that pandas has a built-in alignment check for that case.

When assigning to more than one level, the explicit syntax with the full tuple is necessary:

df.loc[0, ('route', 'best', '')] = 10
level_1   s1                                                       route
level_2 mark                               type              trade  best
level_3 base quot symb wall type deri marg inst tran prio quantity      
0        NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN      NaN  10.0
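
To avoid writing the padding by hand, the explicit key can be built from a partial one. The following is only a minimal sketch: pad_key is a hypothetical helper (not part of pandas), the frame is a shortened version of the one in the question, and it uses plain column assignment on a frame that already has rows rather than .loc enlargement.

```python
import pandas as pd

# Shortened version of the question's frame, with two existing rows.
columns = pd.MultiIndex.from_tuples(
    [("s1", "mark", "base"), ("s1", "type", "inst"), ("s1", "trade", "quantity")],
    names=["level_1", "level_2", "level_3"],
)
df = pd.DataFrame(index=[0, 1], columns=columns)


def pad_key(key, nlevels, fill=""):
    """Hypothetical helper: pad a partial column key to the full depth.

    ("route", "best") with nlevels=3 becomes ("route", "best", "").
    """
    if not isinstance(key, tuple):
        key = (key,)
    if len(key) > nlevels:
        raise ValueError("key has more entries than the columns have levels")
    return key + (fill,) * (nlevels - len(key))


# Build the explicit three-level key and assign a whole new column with it.
full_key = pad_key(("route", "best"), df.columns.nlevels)
df[full_key] = 10.0
```

Here full_key is the same tuple, ('route', 'best', ''), that the explicit .loc assignment above spells out by hand.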