Pandas dataframe with multiindex column - merge levels

Question

I have a dataframe, grouped, with multiindex columns as below:

import pandas as pd
import numpy as np
import random

codes = ["one", "two", "three"]
colours = ["black", "white"]
textures = ["soft", "hard"]
N = 100  # length of the dataframe
df = pd.DataFrame({ 'id' : range(1,N+1),
                    'weeks_elapsed' : [random.choice(range(1,25)) for i in range(1,N+1)],
                    'code' : [random.choice(codes) for i in range(1,N+1)],
                    'colour': [random.choice(colours) for i in range(1,N+1)],
                    'texture': [random.choice(textures) for i in range(1,N+1)],
                    'size': [random.randint(1,100) for i in range(1,N+1)],
                    'scaled_size': [random.randint(100,1000) for i in range(1,N+1)]
                   },  columns= ['id', 'weeks_elapsed', 'code','colour', 'texture', 'size', 'scaled_size'])
grouped = df.groupby(['code', 'colour']).agg({
    'size': [np.sum, np.average, np.size, pd.Series.idxmax],
    'scaled_size': [np.sum, np.average, np.size, pd.Series.idxmax],
}).reset_index()

>> grouped
    code colour     size                           scaled_size                         
                    sum    average  size  idxmax            sum    average  size  idxmax
0    one  black    1031  60.647059    17      81     185.153944  10.891408    17      47
1    one  white     481  37.000000    13      53     204.139249  15.703019    13      53
2  three  black     822  48.352941    17       6     123.269405   7.251141    17      31
3  three  white    1614  57.642857    28      50     285.638337  10.201369    28      37
4    two  black     523  58.111111     9      85      80.908912   8.989879     9      88
5    two  white     669  41.812500    16      78      82.098870   5.131179    16      78
[6 rows x 10 columns]

How can I flatten/merge the column index levels as "Level1|Level2", e.g. size|sum, scaled_size|sum, etc.? If this is not possible, is there a way to groupby() as I did above without creating multi-index columns?

This is just my opinion, but I feel Scott's answer is better than the accepted one.
– BENY, Commented Jan 31, 2019 at 22:38

CDspace · Accepted Answer · 2024-03-07 22:37:32Z

188

There are several (arguably more Pythonic) ways to flatten MultiIndex columns into single-level columns.

Use map and join with string column headers:

grouped.columns = grouped.columns.map('|'.join).str.strip('|')

print(grouped)

Output:

   code  colour   size|sum  size|average  size|size  size|idxmax  \
0    one   black       862     53.875000         16           14   
1    one   white       554     46.166667         12           18   
2  three   black       842     49.529412         17           90   
3  three   white       740     56.923077         13           97   
4    two   black      1541     61.640000         25           50   

   scaled_size|sum  scaled_size|average  scaled_size|size  scaled_size|idxmax  
0             6980           436.250000                16                  77  
1             6101           508.416667                12                  13  
2             7889           464.058824                17                  64  
3             6329           486.846154                13                  73  
4            12809           512.360000                25                  23

Use map with format for column headers that have numeric data types.

grouped.columns = grouped.columns.map('{0[0]}|{0[1]}'.format)

Output:

   code| colour|  size|sum  size|average  size|size  size|idxmax  \
0    one   black       734     52.428571         14           30   
1    one   white      1110     65.294118         17           88   
2  three   black       930     51.666667         18            3   
3  three   white      1140     51.818182         22           20   
4    two   black       656     38.588235         17           77   
5    two   white       704     58.666667         12           17   

   scaled_size|sum  scaled_size|average  scaled_size|size  scaled_size|idxmax  
0             8229           587.785714                14                  57  
1             8781           516.529412                17                  73  
2            10743           596.833333                18                  21  
3            10240           465.454545                22                  26  
4             9982           587.176471                17                  16  
5             6537           544.750000                12                  49
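
As a minimal sketch of why the format-based version helps, consider a small made-up frame whose second column level holds floats (the labels below are hypothetical, not taken from the question). Joining fails on non-string labels, while the format-based map converts each label to text:

```python
import pandas as pd

# Hypothetical MultiIndex columns with float labels on the second level
cols = pd.MultiIndex.from_tuples([("col_a", 1.5), ("col_a", 2.5), ("col_b", 2.5)])
demo = pd.DataFrame([[1, 2, 3]], columns=cols)

# '|'.join only accepts strings, so it raises TypeError on the float labels
try:
    demo.columns.map("|".join)
    join_failed = False
except TypeError:
    join_failed = True

# str.format converts each label to text, so it works for any label type
flat_numeric = list(demo.columns.map("{0[0]}|{0[1]}".format))
```

Note that the format-based version always emits the separator, which is why the output above shows code| and colour| for columns whose second level is empty.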

Use list comprehension with f-string for Python 3.6+:

grouped.columns = [f'{i}|{j}' if j != '' else f'{i}' for i,j in grouped.columns]

Output:

    code colour  size|sum  size|average  size|size  size|idxmax  \
0    one  black      1003     43.608696         23           76   
1    one  white      1255     59.761905         21           66   
2  three  black       777     45.705882         17           39   
3  three  white       630     52.500000         12           23   
4    two  black       823     54.866667         15           33   
5    two  white       491     40.916667         12           64   

   scaled_size|sum  scaled_size|average  scaled_size|size  scaled_size|idxmax  
0            12532           544.869565                23                  27  
1            13223           629.666667                21                  13  
2             8615           506.764706                17                  92  
3             6101           508.416667                12                  43  
4             7661           510.733333                15                  42  
5             6143           511.916667                12                  49

edited Mar 7, 2024 at 22:37

CDspace

2,69919 gold badges32 silver badges39 bronze badges

answered May 9, 2017 at 0:17

Scott Boston

154k15 gold badges160 silver badges207 bronze badges


8 Comments

Pablo Over a year ago

it doesn't work when you have numeric columns

MultiIndex(levels=[[u'col_a', u'col_b', u'col_c'], [7950230.0, 12304568.0]], labels=[[0, 0, 1], [0, 1, 1]], names=[u'lev', u'sublev'])

it returns TypeError: sequence item 1: expected string, float found

Scott Boston Over a year ago

@PabloA grouped.columns.map('{0[0]} | {0[1]}'.format)

Paul H Over a year ago

brief update, as of at least v0.23, there's a set_axis method you can use: renamed = df.set_axis(['|'.join(c) for c in df.columns], axis='columns', inplace=False)

Paul H Over a year ago

(and axis='index' would achieve similar results along multi-indexed rows)
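
A sketch of the same set_axis idea for newer pandas versions: to my knowledge the inplace argument of set_axis was removed in pandas 2.0, and the method returns a new object by default. The frame here is a made-up example, not the one from the question:

```python
import pandas as pd

# Hypothetical frame with two-level string columns
cols = pd.MultiIndex.from_tuples([("size", "sum"), ("size", "mean")])
df_demo = pd.DataFrame([[10, 5.0]], columns=cols)

# set_axis returns a new DataFrame; the original keeps its MultiIndex
renamed = df_demo.set_axis(["|".join(c) for c in df_demo.columns], axis="columns")
```

Because nothing is modified in place, this form also fits into a method chain.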

BENY Over a year ago

Hi Man Adjust this one grouped.columns.map('|'.join).str.strip('|') , BTW I think this should be the accepted answer


acushner · Accepted Answer · 2014-06-18 16:59:34Z

42

you could always change the columns:

grouped.columns = ['%s%s' % (a, '|%s' % b if b else '') for a, b in grouped.columns]

answered Jun 18, 2014 at 16:59

acushner

9,9461 gold badge38 silver badges37 bronze badges

3 Comments

toto_tico Over a year ago

if one of the columns in level 1 is equal to 0, then the above expression will ignore it here :b if b else ''. Instead, I used b != '', so grouped.columns = ['%s%s' % (a, '|%s' % b if b != '' else '') for a, b in grouped.columns]. This might be useful after using groupby which enumerates columns with numbers starting from 0.

acushner Over a year ago

there would be a problem with Nones in that, so you'd have to do if (b == 0 or b), but still a good call

toto_tico Over a year ago

@acushner, you're right, though if b is not None should be the simple way of expressing it...
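
The difference between the conditions discussed in these comments can be sketched with plain Python tuples (no pandas needed). The helper name flatten and the sample labels are hypothetical and only serve the illustration:

```python
def flatten(columns, keep):
    """Join (a, b) pairs with '|', keeping b only when keep(b) is true."""
    return ["%s%s" % (a, "|%s" % b if keep(b) else "") for a, b in columns]

# Hypothetical column tuples: an empty second level, a 0 label, and a None label
columns = [("code", ""), ("size", 0), ("size", 1), ("extra", None)]

truthy = flatten(columns, lambda b: bool(b))                    # `if b`
not_empty = flatten(columns, lambda b: b != "")                 # `if b != ''`
zero_or_truthy = flatten(columns, lambda b: b == 0 or bool(b))  # `if (b == 0 or b)`

print(truthy)          # ['code', 'size', 'size|1', 'extra']
print(not_empty)       # ['code', 'size|0', 'size|1', 'extra|None']
print(zero_or_truthy)  # ['code', 'size|0', 'size|1', 'extra']
```

The plain truthiness test drops the 0 label, the comparison against '' keeps it but also renders None, and the combined check keeps 0 while still skipping both '' and None.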

Ningrong Ye · Accepted Answer · 2018-12-04 02:22:11Z

18

Based on Scott Boston's answer, a small update (it works for columns with two or more levels):

temp.columns.map(lambda x: '|'.join([str(i) for i in x]))

Thank you, Boston!
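
A minimal sketch of that expression on a three-level column index, assuming a small made-up frame (the labels and values are hypothetical). Converting each label with str also covers non-string levels such as years:

```python
import pandas as pd

# Hypothetical three-level columns; the third level holds integers
cols = pd.MultiIndex.from_tuples(
    [("size", "sum", 2019), ("size", "mean", 2020), ("scaled_size", "sum", 2019)]
)
temp = pd.DataFrame([[1, 2, 3]], columns=cols)

# str() each label so that join works for any number of levels and any label type
temp.columns = temp.columns.map(lambda x: "|".join([str(i) for i in x]))
```

The result has to be assigned back to temp.columns; calling map on its own only returns the new labels.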

answered Dec 4, 2018 at 2:22

Ningrong Ye

1,28712 silver badges10 bronze badges


BSalita · Accepted Answer · 2022-12-11 15:52:43Z

11

Full credit to suraj's concise answer: https://stackoverflow.com/a/72616083/317797

df.columns = df.columns.map('_'.join)

answered Dec 11, 2022 at 15:52

BSalita

9,07111 gold badges59 silver badges75 bronze badges

1 Comment

dlm Over a year ago

Yep, this is the best way!

Ynjxsjmh · Accepted Answer · 2023-04-16 13:20:05Z

If you want to chain the operation, you can do

out = (grouped.set_axis(grouped.columns.values, axis=1)
       # If you want to preserve the order and strip the trailing |
       .rename(columns=lambda col: '|'.join(col).strip('|'))
       # or if you don't care about the extra |
       #.rename(columns='|'.join)
       # If you want to reverse the order and strip the leading |
       #.rename(columns=lambda col: f'{col[1]}|{col[0]}'.strip('|'))
       # or if you want the reversed order and don't care about the extra |
       #.rename(columns='{0[1]}|{0[0]}'.format)
       )

print(out)

    code colour  size|sum  size|average  size|size  size|idxmax  scaled_size|sum  scaled_size|average  scaled_size|size  scaled_size|idxmax
0    one  black       620     41.333333         15           24             7727           515.133333                15                  48
1    one  white       678     45.200000         15           37             8290           552.666667                15                  17
2  three  black       957     43.500000         22           34            11899           540.863636                22                   0
3  three  white       918     54.000000         17           12             8017           471.588235                17                  63
4    two  black      1009     63.062500         16           73             8954           559.625000                16                  35
5    two  white       601     40.066667         15           90             8729           581.933333                15                  96

Gurubux · Accepted Answer · 2023-12-01 12:35:18Z

0

In line with @scott-boston's answer, to keep "Unnamed" columns from being merged, use the following code:

['|'.join(column)  if 'Unnamed' not in column[0] else column[1] for column in df.columns]
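
A small sketch of how this behaves, assuming first-level placeholder labels of the form "Unnamed: 0_level_0", which is what pandas typically generates for blank cells in a multi-row header. The frame below is constructed by hand and is hypothetical:

```python
import pandas as pd

# Hypothetical columns such as a multi-row header with a blank top cell might produce
cols = pd.MultiIndex.from_tuples(
    [("Unnamed: 0_level_0", "code"), ("size", "sum"), ("size", "mean")]
)
df = pd.DataFrame([["one", 10, 5.0]], columns=cols)

# Keep only the second level where the first level is an "Unnamed" placeholder
df.columns = ["|".join(column) if "Unnamed" not in column[0] else column[1]
              for column in df.columns]
```

This only checks the first level; if the placeholder appears in the second level instead, the condition would need to test column[1].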

answered Dec 1, 2023 at 12:35

Gurubux

1213 silver badges6 bronze badges
