95

I have a dataframe, grouped, with multiindex columns as below:

import pandas as pd
import numpy as np
import random

codes = ["one","two","three"];
colours = ["black", "white"];
textures = ["soft", "hard"];
N= 100 # length of the dataframe
df = pd.DataFrame({ 'id' : range(1,N+1),
                    'weeks_elapsed' : [random.choice(range(1,25)) for i in range(1,N+1)],
                    'code' : [random.choice(codes) for i in range(1,N+1)],
                    'colour': [random.choice(colours) for i in range(1,N+1)],
                    'texture': [random.choice(textures) for i in range(1,N+1)],
                    'size': [random.randint(1,100) for i in range(1,N+1)],
                    'scaled_size': [random.randint(100,1000) for i in range(1,N+1)]
                   },  columns= ['id', 'weeks_elapsed', 'code','colour', 'texture', 'size', 'scaled_size'])
grouped = df.groupby(['code', 'colour']).agg( {'size': [np.sum, np.average, np.size, pd.Series.idxmax],'scaled_size': [np.sum, np.average, np.size, pd.Series.idxmax]}).reset_index()

>> grouped
    code colour     size                           scaled_size                         
                    sum    average  size  idxmax            sum    average  size  idxmax
0    one  black    1031  60.647059    17      81     185.153944  10.891408    17      47
1    one  white     481  37.000000    13      53     204.139249  15.703019    13      53
2  three  black     822  48.352941    17       6     123.269405   7.251141    17      31
3  three  white    1614  57.642857    28      50     285.638337  10.201369    28      37
4    two  black     523  58.111111     9      85      80.908912   8.989879     9      88
5    two  white     669  41.812500    16      78      82.098870   5.131179    16      78
[6 rows x 10 columns]

How can I flatten/merge the column index levels as: "Level1|Level2", e.g. size|sum, scaled_size|sum. etc? If this is not possible, is there a way to groupby() as I did above without creating multi-index columns?

1
  • 6
    This is just my opinion ,I feel like scott's is better than the accepted one. Commented Jan 31, 2019 at 22:38

6 Answers 6

188

There are varied (i.e., more pythonic) way to flatten a MultiIndex columns into single-level columns.

  1. Use map and join with string column headers:
grouped.columns = grouped.columns.map('|'.join).str.strip('|')

print(grouped)

Output:

   code  colour   size|sum  size|average  size|size  size|idxmax  \
0    one   black       862     53.875000         16           14   
1    one   white       554     46.166667         12           18   
2  three   black       842     49.529412         17           90   
3  three   white       740     56.923077         13           97   
4    two   black      1541     61.640000         25           50   

   scaled_size|sum  scaled_size|average  scaled_size|size  scaled_size|idxmax  
0             6980           436.250000                16                  77  
1             6101           508.416667                12                  13  
2             7889           464.058824                17                  64  
3             6329           486.846154                13                  73  
4            12809           512.360000                25                  23  
  1. Use map with format for column headers that have numeric data types.
grouped.columns = grouped.columns.map('{0[0]}|{0[1]}'.format) 

Output:

   code| colour|  size|sum  size|average  size|size  size|idxmax  \
0    one   black       734     52.428571         14           30   
1    one   white      1110     65.294118         17           88   
2  three   black       930     51.666667         18            3   
3  three   white      1140     51.818182         22           20   
4    two   black       656     38.588235         17           77   
5    two   white       704     58.666667         12           17   

   scaled_size|sum  scaled_size|average  scaled_size|size  scaled_size|idxmax  
0             8229           587.785714                14                  57  
1             8781           516.529412                17                  73  
2            10743           596.833333                18                  21  
3            10240           465.454545                22                  26  
4             9982           587.176471                17                  16  
5             6537           544.750000                12                  49 
  1. Use list comprehension with f-string for Python 3.6+:
grouped.columns = [f'{i}|{j}' if j != '' else f'{i}' for i,j in grouped.columns]

Output:

    code colour  size|sum  size|average  size|size  size|idxmax  \
0    one  black      1003     43.608696         23           76   
1    one  white      1255     59.761905         21           66   
2  three  black       777     45.705882         17           39   
3  three  white       630     52.500000         12           23   
4    two  black       823     54.866667         15           33   
5    two  white       491     40.916667         12           64   

   scaled_size|sum  scaled_size|average  scaled_size|size  scaled_size|idxmax  
0            12532           544.869565                23                  27  
1            13223           629.666667                21                  13  
2             8615           506.764706                17                  92  
3             6101           508.416667                12                  43  
4             7661           510.733333                15                  42  
5             6143           511.916667                12                  49  
Sign up to request clarification or add additional context in comments.

8 Comments

it doesn't work when you have numeric columns MultiIndex(levels=[[u'col_a', u'col_b', u'col_c'], [7950230.0, 12304568.0]], labels=[[0, 0, 1], [0, 1, 1]], names=[lev, sublev']) it returns TypeError: sequence item 1: expected string, float found
@PabloA grouped.columns.map('{0[0]} | {0[1]}'.format)
brief update, as of at least v0.23, there's a set_axis method you can use: renamed = df.set_axis(['|'.join(c) for c in df.columns], axis='columns', inplace=False)
(and axis='index' would achieve similar results along multi-indexed rows)
Hi Man Adjust this one grouped.columns.map('|'.join).str.strip('|') , BTW I think this should be the accepted answer
|
42

you could always change the columns:

grouped.columns = ['%s%s' % (a, '|%s' % b if b else '') for a, b in grouped.columns]

3 Comments

if one of the columns in level 1 is equal to 0, then the above expression will ignore it here :b if b else ''. Instead, I used b != '', so grouped.columns = ['%s%s' % (a, '|%s' % b if b != '' else '') for a, b in grouped.columns]. This might be useful after using groupby which enumerates columns with numbers starting from 0.
there would be a problem with Nones in that, so you'd have to do if (b == 0 or b), but still a good call
@acusher, you right, though if b is not None should the simple way of expressing it...
18

Based on Scott Boston's answer, little update(it will be work for 2 or more levels column):

temp.columns.map(lambda x: '|'.join([str(i) for i in x]))

Thank you, Boston!

Comments

11

Full credit to suraj's concise answer: https://stackoverflow.com/a/72616083/317797

df.columns = df.columns.map('_'.join)

1 Comment

Yep, this is the best way!
0

If you want to chain the operation, you can do

out = (grouped.set_axis(grouped.columns.values, axis=1)
       # If you want to preserve order and strip the leading |
       .rename(columns=lambda col: '|'.join(col).strip('|'))
       # or if you don't care the extra |
       #.rename(columns='|'.join)
       # If you want to change the order and strip the leading |
       #.rename(columns=lambda col: f'{col[1]}|{col[0]}'.strip('|'))
       # or the order matters and you don't care the extra |
       #.rename(columns='{0[1]}|{0[0]}'.format)
       )
print(out)

    code colour  size|sum  size|average  size|size  size|idxmax  scaled_size|sum  scaled_size|average  scaled_size|size  scaled_size|idxmax
0    one  black       620     41.333333         15           24             7727           515.133333                15                  48
1    one  white       678     45.200000         15           37             8290           552.666667                15                  17
2  three  black       957     43.500000         22           34            11899           540.863636                22                   0
3  three  white       918     54.000000         17           12             8017           471.588235                17                  63
4    two  black      1009     63.062500         16           73             8954           559.625000                16                  35
5    two  white       601     40.066667         15           90             8729           581.933333                15                  96

Comments

0

Inline with @scott-boston 's answer, in order to skip "Unnamed" columns to be merged use the following code

['|'.join(column)  if 'Unnamed' not in column[0] else column[1] for column in df.columns]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.