How to format the csv file with df.to_csv for a multiindex dataframe, python3

Question

I have a multi-indexed dataframe,

>>> df
       a1      a2    
       b1  b2  b1  b2
c1 d1  11  21  31  41
   d2  12  22  32  42
c2 d1  13  23  33  43
   d2  14  24  34  44

It has 2 levels of header and 2 levels of index. If I directly use the code df.to_csv('test_file.csv'), then the format of the file test_file.csv is

,,a1,a1,a2,a2
,,b1,b2,b1,b2
c1,d1,11,21,31,41
c1,d2,12,22,32,42
c2,d1,13,23,33,43
c2,d2,14,24,34,44

However, I would like to change it to

remove the duplicates in the 1st level of header
remove entire 1st level of index, and make an empty row for each one in the 1st level of index.

The wanted format is:

,a1,,a2,
,b1,b2,b1,b2
c1,,,,,
d1,11,21,31,41
d2,12,22,32,42
c2,,,,,
d1,13,23,33,43
d2,14,24,34,44

Could you please show me how to do it? Thanks! Please use the code below.

import pandas as pd


df = pd.DataFrame(
    {
        ('a1', 'b1'): [11, 12, 13, 14],
        ('a1', 'b2'): [21, 22, 23, 24],
        ('a2', 'b1'): [31, 32, 33, 34],
        ('a2', 'b2'): [41, 42, 43, 44],
    },
    index=pd.MultiIndex.from_tuples([
        ('c1', 'd1'),
        ('c1', 'd2'),
        ('c2', 'd1'),
        ('c2', 'd2'),
    ]),
)

print(df)
df.to_csv('my_test_file.csv')

mozway · Accepted Answer · 2021-09-13 09:30:08Z

1

Here is a working solution. It uses a helper function to remove the duplicated consecutive labels and groupy+apply+pandas.concat to shift the multiindex level as extra empty row:

def remove_consecutive(l):
    '''replaces consecutive items in "l" with empty string'''
    from itertools import groupby, chain
    return tuple(chain(*([k]+['']*(len(list(g))-1) for k,g in groupby(l))))

(df.groupby(level=0)
   # below to shift level=0 as new row
   .apply(lambda g: pd.concat([pd.DataFrame([],
                                            index=[g.name],
                                            columns=g.columns),
                               g.droplevel(0)]))
  .droplevel(0)
  # below to remove the duplicate column names
  .T # unfortunately there is no set_index equivalent for columns, so transpose before/after
  .set_index(pd.MultiIndex.from_arrays(list(map(remove_consecutive, zip(*df.columns)))))
  .T
  # export
  .to_csv('filename.csv')
)

output:

,a1,,a2,
,b1,b2,b1,b2
c1,,,,
d1,11,21,31,41
d2,12,22,32,42
c1,,,,
d1,13,23,33,43
d2,14,24,34,44

edited Sep 13, 2021 at 9:30

answered Sep 12, 2021 at 14:39

mozway

267k13 gold badges56 silver badges106 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

aura Over a year ago

thanks a lot for your useful code. It works! The only mistake is index=[name], it should be index=[g.name]. And I also learned from you to use () for separating method calls at different lines. :)

Collectives™ on Stack Overflow

How to format the csv file with df.to_csv for a multiindex dataframe, python3

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related