1

I have a multi-indexed dataframe,

>>> df
       a1      a2    
       b1  b2  b1  b2
c1 d1  11  21  31  41
   d2  12  22  32  42
c2 d1  13  23  33  43
   d2  14  24  34  44

It has 2 levels of header and 2 levels of index. If I directly use the code df.to_csv('test_file.csv'), then the format of the file test_file.csv is

,,a1,a1,a2,a2
,,b1,b2,b1,b2
c1,d1,11,21,31,41
c1,d2,12,22,32,42
c2,d1,13,23,33,43
c2,d2,14,24,34,44

However, I would like to change it to

  1. remove the duplicates in the 1st level of header
  2. remove entire 1st level of index, and make an empty row for each one in the 1st level of index.

The wanted format is:

,a1,,a2,
,b1,b2,b1,b2
c1,,,,,
d1,11,21,31,41
d2,12,22,32,42
c2,,,,,
d1,13,23,33,43
d2,14,24,34,44

Could you please show me how to do it? Thanks! Please use the code below.

import pandas as pd


df = pd.DataFrame(
    {
        ('a1', 'b1'): [11, 12, 13, 14],
        ('a1', 'b2'): [21, 22, 23, 24],
        ('a2', 'b1'): [31, 32, 33, 34],
        ('a2', 'b2'): [41, 42, 43, 44],
    },
    index=pd.MultiIndex.from_tuples([
        ('c1', 'd1'),
        ('c1', 'd2'),
        ('c2', 'd1'),
        ('c2', 'd2'),
    ]),
)

print(df)
df.to_csv('my_test_file.csv')

1 Answer 1

1

Here is a working solution. It uses a helper function to remove the duplicated consecutive labels and groupy+apply+pandas.concat to shift the multiindex level as extra empty row:

def remove_consecutive(l):
    '''replaces consecutive items in "l" with empty string'''
    from itertools import groupby, chain
    return tuple(chain(*([k]+['']*(len(list(g))-1) for k,g in groupby(l))))

(df.groupby(level=0)
   # below to shift level=0 as new row
   .apply(lambda g: pd.concat([pd.DataFrame([],
                                            index=[g.name],
                                            columns=g.columns),
                               g.droplevel(0)]))
  .droplevel(0)
  # below to remove the duplicate column names
  .T # unfortunately there is no set_index equivalent for columns, so transpose before/after
  .set_index(pd.MultiIndex.from_arrays(list(map(remove_consecutive, zip(*df.columns)))))
  .T
  # export
  .to_csv('filename.csv')
)

output:

,a1,,a2,
,b1,b2,b1,b2
c1,,,,
d1,11,21,31,41
d2,12,22,32,42
c1,,,,
d1,13,23,33,43
d2,14,24,34,44
Sign up to request clarification or add additional context in comments.

1 Comment

thanks a lot for your useful code. It works! The only mistake is index=[name], it should be index=[g.name]. And I also learned from you to use () for separating method calls at different lines. :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.