Blank line below headers created when using MultiIndex and to_excel in Python

Question

I am trying to save a Pandas dataframe to an excel file using the to_excel function with XlsxWriter.

When I print the dataframe to the terminal then it reads as it should, but when I save it to excel and open the file, there is an extra blank line below the headers which shouldn't be there. This only happens when using MultiIndex for the headers, but I need the layered headers that it offers and I can't find a solution.

Below is code from an online MultiIndex example which produces the same result as the project I'm working on. Any solutions would be greatly appreciated.

import numpy as np
import pandas as pd
import xlsxwriter

tuples = [('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two'), ('foo', 'one'), ('foo', 'two'), ('qux', 'one'), ('qux', 'two')]

index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]

pd.MultiIndex.from_product(iterables, names=['first', 'second'])

df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)

print(df)

writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='test1')

The excel output created:

lrnzcig · Accepted Answer · 2019-05-08 15:39:58Z

18

Really appreciate both the question and @Teoretic's workaround.

However in my case the merged cells for the Multiindex columns are very useful, and those are lost with @Teoretic's. I've done an alternative workaround hiding the entire row before writing, it works, thus I include it here in case is useful for anybody.

writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='test1')
writer.sheets['test1'].set_row(2, None, None, {'hidden': True})
writer.save()

answered May 8, 2019 at 15:39

lrnzcig

3,9474 gold badges40 silver badges53 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Dranikf · Accepted Answer · 2022-03-01 07:27:18Z

9

I have one more solution. It based on idea to save in two calls of the to_excel function - the first save only header, the second save table without header. To realise I prepared such function:

def save_double_column_df(df, xl_writer, startrow = 0, **kwargs):
    '''Function to save doublecolumn DataFrame, to xlwriter'''
    # inputs:
    # df - pandas dataframe to save
    # xl_writer - book for saving
    # startrow - row from wich data frame will begins
    # **kwargs - arguments of `to_excel` function of DataFrame`
    df.drop(df.index).to_excel(xl_writer, startrow = startrow, **kwargs)
    df.to_excel(xl_writer, startrow = startrow + 1, header = False, **kwargs)

The example of using:

y = [('K1','l'),("K1",'m'),("K2",'l'),('K2','m'),("K3",'l'),('K3','m')]
col_list = pd.MultiIndex.from_tuples(y)
A = pd.DataFrame(np.random.randint(2,5,(4,6)), columns = col_list)

xl_writer = pd.ExcelWriter("test_result/multiindex_saving.xlsx",engine='xlsxwriter')
save_double_column_df(A, xl_writer, sheet_name = 'hello')
xl_writer.close()

The result looks like:

answered Mar 1, 2022 at 7:27

Dranikf

1963 silver badges8 bronze badges

1 Comment

jmcnamara Over a year ago

That's a nice solution.

Teoretic · Accepted Answer · 2018-09-25 13:11:32Z

5

This is most likely a bug in pandas.

See this question for suggested solution:

No easy way out of this but to delete that row by reading the xlsx in again.

Also there is a link into the GitHub issue, addressing this topic.

So I made this workaround, it might be helpful for you:

df = pd.read_excel('/home/teoretic/test.xlsx', index_col=0)
df = df.drop(np.nan)  # <== dropping an empty row

rename_dct = dict.fromkeys(df.loc[:,df.columns.str.contains('^Unnamed')], '')
df = df.rename(columns=rename_dct)  # <== renaming 'Unnamed' columns to blank space 

writer = pd.ExcelWriter('/home/teoretic/test_new.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='test1')

writer.close()

This is an output file:

edited Sep 25, 2018 at 13:11

answered Sep 25, 2018 at 13:03

Teoretic

2,5631 gold badge22 silver badges28 bronze badges

7 Comments

James Salmon Over a year ago

Thank you very much for the reply, I was hoping there would be a way to keep the top row of cells merged but not to worry.

Teoretic Over a year ago

I tried to keep multiindex, but then I would end up with the same "blank line" problem in the end when saving it to xlsx. Found no way out but this not perfect workaround :)

James Salmon Over a year ago

I've managed to figure out a workaround where the top line of headers is written to excel seperately, then the relevant cells merged through 'workbook.merge_range() ', and then the rest of the data is added below by specifying 'startrow=1' in 'to_excel'.

D A Wells Over a year ago

I think this is not a bug. When you have named your row index/es the names are put into this row. This allows you to automatically name row indexes when reading in excel files.

wisbucky Over a year ago

newer Github issue link

|

jkhuang · Accepted Answer · 2020-09-25 22:50:10Z

I have found the following workaround using the package openpyxl by reading the file with the empty line, deleting the line, and re-writing the file. The advantage I've found with this method is that it preserves the pretty merged cells in the MultiIndex columns and actually deletes the empty row from the Excel file. By actually deleting the empty row, if the rest of your data is numeric, this will allow you to use Excel's filtering capabilities in your spreadsheet out of the box without having to manually delete the empty row in the file.

# Assuming df is your data frame with MultiIndex columns that you have already written to Excel

# Load the file with the empty line and select the sheet you want to edit
wb = openpyxl.load_workbook(filename = 'file_with_empty_line.xlsx')
ws = wb['sheet_name_to_edit']

# The row number to delete is 1-indexed in Excel
row_to_delete = df.columns.nlevels
ws.delete_rows(row_to_delete)

# If you want to write the name of the index into the empty cell above the index col
# after deleting the row (assuming you are writing to the first column):
ws['A'+str(row_to_delete)]=df.index.name

# Save the modified workbook to file
wb.save(filename = 'file_without_emtpy_line.xlsx')

While its not ideal to have to import an entire package to handle this, my usage case required me to not simply hide the missing row. @lrnzcig's solution is much better if you can get away with only having to hide the empty row.

lrnzcig · Accepted Answer · 2021-12-14 20:29:41Z

1

I used ExcelWriter to remove that blank line. I was adding the df to an existing sheet.

with pd.ExcelWriter(“PATH_TO_EXCEL”,mode=“a”,engine=“openpyxl”) as writer:
    writer.book=load_workbook(“PATH_TO_EXCEL”)
    df.to_excel(writer,sheet_name=“sample”,startrow=0,startcol=0)
    writer.sheets[‘sample’].delete_rows(3)

edited Dec 14, 2021 at 20:29

lrnzcig

3,9474 gold badges40 silver badges53 bronze badges

answered Oct 22, 2021 at 9:37

Soham Ghosh

1682 silver badges3 bronze badges

Comments

Edward Ji · Accepted Answer · 2022-06-07 22:37:39Z

1

The problem can be resolved by commenting line 642 of pandas/io/formats/excel.py.

if self._has_aliases or self.header:
    self.rowcounter += 1

This would cancel the empty lines.

edited Jun 7, 2022 at 22:37

Edward Ji

80011 silver badges20 bronze badges

answered Jun 7, 2022 at 3:13

liuweinew

111 bronze badge

Collectives™ on Stack Overflow

Blank line below headers created when using MultiIndex and to_excel in Python

6 Answers 6

Comments

1 Comment

7 Comments

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

Comments

1 Comment

7 Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related