Write pandas dataframe to Excel with xlsxwriter and include `write_rich_string` formatting

Question

The following is reproducible and generates the desired output.

import xlsxwriter, pandas as pd

workbook = xlsxwriter.Workbook('pandas_with_rich_strings.xlsx')
worksheet = workbook.add_worksheet()

# Set up some formats to use.
bold = workbook.add_format({'bold': True})
italic = workbook.add_format({'italic': True})
red = workbook.add_format({'color': 'red'})

df = pd.DataFrame({
    'numCol': [1, 50, 327],
    'plainText': ['plain', 'text', 'column'],
    'richText': [
        ['This is ', bold, 'bold'],
        ['This is ', italic, 'italic'],
        ['This is ', red, 'red']
    ]
}) 


headRows = 1


for colNum in range(len(df.columns)):
    xlColCont = df[df.columns[colNum]].tolist()
    worksheet.write_string(0, colNum , str(df.columns[colNum]), bold)
    for rowNum in range(len(xlColCont)):
        if df.columns[colNum] == 'numCol': 
            worksheet.write_number(rowNum+headRows, colNum , xlColCont[rowNum])        
        elif df.columns[colNum] == 'richText':
            worksheet.write_rich_string(rowNum+headRows, colNum , *xlColCont[rowNum])
        else:
            worksheet.write_string(rowNum+headRows, colNum , str(xlColCont[rowNum]))


workbook.close()

However, how would I do the same thing without iterating over each column and write the entire pandas dataframe to the Excel file in one go and include the write_rich_string formatting?

The following does not work.

writer = pd.ExcelWriter('pandas_with_rich_strings.xlsx', engine='xlsxwriter')
workbook = xlsxwriter.Workbook('pandas_with_rich_strings.xlsx')
worksheet = workbook.add_worksheet('pandas_df')
df.to_excel(writer,'pandas_df')
writer.save()

patrickjlong1 · Accepted Answer · 2018-06-14 22:01:17Z

11

I'm not sure that my answer is much better than the way you do it, but I've cut it down to use only one for loop and make use of pandas.DataFrame.to_excel() to initially put the dataframe in excel. Please note that I then overwrite the last column using worksheet.write_rich_string().

import pandas as pd

writer = pd.ExcelWriter('pandas_with_rich_strings.xlsx', engine='xlsxwriter')
workbook  = writer.book
bold = workbook.add_format({'bold': True})
italic = workbook.add_format({'italic': True})
red = workbook.add_format({'color': 'red'})
df = pd.DataFrame({
    'numCol': [1, 50, 327],
    'plainText': ['plain', 'text', 'column'],
    'richText': [
        ['This is ', bold, 'bold'],
        ['This is ', italic, 'italic'],
        ['This is ', red, 'red']
    ]
}) 
df.to_excel(writer, sheet_name='Sheet1', index=False)
worksheet = writer.sheets['Sheet1']
# you then need to overwite the richtext column with
for idx, x in df['richText'].iteritems():
    worksheet.write_rich_string(idx + 1, 2, *x)
writer.save()

With Expected Outputted .xlsx:

answered Jun 14, 2018 at 22:01

patrickjlong1

3,8231 gold badge20 silver badges34 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

nickanor Over a year ago

hi, i'm curious what is the meaning of '*' in worksheet.write_rich_string(idx + 1, 2, *x)?

jewles · Accepted Answer · 2020-09-11 19:56:01Z

Your code could be simplified with the use of worksheet.add_write_handler() to detect the list and call worksheet.write_rich_string() automatically from worksheet.write without having to manually check the type. You'd think

worksheet.add_write_handler(list, xlsxwriter.worksheet.Worksheet.write_rich_string)

should work but doesn't because of some issue with the varargs for that method getting mangled (the last arg is optional styling for the whole cell). So instead the following does work

worksheet.add_write_handler(list, lambda worksheet, row, col, args: worksheet._write_rich_string(row, col, *args))

Unfortunately, this approach isn't easily compatible with pd.to_excel because it has to be set on the worksheet before the data is written and because ExcelWriter serializes lists and dicts to strings before writing (there is a note in the documentation that says this is to be compatible with CSV writers). Subclassing pd.io.excel._xlsxwriter._XlsxWriter can work:

import xlsxwriter, pandas as pd

class RichExcelWriter(pd.io.excel._xlsxwriter._XlsxWriter):
    def __init__(self, *args, **kwargs):
        super(RichExcelWriter, self).__init__(*args, **kwargs)

    def _value_with_fmt(self, val):
        if type(val) == list:
            return val, None
        return super(RichExcelWriter, self)._value_with_fmt(val)

    def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0, freeze_panes=None):
        sheet_name = self._get_sheet_name(sheet_name)
        if sheet_name in self.sheets:
            wks = self.sheets[sheet_name]
        else:
            wks = self.book.add_worksheet(sheet_name)
            #add handler to the worksheet when it's created
            wks.add_write_handler(list, lambda worksheet, row, col, list, style: worksheet._write_rich_string(row, col, *list))
            self.sheets[sheet_name] = wks
        super(RichExcelWriter, self).write_cells(cells, sheet_name, startrow, startcol, freeze_panes)



writer = RichExcelWriter('pandas_with_rich_strings_class.xlsx')
workbook  = writer.book
bold = workbook.add_format({'bold': True})
italic = workbook.add_format({'italic': True})
red = workbook.add_format({'color': 'red'})
df = pd.DataFrame({
    'numCol': [1, 50, 327],
    'plainText': ['plain', 'text', 'column'],
    'richText': [
        ['This is ', bold, 'bold'],
        ['This is ', italic, 'italic'],
        ['This is ', red, 'red']
    ]
})

df.to_excel(writer, sheet_name='Sheet1', index=False)

writer.save()

Alternately we can keep the use of xlsxwriter directly and the use the ExcelFormatter from pandas which also handles the header formatting and takes many of the same arguments that to_excel takes.

import xlsxwriter, pandas as pd
from pandas.io.formats.excel import ExcelFormatter

workbook = xlsxwriter.Workbook('pandas_with_rich_strings.xlsx')
worksheet = workbook.add_worksheet()

# Set up some formats to use.
bold = workbook.add_format({'bold': True})
italic = workbook.add_format({'italic': True})
red = workbook.add_format({'color': 'red'})

df = pd.DataFrame({
    'numCol': [1, 50, 327],
    'plainText': ['plain', 'text', 'column'],
    'richText': [
        ['This is ', bold, 'bold'],
        ['This is ', italic, 'italic'],
        ['This is ', red, 'red']
    ]
})

worksheet.add_write_handler(list, lambda worksheet, row, col, args: worksheet.write_rich_string(row, col, *args))
cells = ExcelFormatter(df, index=False).get_formatted_cells()

for cell in cells:
    worksheet.write(cell.row, cell.col ,cell.val)

workbook.close()

This produces the desired output and no need to loop over the data twice. In fact it uses the same generator that pandas would have used so it's just as efficient as pandas.to_excel() and the formatter class takes many of the same arguments.

Note that as of pandas 1.5 the made write_cells() of _XlsxWriter private -> _write_cells() (see release notes, PR). After changing this also in your custom class RichExcelWriter it works like charm!

Collectives™ on Stack Overflow

Write pandas dataframe to Excel with xlsxwriter and include `write_rich_string` formatting

2 Answers 2

1 Comment

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related