12

I need to export 24 pandas data frames ( 140 columns x 400 rows) to Excel, each into a different sheet.

I am using pandas’ built-in ExcelWriter. Running 24 scenarios, it takes:

51 seconds to write to an .xls file (using xlwt)

86 seconds to write to an .xlsx file (using XlsxWriter)

141 seconds to write to an .xlsm file (using openpyxl)

21 seconds to just run the program (no Excel output)

The problem with writing to .xls is that the spreadsheet contains no formatting styles, so if I open it in Excel, select a column, and click on the ‘comma’ button to format the numbers, it tells me: ‘style comma not found’. I don’t get this problem writing to an .xlsx, but that’s even slower.

Any suggestions on how to make the exporting faster? I can’t be the first one to have this problem, yet after hours of searching forums and websites I haven’t found any definite solution.

The only thing I can think of is to use Python to export to csv files, and then write an Excel macro to merge all the CSVs into a single spreadsheet.

The .xls file is 10 MB, and the .xlsx 5.2 MB

Thanks!

7
  • I think that the difference in size stems from excel and the different file formats and not from the writing method. In addition - you don't need an excel macro to merge csvs into a spreadsheet, writing a script with xlwt is fairly straight forward. Commented Sep 16, 2014 at 8:38
  • Yes, of course, the difference in size depends on the compression embedded in the xlsx format. That's not the issue. The issue is that taking 60 seconds just to write 5 MB of data seems excessive. Given that Python packages are so slow writig pandas data frames to Excel, would they be quicker writing csv to Excel? I haven't had time to test it yet but I doubt it. My hope is that merging the CSVs using Excel and VBA would have been faster. It's a bit frustrating that something as banal as exporting a few MBs of data takes so long... Commented Sep 16, 2014 at 8:43
  • So, I have run some more tests: loading the 24 CSVs and putting them onto separate sheets of an Excel file takes: Excel VBA macro: 5 seconds. Python script with xlsxWriter: 24 seconds (34 with {'constant_memory': True}, weird). Maybe the best option is to run the VBA macro from Python, as in these examples: stackoverflow.com/questions/2141967/… stackoverflow.com/questions/441758/… Commented Sep 16, 2014 at 9:43
  • I don't have the code with me now, but I've written styles/formats with xlwt. I can look it up if you're interested. Also, try here for more info on xlwt: groups.google.com/forum/#!forum/python-excel Commented Sep 16, 2014 at 12:26
  • Here is a benchmark for different Python to Excel modules. Make sure to run it on your target machine since it will be affected by Python/module versions, CPU, RAM and Disk I/O. Commented Sep 16, 2014 at 12:50

2 Answers 2

11

Here is a benchmark for different Python to Excel modules.

And here is the output for 140 columns x (400 x 24) rows using the latest version of the modules at the time of posting:

Versions:
    python      : 2.7.7
    openpyxl    : 2.0.5
    pyexcelerate: 0.6.3
    xlsxwriter  : 0.5.7
    xlwt        : 0.7.5

Dimensions:
    Rows = 9600 (400 x 24)
    Cols = 140

Times:
    pyexcelerate          :  11.85
    xlwt                  :  17.64
    xlsxwriter (optimised):  21.63
    xlsxwriter            :  26.76
    openpyxl   (optimised):  95.18
    openpyxl              : 119.29

As with any benchmark the results will depend on Python/module versions, CPU, RAM and Disk I/O and on the benchmark itself. So make sure to verify these results for your own setup.

Also, since you asked specifically about Pandas, please note that PyExcelerate isn't supported.

Sign up to request clarification or add additional context in comments.

2 Comments

I thought pyexcelerate was obsolete and had been subsumed into xlwt many years ago. Is this a new version and/or unrelated to old pyexcelerate?
That was PyExcelerator. PyExcelerate is a different unrelated module. It is confusing.
1

For what it's worth, this is how I format the output in xlwt. The documentation is (or at least was) pretty spotty so I had to guess most of this!

import xlwt

style = xlwt.XFStyle()
style.font.name = 'Courier'
style.font.height = 180
style.num_format_str = '#,##0'

# ws0 is a worksheet
ws0.write( row, col, value, style )

Also, I believe I duplicated your error message when attempting to format the resulting spreadsheet in excel (office 2010 version). It's weird, but some of the drop down tool bar format options work and some don't. But it looks like they all work fine if I go to "format cells" via a right click.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.