
I am running some numerical simulations with Python, pandas and SciPy. I run a set of scenarios, and for each scenario I create a detailed dataframe with lots of outputs, which I save to a separate CSV file. Each CSV file is about 900 KB.

The line I use is simply:

mydataframe.to_csv('myoutput.csv')

My question is: is there a way to speed up the exporting process? Some specific parameters, a different library, etc.? I ask because writing to CSV takes almost half of the total simulation time: running 18 scenarios takes 17 seconds, 7.2 of which are spent in the to_csv method.
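For context, one way to verify where the time goes is to profile the whole run and restrict the report to the to_csv calls. The sketch below is only an illustration, assuming a hypothetical run_scenario() helper in place of the real simulation:

import cProfile
import pstats

import numpy as np
import pandas as pd

def run_scenario(i):
    # Hypothetical stand-in for one scenario's detailed output
    return pd.DataFrame(np.random.rand(500, 350))

def run_all():
    # Run all 18 scenarios and write each one to its own CSV file
    for i in range(18):
        run_scenario(i).to_csv('myoutput_%d.csv' % i)

cProfile.run('run_all()', 'profile_stats')
pstats.Stats('profile_stats').sort_stats('cumulative').print_stats('to_csv')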

PS: I initially wanted to write to Excel, but that's too slow, as per my other question: Python: fastest way to write pandas DataFrame to Excel on multiple sheets

  • Have you profiled this? Can you compare the performance using np.savetxt? Commented Jul 9, 2015 at 10:56
  • Each dataframe has from 300 to 400 columns. How can I get np.savetxt to write column headings? I understand it has a header argument, but it doesn't seem to accept a list of column names. Commented Jul 9, 2015 at 11:25
  • As you can read in the docs (docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html), it is a string that will be written at the beginning of the file. So you can do ','.join(mydataframe.columns) — see the sketch after these comments. Commented Jul 9, 2015 at 13:38
  • I can't get np.savetxt to work with non-numerical arrays, which is a problem because my dataframe has many text fields. Commented Aug 18, 2015 at 15:08
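For reference, a minimal sketch of the np.savetxt approach suggested in the comments above, assuming a hypothetical numeric-only dataframe (as the last comment notes, this won't work for text columns):

import numpy as np
import pandas as pd

# Hypothetical numeric-only dataframe standing in for one scenario's output
mydataframe = pd.DataFrame(np.random.rand(5, 4), columns=['a', 'b', 'c', 'd'])

# np.savetxt's header argument takes a single string, not a list of names,
# so the column names are joined into one comma-separated line.
np.savetxt('myoutput.csv',
           mydataframe.values,
           delimiter=',',
           header=','.join(mydataframe.columns),
           comments='')   # suppress the default '# ' prefix on the header line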

1 Answer


Try compressing the file:

mydataframe.to_csv('myoutput.gz', compression='gzip')
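Whether compression helps depends on whether the write is I/O-bound or CPU-bound on the machine in question; a quick comparison along these lines (using hypothetical data roughly matching the 300-400 columns described in the question) will show which variant is faster:

import time

import numpy as np
import pandas as pd

# Hypothetical dataframe roughly the size described in the question
mydataframe = pd.DataFrame(np.random.rand(500, 350))

for path, kwargs in [('myoutput.csv', {}),
                     ('myoutput.gz', {'compression': 'gzip'})]:
    start = time.perf_counter()
    mydataframe.to_csv(path, **kwargs)
    print(path, 'written in %.3f s' % (time.perf_counter() - start))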

