
I am currently working on the Dstl satellite Kaggle challenge, where I need to create a submission file in csv format. Each row in the csv contains:

Image ID, polygon class (1-10), Polygons

The polygons field is a single, very long entry: many coordinate sequences strung together, one polygon's start and end after another.

The polygons are created with an algorithm, for one class at a time, for one picture at a time (429 pictures, 10 classes each).

Now my question is related to computation time and best practice: How do I best write the data of the polygons that I create into the csv? Do I open the csv at the beginning and then write each row into the file, as I iterate over classes and images?

Or should I rather save the data in a list or dictionary or something and then write the whole thing into the csv file at once?

The thing is, I am not sure how fast writing to a csv file is. Also, since the algorithm is already rather expensive computationally, I would like to spare my PC the trouble of keeping all the data in RAM.

And I guess writing the data into the csv right away would result in less RAM used, right?

So you say that disk operations are slow. What exactly does that mean? When I write each row to the csv live, as I create the data, does that slow down my program? Would writing a whole list to a csv file be faster than writing a row, then calculating the next data row, and so on? That would mean the computer waits for one action to finish before the next one starts, right? But then, what makes the process faster if I wait for all the data to accumulate? The same number of rows has to be written to the csv either way, so why would it be slower line by line?

2 Answers


How do I best write the data of the polygons that I create into the csv? Do I open the csv at the beginning and then write each row into the file, as I iterate over classes and images?

I suspect most folks would gather the data in a list or perhaps dictionary and then write it all out at the end. But if you don't need to do additional processing to it, yeah -- send it to disk and release the resources.
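For illustration, a minimal Python 3 sketch of that streaming approach, with a placeholder make_polygons standing in for the real algorithm and column names assumed from the Dstl sample submission (check them against the competition's sample file):

    import csv

    # Placeholder for the real polygon-extraction algorithm.
    def make_polygons(image_id, class_id):
        return "MULTIPOLYGON EMPTY"  # valid "empty" WKT, stands in for real output

    image_ids = ["6120_2_2", "6120_2_3"]  # the real run has 429 IDs

    # Open the file once, then stream each row out as soon as it is computed,
    # so at most one row's polygon string is held in RAM at a time.
    with open("submission.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ImageId", "ClassType", "MultipolygonWKT"])
        for image_id in image_ids:
            for class_id in range(1, 11):
                writer.writerow([image_id, class_id, make_polygons(image_id, class_id)])

The with block closes the file (and flushes any buffered rows) automatically, even if the algorithm raises an exception partway through.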

And I guess writing the data into the csv right away would result in less RAM used, right?

Yes, it would, but it's not going to affect CPU usage; it just reduces RAM usage (exactly how much depends on when Python garbage-collects the objects). You really shouldn't worry about details like this, though. Get accurate output, first and foremost.




First of all, use the csv library. Documentation: https://docs.python.org/2/library/csv.html (Python 2) or https://docs.python.org/3/library/csv.html (Python 3).

Now, using this library, you can take a list of list-like objects, or a list of dicts where the keys are the headers of your csv, and write them to a file. This is almost certainly the right way to go. If you have so much data that you've maxed out the memory of the Python process, you might want to go back and rethink it, but with 429 * 10 = 4290 rows, that's probably not happening.
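For example, a sketch of that batch approach with csv.DictWriter, again with a placeholder make_polygons and assumed Dstl column names:

    import csv

    def make_polygons(image_id, class_id):
        return "MULTIPOLYGON EMPTY"  # placeholder for the real algorithm

    image_ids = ["6120_2_2", "6120_2_3"]  # 429 IDs in the real run

    # Build every row in memory first ...
    rows = [
        {"ImageId": img, "ClassType": cls,
         "MultipolygonWKT": make_polygons(img, cls)}
        for img in image_ids
        for cls in range(1, 11)
    ]

    # ... then write the whole batch in one call.
    with open("submission.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["ImageId", "ClassType", "MultipolygonWKT"])
        writer.writeheader()
        writer.writerows(rows)

Keeping the rows around also makes it easy to inspect or post-process them before writing.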

And I guess writing the data into the csv right away would result in less RAM used, right?

Disk access is, in general, a relatively slow operation, so an approach that increases disk access in order to save memory usage is questionable.
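If you want a feel for the scale involved, here is a rough Python 3 timing sketch (dummy rows, a hypothetical bench.csv) comparing row-by-row and batched writes; note that Python's file objects buffer output, so individual writerow calls do not each hit the physical disk:

    import csv
    import time

    # 4290 dummy rows, roughly the size of the actual submission.
    rows = [["6120_2_2", cls, "MULTIPOLYGON EMPTY"]
            for _ in range(429) for cls in range(1, 11)]

    start = time.perf_counter()
    with open("bench.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:               # one writerow() call per row
            writer.writerow(row)
    print(f"row by row: {time.perf_counter() - start:.4f} s")

    start = time.perf_counter()
    with open("bench.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)  # one batched call
    print(f"batched:    {time.perf_counter() - start:.4f} s")

At a few thousand rows, both should finish in well under a second, so either style is fine here.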


