I am currently working on the Dstl satellite Kaggle challenge, where I need to create a submission file in CSV format. Each row in the CSV contains:
Image ID, polygon class (1-10), Polygons
The polygons field is a very long entry consisting of coordinate sequences, one polygon outline after another (starts, ends, starts again, and so on).
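Just to give an idea of the format, a single row looks roughly like this (the polygon field in the Dstl challenge is a WKT multipolygon string; the ID and coordinates here are only illustrative):

    6120_2_2,1,"MULTIPOLYGON (((0.0048 -0.0063, 0.0051 -0.0063, 0.0051 -0.0066, 0.0048 -0.0063)))"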
The polygons are created by an algorithm, one class at a time and one image at a time (429 images, 10 classes each).
Now my question is about computation time and best practice: what is the best way to write the polygon data I create into the CSV? Do I open the CSV at the beginning and then write each row to the file as I iterate over classes and images?
Or should I rather collect the data in a list or dictionary and then write the whole thing into the CSV file at once? (Both options are sketched below.)
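To make the two options concrete, here is a minimal sketch of what I mean (image_ids and get_polygons_wkt are placeholders for my actual image list and polygon algorithm):

    import csv

    # Option A: open the CSV once and write each row as soon as it is computed
    with open("submission_streamed.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ImageId", "ClassType", "MultipolygonWKT"])
        for image_id in image_ids:              # 429 images
            for class_type in range(1, 11):     # classes 1-10
                wkt = get_polygons_wkt(image_id, class_type)  # placeholder for my algorithm
                writer.writerow([image_id, class_type, wkt])

    # Option B: collect all rows in a list first, then write everything in one go
    rows = []
    for image_id in image_ids:
        for class_type in range(1, 11):
            rows.append([image_id, class_type, get_polygons_wkt(image_id, class_type)])

    with open("submission_batched.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ImageId", "ClassType", "MultipolygonWKT"])
        writer.writerows(rows)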
The thing is, I am not sure how fast writing to a CSV file is. Also, since the algorithm is already rather computationally expensive, I would like to spare my PC the trouble of keeping all the data in RAM.
I guess writing the data to the CSV right away would result in less RAM being used, right?
So you say that disk operations are slow. What exactly does that mean? When I write each row to the CSV live, as I create the data, does that slow down my program? In other words, is writing a whole list to a CSV file faster than writing one row, then calculating the next row, and so on? That would mean the computer waits for one action to finish before the next one starts, right? But then, what makes the process faster if I wait for all the data to accumulate? The same number of rows has to be written to the CSV either way, so why would it be slower to do it line by line?
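From what I understand so far (please correct me if this is wrong), the per-row cost only really hurts if every write actually reaches the disk, for example when flushing after each row; with Python's default buffering the rows are collected in memory and written out in larger chunks anyway:

    import csv

    with open("submission.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ImageId", "ClassType", "MultipolygonWKT"])
        for image_id in image_ids:               # placeholder image list, as above
            for class_type in range(1, 11):
                wkt = get_polygons_wkt(image_id, class_type)  # placeholder algorithm
                writer.writerow([image_id, class_type, wkt])
                # f.flush()  # forcing a flush here would turn every row into a real disk write
        # with default buffering, rows sit in an in-memory buffer and are flushed
        # to disk in larger blocks; closing the file writes out whatever is left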