I am working on a project that uses Python's csv module. I wrote a separate class that opens an existing CSV file, modifies the data on each line, and saves the result to a new CSV file.

The original file has 1438 rows, and test code placed in the class that handles the writing reports that it writes 1438 rows to the new CSV file. When I inspect the new file afterwards, it does in fact contain 1438 rows. However, when I read it back with the standard csv module like this:

reader = csv.reader(open('naiveData.csv', 'rb'))

It only goes to row 1410, and not even the entire row: it stops about one and a half fields before the end of that row. I am not sure what may be causing this.

This is how I am accessing the reader:

for row in reader:
    print row

Here is the part of the output where it fails:

['UNPM', '16', '2.125', '910', 'athlete', 'enrolled'] 
['UNPM', '14', '2.357', '1020', 'non-athlete', 'enrolled']    
['UNDC', '17', '2.071', '910', 'athlete', 'unenrolled']  
['KINS', '15', '2.6', '910', 'athlete', 'enrolled']  
['PHYS', '16', '1.5', '900', 'non-']

The last list should be ['PHYS', '16', '1.5', '900', 'non-athlete', 'enrolled'].

Any ideas as to what may be causing this? Thanks in advance!

Edit:

Here are the lines in the CSV file around the area where the error occurs:

KINS,15,2.6,910,athlete,enrolled
PHYS,16,1.5,900,non-athlete,enrolled
UNPL,15,3,960,non-athlete,enrolled
  • Can you post the full line from the input file where the output breaks? Commented Nov 12, 2013 at 23:42
  • @PedroWerneck sure thing, I added it at the bottom of the question Commented Nov 12, 2013 at 23:48
  • So you're doing read_csv(x) -> process -> write_csv(y), then when you read_csv(y) again to read the rows, some are missing? Commented Nov 12, 2013 at 23:57
  • It looks like the file wasn't completely flushed to disk when you read. Are you using the with statement? Did you close it properly after writing to it? Commented Nov 12, 2013 at 23:58
  • When you write the file, do you explicitly call .close() or are you using a with statement to make sure the file is properly closed? I'm wondering if the file is not being fully written somehow before your writing program terminates. If you are using CPython this doesn't seem likely, but if you are using Jython or PyPy it seems possible. Commented Nov 13, 2013 at 0:00

2 Answers

6

I'm willing to bet this is the problem, although it's hard to be sure when you've only shown us 3 lines of code instead of a reproducible example.

You're doing something like this:

old_reader = csv.reader(open('old.csv', 'rb'))
writer = csv.writer(open('new.csv', 'wb'))
for row in old_reader:
    writer.writerow(transform(row))
new_reader = csv.reader(open('new.csv', 'rb'))
for row in new_reader:
    print row

At the time you open new.csv for reading, you haven't yet closed new.csv for writing. So the last buffer hasn't been flushed to disk. So you can't see it.

But then, when your script finishes, the writer goes out of scope, the file object no longer has any references, so it gets flushed and closed. So when you inspect it from outside of the program, after the script finishes, now it's complete. (Note that this behavior is explicitly not guaranteed; you're just getting lucky.)
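To see the effect in isolation, here is a minimal sketch. It is written for Python 3 (an assumption; the code in the question is Python 2), and the file name, row contents, and the helper names count_rows and demo are all made up for illustration. It counts the rows a reader can see while the writing file object is still open, then again after it is closed:

```python
import csv
import os
import tempfile


def count_rows(path):
    """Count the rows csv.reader can currently see in the file at path."""
    with open(path, newline='') as f:
        return sum(1 for _ in csv.reader(f))


def demo(n_rows=1438):
    """Return (rows visible before close, rows visible after close)."""
    # Hypothetical scratch file; any writable path would do.
    path = os.path.join(tempfile.mkdtemp(), 'new.csv')

    out = open(path, 'w', newline='')  # deliberately not a with statement
    writer = csv.writer(out)
    for i in range(n_rows):
        writer.writerow(['PHYS', i, '1.5', '900', 'non-athlete', 'enrolled'])

    before = count_rows(path)  # the tail may still sit in the write buffer
    out.close()                # close() flushes whatever is left
    after = count_rows(path)
    return before, after


if __name__ == '__main__':
    before, after = demo()
    print('visible before close:', before)
    print('visible after close: ', after)
```

On a typical setup the first number comes out smaller than the second, and the last visible row may be cut off partway through, just like the 'non-' in the question. The exact point where it stops depends on the buffer size, so it will vary between systems. Calling out.flush() before reading would also make the data visible, but closing the file is the more robust fix.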

And this is why you should never leak files by just putting an open in the middle of an expression. Use a with statement instead. For example:

with open('old.csv', 'rb') as oldf, open('new.csv', 'wb') as newf:
    old_reader = csv.reader(oldf)
    writer = csv.writer(newf)
    for row in old_reader:
        writer.writerow(transform(row))
with open('new.csv', 'rb') as newf:
    new_reader = csv.reader(newf)
    for row in new_reader:
        print row
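For anyone on Python 3, the same pattern might look like the sketch below. The main differences are that the csv module expects text-mode files opened with newline='' instead of 'rb'/'wb', and print is a function. As above, transform is only a placeholder for whatever per-row change you make, and copy_and_read_back is a name invented for this example:

```python
import csv


def transform(row):
    # Placeholder: stands in for the real per-row modification.
    return row


def copy_and_read_back(old_path, new_path):
    """Write transformed rows to new_path, then read them all back."""
    # Both files are closed (and the output flushed) when this block exits.
    with open(old_path, newline='') as oldf, \
            open(new_path, 'w', newline='') as newf:
        writer = csv.writer(newf)
        for row in csv.reader(oldf):
            writer.writerow(transform(row))

    # Only now is the new file reopened, so every row is on disk.
    with open(new_path, newline='') as newf:
        return list(csv.reader(newf))
```

Something like `for row in copy_and_read_back('old.csv', 'new.csv'): print(row)` should then print every row, including the last one.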

2 Comments

Thank you so much for this explanation! It worked perfectly. I will keep the concept of using a with statement in mind next time.
This happened to me also through subprocess.Popen, having a process generating the csv file. Replacing the Popen() call by run() solved this (python 3.5.2)
0

I had a similar issue, but in my case the problem turned out to be a missing comma in one row of the CSV file.
