CSV reader object not reading entire file [Python]

Question

I am currently working on a project that uses the csv module in Python. I have created a separate class that opens a pre-existing csv file, modifies the data on each line, then saves the data to a new csv file.

The original file has 1438 rows, and test code placed in the class that handles the writing reports that it writes 1438 rows to the new csv file. Upon inspection of the file itself, there are in fact 1438 rows in the newly created file. However, when I use the standard csv module in this way:

reader = csv.reader(open('naiveData.csv', 'rb'))

It only goes to row 1410 (and not even the entire row; it ends one and a half fields before the end of the row). I am not sure what may be causing this.

This is how I am accessing the reader:

for row in reader:
    print row

Here is the part of the output where it fails:

['UNPM', '16', '2.125', '910', 'athlete', 'enrolled'] 
['UNPM', '14', '2.357', '1020', 'non-athlete', 'enrolled']    
['UNDC', '17', '2.071', '910', 'athlete', 'unenrolled']  
['KINS', '15', '2.6', '910', 'athlete', 'enrolled']  
['PHYS', '16', '1.5', '900', 'non-']

The last list should have ['PHYS', '16', '1.5', '900', 'non-athlete', 'enrolled'].

Any ideas as to what may be causing this? Thanks in advance!

Edit:

Here are the lines in the CSV file around the area where the error is occurring:

KINS,15,2.6,910,athlete,enrolled
PHYS,16,1.5,900,non-athlete,enrolled
UNPL,15,3,960,non-athlete,enrolled

Can you post the full line from the input file where the output breaks?
– Pedro Werneck, Commented Nov 12, 2013 at 23:42
@PedroWerneck sure thing, I added it at the bottom of the question
– Royal, Commented Nov 12, 2013 at 23:48
So you're doing read_csv(x) -> process -> write_csv(y), then when you read_csv(y) again to read the rows, some are missing?
– Austin Phillips, Commented Nov 12, 2013 at 23:57
It looks like the file wasn't completely flushed to disk when you read. Are you using the with statement? Did you close it properly after writing to it?
– Pedro Werneck, Commented Nov 12, 2013 at 23:58
When you write the file, do you explicitly call .close() or are you using a with statement to make sure the file is properly closed? I'm wondering if the file is not being fully written somehow before your writing program terminates. If you are using CPython this doesn't seem likely, but if you are using Jython or PyPy it seems possible.
– steveha, Commented Nov 13, 2013 at 0:00

abarnert · Accepted Answer · 2013-11-12 23:59:50Z

6

I'm willing to bet this is the problem, although it's hard to be sure when you've only shown us 3 lines of code instead of a reproducible example.

You're doing something like this:

old_reader = csv.reader(open('old.csv', 'rb'))
writer = csv.writer(open('new.csv', 'wb'))
for row in old_reader:
    writer.writerow(transform(row))
new_reader = csv.reader(open('new.csv', 'rb'))
for row in new_reader:
    print row

At the time you open new.csv for reading, you haven't yet closed new.csv for writing. So the last buffer hasn't been flushed to disk. So you can't see it.

But then, when your script finishes, the writer goes out of scope, the file object no longer has any references, so it gets flushed and closed. So when you inspect it from outside of the program, after the script finishes, now it's complete. (Note that this behavior is explicitly not guaranteed; you're just getting lucky.)
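The effect can be reproduced in isolation. The following is a minimal sketch, written for Python 3 rather than the Python 2 used in the question; the helper names `count_rows` and `demo_unflushed_read` and the sample row are made up for illustration. It writes 1438 rows through a file object that is deliberately left open, counts what a second reader can see, then closes the writer and counts again. The exact number visible before the close depends on the interpreter's buffer sizes, so treat it as illustrative only.

```python
import csv
import os
import tempfile


def count_rows(path):
    """Count the rows a fresh csv.reader can see in the file at `path`."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f))


def demo_unflushed_read(total_rows=1438):
    """Return (rows visible before close, rows visible after close)."""
    row = ["PHYS", "16", "1.5", "900", "non-athlete", "enrolled"]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "new.csv")
        out = open(path, "w", newline="")  # deliberately not closed yet
        try:
            writer = csv.writer(out)
            for _ in range(total_rows):
                writer.writerow(row)
            # The tail of the data is still sitting in the write buffer here.
            before = count_rows(path)
        finally:
            out.close()  # flushes whatever is left in the buffer
        after = count_rows(path)
    return before, after


if __name__ == "__main__":
    before, after = demo_unflushed_read()
    print("rows visible before close:", before)
    print("rows visible after close: ", after)
```

Calling `out.flush()` before the first count would also make all rows visible, but closing the file (ideally through a with statement) is the more robust habit.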

And this is why you should never leak files by just putting an open in the middle of an expression. Use a with statement instead. For example:

with open('old.csv', 'rb') as oldf, open('new.csv', 'wb') as newf:
    old_reader = csv.reader(oldf)
    writer = csv.writer(newf)
    for row in old_reader:
        writer.writerow(transform(row))
with open('new.csv', 'rb') as newf:
    new_reader = csv.reader(newf)
    for row in new_reader:
        print row
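For readers on Python 3, the same pattern might look like the sketch below. It assumes text mode with `newline=""` (which the Python 3 csv documentation recommends) in place of `'rb'`/`'wb'`; `transform` here is a hypothetical stand-in for whatever per-row modification the question's class performs, and `copy_then_read` is an illustrative name, not something from the original code.

```python
import csv


def transform(row):
    # Hypothetical stand-in for the question's per-row modification.
    return [field.upper() for field in row]


def copy_then_read(old_path, new_path):
    """Write transformed rows to new_path, then read them all back."""
    with open(old_path, newline="") as oldf, \
            open(new_path, "w", newline="") as newf:
        writer = csv.writer(newf)
        for row in csv.reader(oldf):
            writer.writerow(transform(row))
    # newf was flushed and closed when the with block above exited,
    # so a new reader sees the complete file.
    with open(new_path, newline="") as newf:
        return list(csv.reader(newf))
```

The important detail is that the second `open` happens only after the first with block has ended, so the writer's buffer has already been flushed.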


2 Comments

Royal Over a year ago

Thank you so much for this explanation! It worked perfectly. I will keep the concept of using a with statement in mind next time.

flokk Over a year ago

This happened to me also through subprocess.Popen, having a process generating the csv file. Replacing the Popen() call by run() solved this (python 3.5.2)
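One plausible reading of this comment (an assumption, since the commenter's code is not shown) is that `subprocess.Popen` returns as soon as the child starts, so the parent may read the CSV while the child is still writing, whereas `subprocess.run` blocks until the child has exited. A minimal Python 3 sketch, with a made-up child program and the illustrative name `generate_then_read`:

```python
import csv
import os
import subprocess
import sys
import tempfile

# Hypothetical child program: writes argv[2] rows to the path in argv[1].
CHILD_CODE = """
import csv, sys
with open(sys.argv[1], "w", newline="") as f:
    w = csv.writer(f)
    for i in range(int(sys.argv[2])):
        w.writerow(["PHYS", i, "non-athlete", "enrolled"])
"""


def generate_then_read(n=1438):
    """Have a child process write n rows, then count them in the parent."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "generated.csv")
        # run() waits for the child to exit; check=True raises on failure.
        subprocess.run(
            [sys.executable, "-c", CHILD_CODE, path, str(n)],
            check=True,
        )
        with open(path, newline="") as f:
            return sum(1 for _ in csv.reader(f))
```

If `Popen` is needed for other reasons, calling `.wait()` on the returned object before opening the file should give the same guarantee.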

Galuoises · Answer · 2019-08-01 09:50:29Z

0

I had a similar issue, but in the end the problem was that a comma was missing in one row of the csv file.
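A quick way to check for this kind of problem is to flag every row whose field count differs from what is expected. This is only a sketch in Python 3: `find_odd_rows` is a hypothetical helper, and the sample lines are adapted from the question with one comma deliberately removed.

```python
import csv


def find_odd_rows(lines, expected_fields):
    """Return (line_number, row) pairs whose field count is unexpected."""
    reader = csv.reader(lines)
    odd = []
    for row in reader:
        if len(row) != expected_fields:
            # line_num is the number of source lines read so far.
            odd.append((reader.line_num, row))
    return odd


sample = [
    "KINS,15,2.6,910,athlete,enrolled\n",
    "PHYS,16,1.5,900non-athlete,enrolled\n",  # comma missing after 900
    "UNPL,15,3,960,non-athlete,enrolled\n",
]

if __name__ == "__main__":
    for line_number, row in find_odd_rows(sample, expected_fields=6):
        print(line_number, row)
```

`csv.reader` accepts any iterable of strings, so an open file object can be passed in place of the `sample` list.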
