Why is csvreader for Python starting good then producing NULL bytes?

Question

OKay so I am reading an excel workbook. I read the file for a while and it started off a .csv after debugging and doing other things below the code i am showing you it changed to a xlsx I started getting IOError no such file or directory. I figured out why and changed FFA.csv to FFA.xlsx and it worked error free. Then I started doing other things and debugging. Got up this morning and now i get the following Error : line contains NULL byte. weird because the code started out good. Now it can't read. I put in the print repr() to debug and it infact now prints NULL bytes. So how do i fix this and prevent it in the future? here is the 1st 200 bytes:

PK\x03\x04\x14\x00\x06\x00\x08\x00\x00\x00!\x00b\xee\x9dh^\x01\x00\x00\x90\x04\x00\x00\x13\x00\x08\x02[Content_Types].xml \xa2\x04\x02(\xa0\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00

import csv

def readFile():
    count = 0
    print repr(open("FFA.xlsx", "rb").read(200)) #dump 1st 200 bytes
    with open("FFA.xlsx","rb") as csvfile:
        FFAreader = csv.reader(csvfile, delimiter=",")
        for row in FFAreader:
            idd = row[0]
            name = row[1]
            pos = row[2]
            team = row[3]
            pts = row[4]
            oecr = row[5]
            oR = row[6]
            posR = row[7]
            up = row[8]
            low =row[9]
            risk = row[10]
            swing = row[11]
 readFile()

uh, .xlsx is not a csv file. it's a zip file containing multiple xml and other files. you're reading binary garbage. — Marc B
– Marc B, Commented Aug 20, 2015 at 14:18
okay well like i said i only changed it because it seem to change itself however it was working fine for a while. I changed it back to a .csv but still get the error message. However the first 200 bytes has changed to — Jonathan Holloway
– Jonathan Holloway, Commented Aug 20, 2015 at 14:26
This is the first 200 bytes now 'playerId,playername,position,team,points,overallECR,overallRank,positionRank,upper,lower,risk,swing\r\n100029,49ers,DST,SF,104.896,245,979,16,165.7802,92.0755,4.0319,73.7047\r\n2543501,Aaron Colvin,DB,JAC' — Jonathan Holloway
– Jonathan Holloway, Commented Aug 20, 2015 at 14:26
As an additional suggestion, you should get the call to readFile() method after the usual block for main: ` if name == 'main': readFile() ` — Alberto
– Alberto, Commented Aug 20, 2015 at 14:50

Alberto · Accepted Answer · 2015-08-20 15:23:06Z

1

The code you have posted have a small but dangerous mistake, since you are leaking the file handle by opening it twice.

1) You are opening the file and reading 200 bytes from it, but not closing it. 2) You are then opening the file the proper way, via a context manager, which in fact could read anything from it.

Some questions that may help you to debug the problem:

Is the file you are opening stored in a network'd resource? (CIFS, NFS, etc)
Have you checked the file is not opened by another process? lsof can help you to check that.
Is this running on windows or Linux? Can you test in under linux, if it happens in windows, and viceversa?

I forgot to mention that you should not use CSV for anything related to Excel, even when the file seems to be a CSV data-wise. Use XLRD module (https://pypi.python.org/pypi/xlrd) , it's cross-platform and opens and reads perfectly fine both XSL and XSLX files since version 0.8.

This little piece of code will show you how to open the workbook and parse it in a basic manner:

import xlrd  

def open_excel():  
  with xlrd.open_workbook('FFA.xlsx') as wb:  
      sh = wb.sheet_by_name('Sheet1')  
      for rownum in xrange(sh.nrows):  
        [Do whatever you need here]

edited Aug 20, 2015 at 15:23

answered Aug 20, 2015 at 14:53

Alberto

7277 silver badges22 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Jonathan Holloway Over a year ago

Well I am doing in Windows, and the major reason why I open it the first time was for debugging purposes to see the null bytes thats not in the normal code. I also didn't experience this problem originally. No I can't test in Linux. And it is important to point out after chaning to a .csv file the bytes changed to 'playerId,playername,position,team,points,overallECR,overallRank,positionRank,up‌per,lower,risk,swing\r\n100029,49ers,DST,SF,104.896,245,979,16,165.7802,92.0755,4‌.0319,73.7047\r\n2543501,Aaron Colvin,DB,JAC' this means \r\n is produced in a cell with no data in it at all.

Alberto Over a year ago

Please see my comment since I edited my answer; try with xlrd, it should work straight away.

Jonathan Holloway Over a year ago

so what must be typed to open the file using xlrd?

Alberto Over a year ago

The example below only covers your question about opening the file xlrd. For further information, check the documentation (secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/…) or search here in StackOverflow, there are dozens of questions and answers about using it. I've added the example in the original answer.

Jonathan Holloway Over a year ago

Okay so it prints the row numbers how do i access the values from rows?

John Mulhall · Accepted Answer · 2015-08-20 14:36:35Z

0

I agree with Marc, I did a training exercise importing an excel file and I think pandas library would help in that case where you can import pandas as pd and use pd.read_excel(file_name) as part of a data_processing function like read_file() post import.

answered Aug 20, 2015 at 14:36

John Mulhall

1733 silver badges13 bronze badges

Comments

Jonathan Holloway · Accepted Answer · 2015-08-20 15:12:31Z

0

So this is what I did. But I am intersted in learning the xlrd method i have the module but no documentation. This works no error messages. Still not sure why it changed from .csv to xlsx but its working now. What is the script like in xlrd?

import csv
def readFile():
    count = 0
    #print repr(open("FFA.csv", "rb").read(200)) #dump 1st 200 bytes check if null values produced.
    with open("FFA.csv","rb") as csvfile:
        FFAreader = csv.reader(csvfile, delimiter=",")
        for row in FFAreader:
            idd = row[0]
            name = row[1]
            pos = row[2]
            team = row[3]
            pts = row[4]
            oecr = row[5]
            oR = row[6]
            posR = row[7]
            up = row[8]
            low =row[9]
            risk = row[10]
            swing = row[11]
readFile()

answered Aug 20, 2015 at 15:12

Jonathan Holloway

4345 silver badges12 bronze badges

2 Comments

Alberto Over a year ago

Please check my original answer, I've added the example on how to open the file and perform the minimal parsing. If you are interested on parsing it as csv, you can convert from xslx to CSV using xlrd, and then parse it with CSV. See the answer to this question: stackoverflow.com/questions/17314062/…

Alberto Over a year ago

Sorry, the link above is wrong, this is the right one: stackoverflow.com/questions/9884353/xls-to-csv-convertor

Collectives™ on Stack Overflow

Why is csvreader for Python starting good then producing NULL bytes?

3 Answers 3

5 Comments

Comments

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

5 Comments

Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related