0

OKay so I am reading an excel workbook. I read the file for a while and it started off a .csv after debugging and doing other things below the code i am showing you it changed to a xlsx I started getting IOError no such file or directory. I figured out why and changed FFA.csv to FFA.xlsx and it worked error free. Then I started doing other things and debugging. Got up this morning and now i get the following Error : line contains NULL byte. weird because the code started out good. Now it can't read. I put in the print repr() to debug and it infact now prints NULL bytes. So how do i fix this and prevent it in the future? here is the 1st 200 bytes:

PK\x03\x04\x14\x00\x06\x00\x08\x00\x00\x00!\x00b\xee\x9dh^\x01\x00\x00\x90\x04\x00\x00\x13\x00\x08\x02[Content_Types].xml \xa2\x04\x02(\xa0\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00

import csv

def readFile():
    count = 0
    print repr(open("FFA.xlsx", "rb").read(200)) #dump 1st 200 bytes
    with open("FFA.xlsx","rb") as csvfile:
        FFAreader = csv.reader(csvfile, delimiter=",")
        for row in FFAreader:
            idd = row[0]
            name = row[1]
            pos = row[2]
            team = row[3]
            pts = row[4]
            oecr = row[5]
            oR = row[6]
            posR = row[7]
            up = row[8]
            low =row[9]
            risk = row[10]
            swing = row[11]
 readFile()
6
  • uh, .xlsx is not a csv file. it's a zip file containing multiple xml and other files. you're reading binary garbage. Commented Aug 20, 2015 at 14:18
  • okay well like i said i only changed it because it seem to change itself however it was working fine for a while. I changed it back to a .csv but still get the error message. However the first 200 bytes has changed to Commented Aug 20, 2015 at 14:26
  • This is the first 200 bytes now 'playerId,playername,position,team,points,overallECR,overallRank,positionRank,upper,lower,risk,swing\r\n100029,49ers,DST,SF,104.896,245,979,16,165.7802,92.0755,4.0319,73.7047\r\n2543501,Aaron Colvin,DB,JAC' Commented Aug 20, 2015 at 14:26
  • export the file as CSV, then process the CSV file. Commented Aug 20, 2015 at 14:35
  • As an additional suggestion, you should get the call to readFile() method after the usual block for main: ` if name == 'main': readFile() ` Commented Aug 20, 2015 at 14:50

3 Answers 3

1

The code you have posted have a small but dangerous mistake, since you are leaking the file handle by opening it twice.

1) You are opening the file and reading 200 bytes from it, but not closing it. 2) You are then opening the file the proper way, via a context manager, which in fact could read anything from it.

Some questions that may help you to debug the problem:

  • Is the file you are opening stored in a network'd resource? (CIFS, NFS, etc)
  • Have you checked the file is not opened by another process? lsof can help you to check that.
  • Is this running on windows or Linux? Can you test in under linux, if it happens in windows, and viceversa?

I forgot to mention that you should not use CSV for anything related to Excel, even when the file seems to be a CSV data-wise. Use XLRD module (https://pypi.python.org/pypi/xlrd) , it's cross-platform and opens and reads perfectly fine both XSL and XSLX files since version 0.8.

This little piece of code will show you how to open the workbook and parse it in a basic manner:

import xlrd  

def open_excel():  
  with xlrd.open_workbook('FFA.xlsx') as wb:  
      sh = wb.sheet_by_name('Sheet1')  
      for rownum in xrange(sh.nrows):  
        [Do whatever you need here]  
Sign up to request clarification or add additional context in comments.

5 Comments

Well I am doing in Windows, and the major reason why I open it the first time was for debugging purposes to see the null bytes thats not in the normal code. I also didn't experience this problem originally. No I can't test in Linux. And it is important to point out after chaning to a .csv file the bytes changed to 'playerId,playername,position,team,points,overallECR,overallRank,positionRank,up‌​per,lower,risk,swing\r\n100029,49ers,DST,SF,104.896,245,979,16,165.7802,92.0755,4‌​.0319,73.7047\r\n2543501,Aaron Colvin,DB,JAC' this means \r\n is produced in a cell with no data in it at all.
Please see my comment since I edited my answer; try with xlrd, it should work straight away.
so what must be typed to open the file using xlrd?
The example below only covers your question about opening the file xlrd. For further information, check the documentation (secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/…) or search here in StackOverflow, there are dozens of questions and answers about using it. I've added the example in the original answer.
Okay so it prints the row numbers how do i access the values from rows?
0

I agree with Marc, I did a training exercise importing an excel file and I think pandas library would help in that case where you can import pandas as pd and use pd.read_excel(file_name) as part of a data_processing function like read_file() post import.

Comments

0

So this is what I did. But I am intersted in learning the xlrd method i have the module but no documentation. This works no error messages. Still not sure why it changed from .csv to xlsx but its working now. What is the script like in xlrd?

import csv
def readFile():
    count = 0
    #print repr(open("FFA.csv", "rb").read(200)) #dump 1st 200 bytes check if null values produced.
    with open("FFA.csv","rb") as csvfile:
        FFAreader = csv.reader(csvfile, delimiter=",")
        for row in FFAreader:
            idd = row[0]
            name = row[1]
            pos = row[2]
            team = row[3]
            pts = row[4]
            oecr = row[5]
            oR = row[6]
            posR = row[7]
            up = row[8]
            low =row[9]
            risk = row[10]
            swing = row[11]
readFile()

2 Comments

Please check my original answer, I've added the example on how to open the file and perform the minimal parsing. If you are interested on parsing it as csv, you can convert from xslx to CSV using xlrd, and then parse it with CSV. See the answer to this question: stackoverflow.com/questions/17314062/…
Sorry, the link above is wrong, this is the right one: stackoverflow.com/questions/9884353/xls-to-csv-convertor

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.