PYTHON: Problems with parsing csv file and finding maximum value [duplicate]

Question

I need to read in the file https://drive.google.com/open?id=0B29hT1HI-pwxMjBPQWFYaWoyalE. However, I have tried 3-4 different code methods and repeatedly get the error "line contains NULL byte". I read on other threads that this means the problem is in the CSV file itself, but this is the file my professor will be loading and grading me on, and I can't modify it. So I'm looking for a way around this error.

As I mentioned, I've tried several different methods to open the file. Here are my best two:

import csv

def largestState():
    INPUT  = "statepopulations.csv"
    COLUMN = 5   # 6th column (column F)

    with open(INPUT, "rU") as csvFile:
        theFile = csv.reader(csvFile)
        header = next(theFile, None)    # skip header row
        pop = [float(row[COLUMN]) for row in theFile]

    max_pop = max(pop)
    print max_pop

largestState()

This results in the NULL byte error. Please ignore the max_pop lines; the next step after reading the file in is to find the maximum value in column F.

import csv

def test():
    with open('state-populations.csv', 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            print row

test()

This also results in the NULL byte error.

If anyone could offer a simple solution to this problem, I'd greatly appreciate it.

File as .txt: https://drive.google.com/open?id=0B29hT1HI-pwxZzhlMGZGVVAzX28

The file seems to be corrupted. If you open it in Excel, you'll see a lot of garbage characters. – Aks, Commented Dec 7, 2016 at 22:14
Upon further review: that seems to be a binary file, or corrupted as Anuj said. AFAIK, the CSV reader only works with text files. I suggest that you tell your professor about the problems you're having with the file. – GreenMatt, Commented Dec 7, 2016 at 22:14
@Anuj When I open the file on my virtual machine, it looks normal. – Megan Byers, Commented Dec 7, 2016 at 22:18

Suku · Accepted Answer · 2016-12-07 22:51:35Z

First of all, the "csv" file you have provided via the Google Drive link is NOT a CSV file. It's a gzipped XML file.

[~/Downloads] file state-populations.csv
state-populations.csv: gzip compressed data, from Unix

[~/Downloads] gzip -d state-populations.csv
gzip: state-populations.csv: unknown suffix -- ignored

[~/Downloads] mv state-populations.csv state-populations.csv.gz

[~/Downloads] gzip -d state-populations.csv.gz

[~/Downloads] ls state-populations.csv
state-populations.csv
[~/Downloads] file state-populations.csv
state-populations.csv: XML 1.0 document text, ASCII text, with very long lines
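The same detection can be done from Python: every gzip stream starts with the two magic bytes 0x1f 0x8b, so checking them tells you whether to decompress, whatever the file's extension. A minimal Python 3 sketch; the helper names is_gzip and read_text are hypothetical, and UTF-8 decoding is an assumption (safe here, since `file` reports ASCII text):

```python
import gzip

# Every gzip stream begins with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def read_text(path, encoding="utf-8"):
    """Return the file's contents as text, decompressing gzip data
    transparently so the file never has to be renamed to .gz."""
    opener = gzip.open if is_gzip(path) else open
    with opener(path, "rt", encoding=encoding) as f:
        return f.read()
```

For example, read_text("state-populations.csv") would return the XML text directly, without the mv/gzip -d steps.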

You can use an XML module, such as xml.etree.ElementTree, to parse it:

[~/Downloads] python
Python 2.7.10 (default, Jul 30 2016, 18:31:42)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> import xml
>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('state-populations.csv')
>>> root = tree.getroot()
>>> root
<Element '{http://www.gnumeric.org/v10.dtd}Workbook' at 0x10ded51d0>
>>> root.tag
'{http://www.gnumeric.org/v10.dtd}Workbook'
>>> for child in root:
...     print child.tag, child.attrib
...
{http://www.gnumeric.org/v10.dtd}Version {'Epoch': '1', 'Full': '1.12.9', 'Major': '12', 'Minor': '9'}
{http://www.gnumeric.org/v10.dtd}Attributes {}
{urn:oasis:names:tc:opendocument:xmlns:office:1.0}document-meta {'{urn:oasis:names:tc:opendocument:xmlns:office:1.0}version': '1.2'}
{http://www.gnumeric.org/v10.dtd}Calculation {'ManualRecalc': '0', 'MaxIterations': '100', 'EnableIteration': '1', 'IterationTolerance': '0.001', 'FloatRadix': '2', 'FloatDigits': '53'}
{http://www.gnumeric.org/v10.dtd}SheetNameIndex {}
{http://www.gnumeric.org/v10.dtd}Geometry {'Width': '864', 'Height': '322'}
{http://www.gnumeric.org/v10.dtd}Sheets {}
{http://www.gnumeric.org/v10.dtd}UIData {'SelectedTab': '0'}
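Going one step further, here is a minimal Python 3 sketch that reads the gzipped workbook directly and takes the maximum of column F. It assumes Gnumeric's usual layout, where each cell is a Cell element under Sheets/Sheet/Cells carrying Row and Col attributes, with its value as element text. The function name largest_in_column is hypothetical, and cells without numeric text are simply skipped:

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace taken from root.tag in the session above.
GNM = "{http://www.gnumeric.org/v10.dtd}"


def largest_in_column(path, col=5, skip_header=True):
    """Return the largest numeric cell in column `col` (0-based, so 5 is
    column F) of the first sheet of a gzip-compressed Gnumeric workbook."""
    with gzip.open(path, "rb") as f:  # decompress on the fly
        root = ET.parse(f).getroot()

    # Assumed layout: Workbook/Sheets/Sheet/.../Cell with Row/Col attributes.
    sheet = root.find(GNM + "Sheets/" + GNM + "Sheet")
    if sheet is None:
        raise ValueError("no Sheet element found")

    values = []
    for cell in sheet.iter(GNM + "Cell"):
        if int(cell.get("Col", -1)) != col:
            continue
        if skip_header and int(cell.get("Row", -1)) == 0:
            continue
        try:
            values.append(float(cell.text))
        except (TypeError, ValueError):
            # Text labels, or cells with no text, are skipped.
            continue
    return max(values)  # raises ValueError if the column had no numbers
```

Note that the professor's script may expect a real CSV, so the .txt version of the file is still the safer thing to grade against.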

Aks · Accepted Answer · 2016-12-08 02:47:41Z

The new .txt file looks good, and your function largestState() gives the correct output. Just use return instead of print at the end:

import csv

def largestState():
    INPUT  = "state-populations.txt"
    COLUMN = 5   # 6th column

    with open(INPUT, "rU") as csvFile:
        theFile = csv.reader(csvFile)
        header = next(theFile, None)    # skip header row
        pop = [float(row[COLUMN]) for row in theFile]

    max_pop = max(pop)
    return max_pop

print largestState()
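If you are on Python 3, where print is a function and the "rU" mode is deprecated (it was removed in Python 3.11), a minimal sketch of the same approach looks like this. It opens the file with newline="" as the csv module documentation recommends, and skips blank or short rows; the function and parameter names are illustrative, not from the original:

```python
import csv


def largest_state(path="state-populations.txt", column=5):
    """Return the largest value in `column` (0-based; 5 is the 6th
    column, i.e. column F), skipping the header row."""
    with open(path, newline="") as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip header row
        populations = [
            float(row[column])
            for row in reader
            # ignore blank lines and rows too short to have this column
            if len(row) > column and row[column].strip()
        ]
    return max(populations)


if __name__ == "__main__":
    print(largest_state())
```

This assumes the population cells are plain numbers; values written with thousands separators (e.g. "1,234") would need the commas stripped before calling float().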
