30

I am very new to Python. I want to parse a CSV file so that quoted values are recognized. For example,

1997,Ford,E350,"Super, luxurious truck"

should be split as

('1997', 'Ford', 'E350', 'Super, luxurious truck')

and NOT

('1997', 'Ford', 'E350', '"Super', ' luxurious truck"')

The above is what I get if I use something like str.split(',').

How do I do this? Also, would it be best to store these values in an array or some other data structure? After I get the values from the CSV, I want to be able to easily choose, say, any two of the columns and store them as another array or some other data structure.

2
  • I have edited the question. If I use just the delimiter ',' it does not recognize the ',' within the quotes Commented Sep 6, 2012 at 9:20
  • You have to define the 'quote' Commented Sep 6, 2012 at 9:46

5 Answers

29

You should use the csv module:

import csv
reader = csv.reader(['1997,Ford,E350,"Super, luxurious truck"'], skipinitialspace=True)
for r in reader:
    print(r)

output:

['1997', 'Ford', 'E350', 'Super, luxurious truck']
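To cover the second half of the question (picking any two columns), here is a minimal sketch. The helper name select_columns, the second sample row, and the use of io.StringIO in place of a real file are all assumptions made for illustration; an open file object can be passed the same way.

```python
import csv
import io

def select_columns(lines, indices, **reader_kwargs):
    """Parse CSV lines and keep only the fields at the given positions."""
    reader = csv.reader(lines, **reader_kwargs)
    return [tuple(row[i] for i in indices) for row in reader]

# io.StringIO stands in for an open file; the second row is made-up sample data.
sample = io.StringIO(
    '1997,Ford,E350,"Super, luxurious truck"\n'
    '1999,Chevy,Venture,"Extended Edition, large"\n'
)

pairs = select_columns(sample, (1, 3), skipinitialspace=True)
print(pairs)
# [('Ford', 'Super, luxurious truck'), ('Chevy', 'Extended Edition, large')]
```

A list of tuples keeps each row's values together, which is usually convenient if the two columns are going to be processed row by row.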

3 Comments

Thanks. But when I try reading from the file I get the following error: csv.Error: line contains NULL byte. My file contains probably a million lines such as the following: 1,,"Warn, unknown error","car-8554.gif","car.gif","crs_04","change rand str, cut pos, 35289, add size, 9242"
@cornerstone: how are you creating this file? You're not going to get the NULL byte to display here via normal means, but if it's in the file it's going to be a problem to read it via nearly any means if you treat it as text.
@Wooble It was created by dumping SQL data values to a CSV file. I figured the NULL byte was due to the successive ",," present in the lines. I have found a workaround that strips the NULL bytes before parsing: with open(r'car.csv') as csv_file: reader = csv.reader((line.replace('\0', '') for line in csv_file), delimiter=',', quotechar='"'); print(next(reader))
22

The following method worked perfectly:

import csv

d = {}
d['column1name'] = []
d['column2name'] = []
d['column3name'] = []

# newline='' is what the csv module expects on Python 3 (on Python 2, open with mode 'rb' instead)
with open('filename.csv', newline='') as f:
    dictReader = csv.DictReader(f, fieldnames=['column1name', 'column2name', 'column3name'], delimiter=',', quotechar='"')

    # Append each field to the list for its column
    for row in dictReader:
        for key in row:
            d[key].append(row[key])

The columns are stored in a dictionary with the column names as the keys.
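Once the columns are in a dictionary like this, choosing any two of them is a lookup plus zip. A small sketch, assuming Python 3; the sample data and io.StringIO stand in for the real file and are not from the original answer:

```python
import csv
import io

fieldnames = ['column1name', 'column2name', 'column3name']

# Made-up sample rows; io.StringIO stands in for an open file.
data = io.StringIO('1,"Warn, unknown error",car.gif\n'
                   '2,"OK",truck.gif\n')

# Build the same column-name -> list-of-values mapping as above.
columns = {name: [] for name in fieldnames}
for row in csv.DictReader(data, fieldnames=fieldnames):
    for key, value in row.items():
        columns[key].append(value)

# Pair up any two columns row by row.
selected = list(zip(columns['column1name'], columns['column3name']))
print(selected)
# [('1', 'car.gif'), ('2', 'truck.gif')]
```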

2 Comments

there is a missing begin single quote in front of col3name.
What happens to the opened file handle?
5

You can define the double quote as the quotechar within the csv.reader() call (it is also the default value):

>>> with open(r'<path_to_csv_test_file>') as csv_file:
...     reader = csv.reader(csv_file, delimiter=',', quotechar='"')
...     print(next(reader))
... 
['1997', 'Ford', 'E350', 'Super, luxurious truck']
>>> 
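As a quick check of what quotechar does, the sketch below parses the sample line with the default settings and then parses a single-quoted variant by passing quotechar explicitly. The single-quoted line is a made-up variation of the example, not something from the question.

```python
import csv

double_quoted = '1997,Ford,E350,"Super, luxurious truck"'
single_quoted = "1997,Ford,E350,'Super, luxurious truck'"

# The default dialect already treats '"' as the quote character.
default = next(csv.reader([double_quoted]))

# A different quote character has to be named explicitly.
single = next(csv.reader([single_quoted], quotechar="'"))

print(default)
print(single)
# Both print: ['1997', 'Ford', 'E350', 'Super, luxurious truck']
```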

3

If you don't want to use the csv module, you can use a regular expression that splits only on commas followed by an even number of double quotes. Try this:

import re
regex = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"
string = '1997,Ford,E350,"Super, luxurious truck"'
array = re.split(regex, string)

print(array[3])  # prints "Super, luxurious truck" with the surrounding quotes kept
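Since the split leaves the surrounding quotes on the field, a small post-processing step can remove them. This is only a sketch: the helper name split_line is made up here, and it does not unescape doubled quotes ("") or handle newlines inside quoted fields, both of which the csv module takes care of.

```python
import re

# Same pattern as above: a comma followed by an even number of double quotes.
SPLIT_RE = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def split_line(line):
    """Split one CSV line and drop the quotes around quoted fields."""
    fields = SPLIT_RE.split(line)
    return [
        f[1:-1] if len(f) >= 2 and f[0] == f[-1] == '"' else f
        for f in fields
    ]

print(split_line('1997,Ford,E350,"Super, luxurious truck"'))
# ['1997', 'Ford', 'E350', 'Super, luxurious truck']
```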

0

The csv module is probably fine, but if you want to see and/or control how the parsing works, here is a small pure-Python solution based on a coroutine:

def csv_parser(delimiter=','):
    field = []
    while True:
        char = (yield(''.join(field)))
        field = []

        leading_whitespace = []    
        while char and char == ' ':
            leading_whitespace.append(char)
            char = (yield)

        if char == '"' or char == "'":
            surround = char
            char = (yield)
            while True:
                if char == surround:
                    char = (yield)
                    if not char == surround:
                        break

                field.append(char)
                char = (yield)

            while not char == delimiter:
                if char == None:
                    (yield(''.join(field)))
                char = (yield)
        else:
            field = leading_whitespace
            while not char == delimiter:
                if char == None:
                    (yield(''.join(field)))
                field.append(char)
                char = (yield)

def parse_csv(csv_text):
    processor = csv_parser()
    next(processor) # start the processor coroutine

    split_result = []
    for c in list(csv_text) + [None]:
        emit = processor.send(c)
        if emit is not None:  # keep empty fields too; '' is falsy
            split_result.append(emit)

    return split_result

print(parse_csv('1997,Ford,E350,"Super, luxurious truck"'))

Tested on python 2.7
