
I need to parse a string using a CSV parser. I've found this solution in many places, but it doesn't work for me. I was using Python 3.4, then switched to 2.7.9, and it still doesn't work.

import csv
import StringIO

csv_file = StringIO.StringIO(line)
csv_reader = csv.reader(csv_file)
for data in csv_reader:
    # do something with each parsed row (a list of strings)
    pass

Could anyone suggest another way to parse this string with a CSV parser, or explain how to make this code work?

Note: the string is in CSV format and some fields contain commas, which is why I can't parse it by simply splitting on commas.
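To illustrate the problem, here is a minimal Python 3 sketch using a hypothetical sample line (not the asker's actual data) in which the fourth field contains commas and is wrapped in double quotes. Splitting on commas tears that field apart:

```python
# Hypothetical sample line: the fourth field contains commas and is quoted.
line = '1,2,3,"1,2,3",4'

# Splitting on every comma breaks the quoted field apart and keeps the quotes.
fields = line.split(',')
print(fields)       # ['1', '2', '3', '"1', '2', '3"', '4']
print(len(fields))  # 7, not the 5 fields intended
```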

  • Please read the guide on how to construct an MCVE; you didn't even explain what isn't working. Commented Mar 21, 2015 at 3:28

2 Answers


You need to put double quotes around elements that contain commas.

The CSV format is described in RFC 4180, which states:

  6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.
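The rule works in both directions. This minimal Python 3 sketch uses io.StringIO as an in-memory file and a hypothetical sample row. It shows csv.writer applying this quoting automatically with its default settings: fields containing a comma or a double quote get enclosed in double quotes, and embedded double quotes are doubled. csv.reader then undoes the quoting:

```python
import csv
import io

# Write one row with the default dialect (delimiter=',', quotechar='"').
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['1', '2', '3', '1,2,3', 'say "hi"'])
encoded = buf.getvalue()
print(repr(encoded))  # '1,2,3,"1,2,3","say ""hi"""\r\n'

# Reading the encoded text back recovers the original fields.
row = next(csv.reader(io.StringIO(encoded)))
print(row)  # ['1', '2', '3', '1,2,3', 'say "hi"']
```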

So, for instance:

import StringIO
import csv

# the text between double quotes will be treated 
# as a single element and not parsed by commas
line = '1,2,3,"1,2,3",4'

csv_file = StringIO.StringIO(line)
csv_reader = csv.reader(csv_file)
for data in csv_reader:
    # output: ['1', '2', '3', '1,2,3', '4']
    print data
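csv.reader accepts any iterable that yields strings, not only file-like objects, so for a single line you can skip StringIO entirely. A minimal sketch of that approach (the sample line is the same one used above); it runs unchanged on Python 2.7 and Python 3:

```python
import csv

line = '1,2,3,"1,2,3",4'

# Wrap the single line in a list: the reader iterates over it and
# treats each string as one line of CSV input.
for data in csv.reader([line]):
    print(data)  # ['1', '2', '3', '1,2,3', '4']
```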

As another option, you can change the delimiter or the quote character. The defaults for csv.reader are delimiter=',' and quotechar='"', but both can be changed to suit your data.

Semicolon Delimiter:

line = '1;2;3;1,2,3;4'

csv_file = StringIO.StringIO(line)
csv_reader = csv.reader(csv_file, delimiter=';')
for data in csv_reader:
    # output: ['1', '2', '3', '1,2,3', '4']
    print data

Vertical Bar Quotechar:

line = '1,2,3,|1,2,3|,4'

csv_file = StringIO.StringIO(line)
csv_reader = csv.reader(csv_file, quotechar='|')
for data in csv_reader:
    # output: ['1', '2', '3', '1,2,3', '4']
    print data
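If you need to change both settings at once, you can bundle them into a named dialect with csv.register_dialect and reuse that dialect wherever you create a reader. This is a minimal Python 3 sketch; the dialect name 'semicolon_pipe' and the sample line are hypothetical:

```python
import csv

# Register the non-default settings under an arbitrary name.
csv.register_dialect('semicolon_pipe', delimiter=';', quotechar='|')

# The third field contains the delimiter, so it is wrapped in the quotechar.
line = '1;2;|a;b|;4'
row = next(csv.reader([line], dialect='semicolon_pipe'))
print(row)  # ['1', '2', 'a;b', '4']
```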

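Also, the csv module is part of the standard library in Python 2.6 through 3.x, so the Python version itself shouldn't be the problem. Note, however, that the StringIO module exists only in Python 2.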
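Python 3 has no StringIO module, so `import StringIO` fails there with an ImportError. A minimal sketch of the first example ported to Python 3 replaces it with io.StringIO. On Python 2.7, io.StringIO expects a unicode string, so keep StringIO.StringIO there for byte strings.

```python
import csv
import io

line = '1,2,3,"1,2,3",4'

# io.StringIO provides the in-memory file object in Python 3.
csv_file = io.StringIO(line)
csv_reader = csv.reader(csv_file)
for data in csv_reader:
    print(data)  # ['1', '2', '3', '1,2,3', '4']
```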


Rather than reimplementing CSV parsing, the obvious solution is to preprocess the data. Replace every comma inside a string with a token that never otherwise appears (even the word COMMA would do), feed the result to the CSV parser, and then go back through the parsed fields and turn the tokens back into commas.

Sorry, I've not tried this myself in Python, but I had issues with quotes in my data in another language, and that's how I solved it.

Also, Bcorso's answer is much more complete. Mine is just a quick hack to get around a common limitation.
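Here is a minimal Python 3 sketch of the token-replacement hack. It assumes that the "strings" whose commas need protecting are wrapped in double quotes with no escaped ("") quotes inside, and that the token never occurs in the real data. The function name parse_with_token and the token value are hypothetical:

```python
import csv
import re

# Hypothetical placeholder; assumed never to appear in the real data.
TOKEN = '__COMMA__'

def parse_with_token(line):
    # 1. Hide commas inside double-quoted segments (no escaped quotes assumed).
    masked = re.sub(r'"[^"]*"',
                    lambda m: m.group(0).replace(',', TOKEN),
                    line)
    # 2. Let the CSV parser split on the remaining, real delimiters.
    row = next(csv.reader([masked]))
    # 3. Restore the original commas in every field.
    return [field.replace(TOKEN, ',') for field in row]

print(parse_with_token('1,2,3,"1,2,3",4'))  # ['1', '2', '3', '1,2,3', '4']
```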
