I'm trying to parse a simple string with the csv module:

import csv

s = 'param="(a, b)", param2, param3'
list(csv.reader([s], skipinitialspace=True))

It splits the string into

[['param="(a', 'b)"', 'param2', 'param3']]

but I'd like to get

[['param="(a, b)"', 'param2', 'param3']]

It seems that the csv module only recognizes quoted text when the quotes enclose the whole token.

How can I get the result I want?

Note: this is not a duplicate of Splitting with commas because in this case, each field is not quoted, only a part within the field. The answer(s) posted at that link (and the link to which that question is a duplicate) do not apply in this case, as evidenced by the above code (which recreates the same structure as the posted answers, and shows that it fails).

  • Not sure how it's a duplicate when the OP's question isn't answered by that question. Try the above code and see that it doesn't work. What did work for me, as messy as this is, is to quote every entry; but if that isn't how the csv is set up, then that doesn't do much good. s = '"param=\'(a, b)\'", "param2", "param3"' gives the desired result (len(items[0]) = 3), but again, maybe not helpful. Commented Jun 11, 2015 at 16:12
  • I think your title is wrong. You don't have a valid quoted CSV file, but a CSV file with quotation marks in the fields. That's why the answer can just be: make your own parser for your own format instead of using a standard parser for a standard format. Commented Jun 12, 2015 at 8:00
  • So... let's talk about valid CSVs. Imagine a file with comma-separated values. The Python csv module docs say that fields can be unquoted. One of the values contains a comma inside a quoted string. How do I parse it with Python's csv module? Commented Jun 12, 2015 at 8:19

1 Answer

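Unfortunately, the csv module doesn't seem to handle text it considers inappropriately quoted very well: as the output in the question shows, a quote that appears partway through a field is kept as a literal character rather than starting a quoted section, so the comma inside it still splits the field. One option would be to fall back on a regex, something like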

>>> import re
>>> s = 'param="(a, b)", param2, param3'
>>> re.findall(r'\s*((?:[^,\"]|\"[^\"]*\")+)\s*', s)
['param="(a, b)"', 'param2', 'param3']
>>> s = 'param="(a, b)" "more quotes" "yet,more,quotes", param2, param3'
>>> re.findall(r'\s*((?:[^,\"]|\"[^\"]*\")+)\s*', s)
['param="(a, b)" "more quotes" "yet,more,quotes"', 'param2', 'param3']

(If you can control how the initial string is produced, starting from a better-formatted one would be a much better approach.)
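If the regex is hard to read, the same idea can be written as a small hand-rolled splitter. This is only a sketch, not part of the csv module: the function name `split_outside_quotes` and its parameters are made up for illustration, and it assumes quotes are balanced and never escaped.

```python
def split_outside_quotes(text, delimiter=",", quote='"'):
    """Split text on delimiter, ignoring delimiters inside quoted runs.

    Hypothetical helper: assumes balanced, unescaped quote characters.
    Quote characters are kept in the output; surrounding whitespace is stripped.
    """
    fields = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            # Toggle quoted state but keep the quote character itself.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == delimiter and not in_quotes:
            # A delimiter outside quotes ends the current field.
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    # Flush the last field, which has no trailing delimiter.
    fields.append("".join(current).strip())
    return fields


s = 'param="(a, b)", param2, param3'
print(split_outside_quotes(s))
```

For the string from the question this should give the same three fields as the regex. Unlike the regex, it also keeps empty fields (for example, between two adjacent commas), which may or may not be what you want.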

1 Comment

Are the slashes before the double quotes necessary in your regexes?
