I have a string called group_data, which I want to read with Python's csv.reader. This is the call I am making.

group = csv.reader(group_data.split('\n'), delimiter=';', 
                   doublequote=True, quoting=csv.QUOTE_ALL, strict=True)

I want this to raise an exception whenever either of the following is the case:

  • A quote inside a field is not doubled: "A";"B "bb" B";"C" instead of "A";"B ""bb"" B";"C"
  • Any of the fields is not quoted: A;B;C instead of "A";"B";"C"

However, the call above accepts both lines as correct, even with the doublequote=True, quoting=csv.QUOTE_ALL, and strict=True settings. Is there another option I should set to make it fail? If not, is there another quick way to detect an undoubled quote or an unquoted field?

  • pandas has a csv reader. It may be worth checking if their reader is a little more strict. Commented May 18, 2015 at 16:25

1 Answer

For what it's worth, it looks like Python 3.4 does reject your first example:

In [8]: mkreader = lambda x: csv.reader(x.split("\n"), delimiter=";", doublequote=True, quoting=csv.QUOTE_ALL, strict=True)
In [11]: for l in mkreader('''"A";"B ""bb"" B";"C"'''): print(l)
['A', 'B "bb" B', 'C']
In [12]: for l in mkreader('''"A";"B "bb" B";"C"'''): print(l)
...
Error: ';' expected after '"'

Although it allows the second:

In [13]: for l in mkreader('''A;B;C'''): print(l)
['A', 'B', 'C']

Looking at the docs, it seems like this is because QUOTE_ALL is strictly a writer setting, not a reader setting:

csv.QUOTE_ALL
    Instructs writer objects to quote all fields.

Compare to:

csv.QUOTE_NONNUMERIC

    Instructs writer objects to quote all non-numeric fields.

    Instructs the reader to convert all non-quoted fields to type float.
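As a side effect of that float conversion, a reader created with QUOTE_NONNUMERIC rejects unquoted fields that are not numbers. The sketch below illustrates this; read_nonnumeric is a hypothetical helper name, not part of the csv module. Note that it only covers part of what you want: an unquoted field that happens to be numeric (such as 1) is still accepted.

```python
import csv

def read_nonnumeric(line):
    # Hypothetical helper: parse a single line with QUOTE_NONNUMERIC,
    # so every unquoted field is passed through float().
    reader = csv.reader([line], delimiter=';', quoting=csv.QUOTE_NONNUMERIC)
    return next(reader)

# Quoted fields stay strings; the unquoted 1 becomes a float.
print(read_nonnumeric('"A";1;"C"'))   # ['A', 1.0, 'C']

# Unquoted text cannot be converted to float, so the reader raises.
try:
    read_nonnumeric('A;B;C')
except ValueError as exc:
    print('rejected:', exc)
```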

So it looks like you need to write this check yourself if you want it. This is simple if you know that ; will never appear inside a field (which appears to be the case, since you don't set escapechar):

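In [19]: def check_line(line):
    for word in line.split(';'):
        # Each field needs an opening and a closing quote; the length
        # check also rejects empty fields and a lone '"'.
        if len(word) < 2 or word[0] != '"' or word[-1] != '"':
            raise csv.Error("Bad input.")
In [20]: check_line("A;B;C")
...
Error: Bad input.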
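
If a ; can appear inside a quoted field, splitting on ; is no longer safe. One possible alternative is to validate each line against a regular expression before handing it to csv.reader. This is only a sketch under a few assumptions: fields never contain newlines (you already split on '\n'), blank lines should be skipped, and the names _FIELD, _LINE and read_strict are made up for this example.

```python
import csv
import re

# One quoted field: an opening quote, then any mix of non-quote
# characters and doubled quotes, then the closing quote.
_FIELD = r'"(?:[^"]|"")*"'
# A whole line: one or more quoted fields separated by semicolons.
_LINE = re.compile(r'{f}(?:;{f})*'.format(f=_FIELD))

def read_strict(text):
    """Parse text, raising csv.Error on unquoted fields or undoubled quotes."""
    rows = []
    for lineno, line in enumerate(text.split('\n'), start=1):
        if not line:
            continue  # assumption: blank lines are ignored
        if not _LINE.fullmatch(line):
            raise csv.Error('line %d is not fully quoted: %r' % (lineno, line))
        # The line passed the check, so let csv.reader do the unquoting.
        rows.extend(csv.reader([line], delimiter=';',
                               doublequote=True, strict=True))
    return rows

print(read_strict('"A";"B ""bb"" B";"C"'))   # [['A', 'B "bb" B', 'C']]
```

With this check, both A;B;C and "A";"B "bb" B";"C" raise csv.Error, while "a;b";"c" is accepted as two fields.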

2 Comments

I noticed that the reader command itself did not raise an error in the first case in Python 3.4, but when you then use that reader in a for loop, it does. I guess I have to build this second check manually. Thanks for your help!
Hmm, this seems to be more complicated than I hoped for, since I have to basically implement my own csv reader for it (split for newlines, then split for semicolons, then check quotes).
