I'm getting CSV formatted data piped from an external source, and the data should not be written to a file, that would open a number of different maintenance tasks I would like to avoid. I am getting the data as a string.
So now I want to interpret the data as CSV. The python module csv is suited for that, so I am using it.
However, constructing a CSV parser using csv.reader (or dictreader) requires the data to support an iterator. A list or a reader would already do that. However, my data contains line breaks (\n) inside quotes strings (which is totally fine for the CSV format), and also line breaks as record end indicators/line terminators.
If I construct it like this:
csvreader = csv.reader(data.split('\n'))
then I have already broken the valid CSV formatting by interpreting the line breaks as record end indicators, even if they were inside quoted strings. For me, the interpretation of the line breaks should be done by the CSV parser, as it is aware of quotes fields containing CSV relevant formatting characters like line breaks.
Do I have a misunderstanding with the way the csv package parser works? Or how would I parse a CSV type string using the CSV parser while the data contains line breaks that do not indicate the end of a record?
Edit 1: It might not have been wise to add context to my question, as the context seems to have overtaken my actual question/problem. To clarify, my problem is not parsing CSV from memory. I have found StringIO already. When using StringIO, all that does is use the whole string/file as the first record.
Edit 2: Maybe sample data will help. My data looks like this:
"value1";"value2";"value3
stillvalue3"
"value4";"value5";"value6"
So inside the 3rd entry in the first record I have a line break character, which should be interpreted as part of the third entry in the first record.
Edit 3: The way I currently use the parser is this:
csvreader = csv.reader(StringIO(result), quotechar='"', delimiter=';', lineterminator='\n', escapechar='"')
because the spec says that a double quote (") inside a field is escaped by a quote char (""), so the data might look like this:
"value1";"value2";"value3
stillvalue3
a ""quote"" inside the quote"
"value4";"value5";"value6"
It seems that using these settings the StringIO input always results in one record with all the data from the string inside it. Removing the
escapechar='"'
fixes this problem, however, I cannot support escaped double quote characters inside the quotes. So the nature of my question changed. I'll keep this open if anybody has an idea and will accept the answer of Martin Evans later.
StringIO(nowfrom io import StringIO) as shown here: stackoverflow.com/a/18724978/3001761escapechar) a csv.reader reads correctly the only data you have shown. So we have to guess what the problem can be. You have to show some data exhibiting the problem, with what you get and what you want.