You can split the text on newlines, then reduce the resulting list so that any element containing an odd number of " (an unclosed quote) is joined with the line that follows it:
from functools import reduce  # reduce is not a builtin in Python 3

txt = 'abcd efgh ijk\n1234 567"qqqq\n---" 890\n'
s = txt.split('\n')
reduce(lambda x, y: x[:-1] + [x[-1] + '\n' + y] if x[-1].count('"') % 2 == 1 else x + [y], s[1:], [s[0]])
# ['abcd efgh ijk', '1234 567"qqqq\n---" 890', '']
Explanation:
if x[-1].count('"') % 2 == 1
# If the last handled element contains an odd number of quotes (a quote is still open)
x[:-1] + [x[-1] + '\n' + y]
# Join y onto that element, restoring the newline that split removed
else x + [y]
# Otherwise, append y to the handled list as a new element
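To make the reduce step easier to follow, here is a small sketch that prints the accumulator after every step. The helper name merge is hypothetical (it is just the lambda above given a name), and the initial= argument of itertools.accumulate assumes Python 3.8 or later.

```python
from itertools import accumulate

def merge(x, y):
    # Hypothetical named version of the lambda passed to reduce.
    if x[-1].count('"') % 2 == 1:
        # Quote still open: glue y onto the last element.
        return x[:-1] + [x[-1] + '\n' + y]
    # Quotes balanced: y starts a new element.
    return x + [y]

txt = 'abcd efgh ijk\n1234 567"qqqq\n---" 890\n'
s = txt.split('\n')

# accumulate yields the initial value, then the accumulator after each step.
states = list(accumulate(s[1:], merge, initial=[s[0]]))
for state in states:
    print(state)
```

The last state printed is the same list that reduce returns.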
The same logic can also be written as an explicit loop:
def splitWithQuotes(txt):
    s = txt.split('\n')
    res = []
    for item in s:
        if res and res[-1].count('"') % 2 == 1:
            res[-1] = res[-1] + '\n' + item
        else:
            res.append(item)
    return res
splitWithQuotes(txt)
# ['abcd efgh ijk', '1234 567"qqqq\n---" 890', '']
As pointed out by @Veedrac, this is O(n^2), because count rescans the whole merged element on every iteration. This can be avoided by keeping a running count of " for the current element:
def splitWithQuotes(txt):
    s = txt.split('\n')
    res = []
    cnt = 0
    for item in s:
        if res and cnt % 2 == 1:
            res[-1] = res[-1] + '\n' + item
        else:
            res.append(item)
            cnt = 0
        cnt += item.count('"')
    return res
splitWithQuotes(txt)
# ['abcd efgh ijk', '1234 567"qqqq\n---" 890', '']
(The last empty string is because of the last \n at the end of the input string.)
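One further tweak, as a sketch only: res[-1] + '\n' + item builds a new string each time, so a quoted field that spans many lines may still be copied repeatedly. The variant below collects the pieces of the current element in a list and joins them once. The name split_with_quotes_joined and the parts variable are hypothetical and not part of the answer above.

```python
def split_with_quotes_joined(txt):
    """Hypothetical variant: gather the pieces of each element, join once."""
    res = []
    parts = []  # pieces of the element currently being built
    cnt = 0     # number of '"' seen so far in the current element
    for item in txt.split('\n'):
        if parts and cnt % 2 == 1:
            # Quote still open: this line belongs to the current element.
            parts.append(item)
        else:
            # Quotes balanced: flush the finished element and start a new one.
            if parts:
                res.append('\n'.join(parts))
            parts = [item]
            cnt = 0
        cnt += item.count('"')
    if parts:
        res.append('\n'.join(parts))
    return res

txt = 'abcd efgh ijk\n1234 567"qqqq\n---" 890\n'
result = split_with_quotes_joined(txt)
print(result)
```

For the sample input this gives the same list as the versions above, including the trailing empty string.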
foo"bar"oh"what?