I am trying to understand how the re.split() function works with a non-capturing group when splitting a comma-delimited string.
This is my code:
import re

pattern = re.compile(r',(?=(?:"[^"]*")*[^"]*$)')
text = 'qarcac,"this is, test1",123566'
results = re.split(pattern, text)
for r in results:
    print(r.strip())
When I execute this code, the results are as expected.
split1: qarcac
split2: "this is, test1"
split3: 123566
However, if I add one more double-quoted string to the source text, it doesn't work as expected.
text = 'qarcac,"this is, test1","this is, test2", 123566, testdata'
This produces the output below:
split1: qarcac,"this is, test1"
split2: "this is, test2"
split3: 123566
Can someone explain what's going on here and why the non-capturing group behaves differently in these two cases?
Use the csv module to parse a CSV string. The regex you are using is very inefficient, and if the string is very long, performance might drop significantly. Use ,(?=(?:"[^"]*"|[^"])*$) or ,(?=[^"]*(?:"[^"]*"[^"]*)*$) instead. See "Regex to pick commas outside of quotes".
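To make the difference concrete, here is a minimal sketch that runs the pattern from the question, one of the suggested patterns, and the csv module on the second input. The variable names are only illustrative. In the original pattern, the group (?:"[^"]*")* can only consume quoted strings that sit directly next to each other, so the lookahead fails at the first comma, where two quoted fields separated by a comma follow. The non-capturing group itself behaves the same in both cases; only the text after each comma differs.

```python
import csv
import re

text = 'qarcac,"this is, test1","this is, test2", 123566, testdata'

# Pattern from the question: after the comma, only back-to-back quoted
# strings are allowed, followed by quote-free text up to the end.
original = re.compile(r',(?=(?:"[^"]*")*[^"]*$)')

# One of the suggested patterns: after the comma, quote-free text and
# complete quoted strings may alternate any number of times up to the end.
suggested = re.compile(r',(?=[^"]*(?:"[^"]*"[^"]*)*$)')

original_parts = original.split(text)
suggested_parts = suggested.split(text)

# The csv module handles the quoting rules itself and strips the quotes.
# csv.reader accepts any iterable of lines, so a one-element list works.
csv_fields = next(csv.reader([text]))

print(original_parts)
print(suggested_parts)
print(csv_fields)
```

Note that the regex approaches keep the surrounding double quotes in each field, while csv.reader removes them. With default settings, neither approach strips the leading spaces before 123566 and testdata.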