I have data in the following format:
string1='<id1> <id2> "abc <id3> ".'
string2='<id_4> <id_5> <id_6>.'
I want to split these into (<id1>, <id2>, "abc <id3> ") and (<id_4>, <id_5>, <id_6>). I tried re.split('(?<=)\s+(?=<)', string1), but it incorrectly splits string1 into (<id1>,<id2>,"abc <id3>"), although it splits string2 as desired.
How can I split on the <> tokens without splitting on a <> token that appears inside quotes?
The delimiters here are <> and "". If we find < then we look for the closing >, and if we find " then we look for the closing ". For string1 (string1='<id1> <id2> "abc <id3> ".'): I start at <, read id1 and find the closing angle bracket; then I find the next < and look for its closing angle bracket, which gives id2; then I start at " and look for the " before the dot, which gives "abc <id3> ".
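Use re.findall instead of re.split; it's easier. Write one subpattern for an angle-bracketed token and one for a double-quoted token, and separate the two subpatterns with | (a logical OR). Keep in mind that the regex engine tests the pattern at each position in the string from left to right. So if an angle bracket is found, one subpattern succeeds; if a double quote is found, the other subpattern succeeds. Because a match consumes everything up to its closing delimiter, a < inside quotes is never tested as the start of a new token.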
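As a minimal sketch of that idea (the exact pattern below is an assumption, and it assumes ids never contain > and quoted strings never contain an escaped "), something like this should work on the two sample strings:

```python
import re

# Two alternatives separated by | :
#   <[^>]*>   an angle-bracketed token: '<', anything but '>', then '>'
#   "[^"]*"   a double-quoted token: '"', anything but '"', then '"'
# TOKEN is a name chosen for this example only.
TOKEN = re.compile(r'<[^>]*>|"[^"]*"')

string1 = '<id1> <id2> "abc <id3> ".'
string2 = '<id_4> <id_5> <id_6>.'

# findall scans left to right and returns each non-overlapping match.
result1 = TOKEN.findall(string1)
result2 = TOKEN.findall(string2)

print(result1)  # ['<id1>', '<id2>', '"abc <id3> "']
print(result2)  # ['<id_4>', '<id_5>', '<id_6>']
```

The trailing dot and the spaces between tokens match neither alternative, so they are simply skipped. If the quoted strings can contain escaped quotes, the second alternative would need to be extended.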