How can I separate
3[a]2[b4[F]c] into 3[a] and 2[b4[F]c]
OR
3[a]2[bb] into 3[a] and 2[bb] using re.split?
I tried the following pattern:
(\d+)\[(.*?)\]
but the output gives me 3a and 2b4[F.
If you want to use split, you can assert that what is on the left is a ] and what is on the right is a digit (splitting on an empty match like this requires Python 3.7 or later):
(?<=])(?=\d)
Example code
import re

# Split at the empty position between a closing "]" and a following digit.
regex = r"(?<=])(?=\d)"
strings = [
    "3[a]2[b4[F]c]",
    "3[a]2[bb]",
]
for s in strings:
    print(re.split(regex, s))
Output
['3[a]', '2[b4[F]c]']
['3[a]', '2[bb]']
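One caveat worth checking: the lookarounds only inspect the two characters around each position, so they do not know how deeply nested that position is. The sketch below uses a made-up input (not from the question) in which two bracketed groups sit side by side inside an outer group, and shows that the split then also happens inside the outer brackets.

```python
import re

# Hypothetical input: "3[b]" and "4[c]" are siblings inside the outer "2[...]".
nested = "2[a3[b]4[c]]"

# Same pattern as above: split where a "]" is followed by a digit.
parts = re.split(r"(?<=])(?=\d)", nested)
print(parts)
# => ['2[a3[b]', '4[c]]']
```

If your inputs never contain a ] immediately followed by a digit inside another pair of brackets, as in the two strings from the question, the lookaround split is enough.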
You can't do that with re.split in the general case, since re does not support recursion and therefore cannot match arbitrarily nested brackets.
You may match and extract numbers that are followed by nested square brackets using the PyPI regex module:
import regex
s = "3[a]2[b4[F]c]"
print( [x.group() for x in regex.finditer(r'\d+(\[(?:[^][]++|(?1))*])', s)] )
# => ['3[a]', '2[b4[F]c]']
Pattern details
\d+ - 1+ digits
(\[(?:[^][]++|(?1))*]) - Group 1:
  \[ - a [ char
  (?:[^][]++|(?1))* - 0 or more sequences of
    [^][]++ - 1+ chars other than [ and ] (possessively for better performance)
    | - or
    (?1) - a subroutine triggering Group 1 recursion at this location
  ] - a ] char.
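If installing a third-party module is not an option, the same top-level grouping can be approximated without any regex by counting bracket depth. This is only a minimal sketch: the helper name split_top_level is made up for illustration, and it assumes the brackets in the input are balanced.

```python
def split_top_level(s):
    """Split s into chunks that each end at a top-level closing bracket."""
    parts = []
    depth = 0   # current bracket nesting level
    start = 0   # index where the current chunk begins
    for i, ch in enumerate(s):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                # Back at the top level: the chunk ends here.
                parts.append(s[start:i + 1])
                start = i + 1
    if start < len(s):
        # Keep any trailing text that follows the last top-level "]".
        parts.append(s[start:])
    return parts


print(split_top_level("3[a]2[b4[F]c]"))
# => ['3[a]', '2[b4[F]c]']
print(split_top_level("3[a]2[bb]"))
# => ['3[a]', '2[bb]']
```

Unlike the recursive pattern, this does not validate that each chunk starts with digits; it only cuts the string wherever the depth returns to zero.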
What about (?<=])(?=\d)? See ideone.com/podkLl