Regex Pattern using bracket '[]'

Question

How can I separate
3[a]2[b4[F]c] into 3[a] and 2[b4[F]c]
OR
3[a]2[bb] into 3[a] and 2[bb] using re.split?

I tried the following pattern:

(\d+)\[(.*?)\]

but the output gives me 3a and 2b4[F.

For the case of arbitrarily nested brackets, you should consider writing a parser.
– Tim Biegeleisen, Commented Jan 10, 2020 at 12:36
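
To illustrate what the comment suggests, here is a minimal sketch of such a parser. It is not from the thread: the helper name split_top_level is hypothetical, and it assumes the input is a sequence of count[...] groups with balanced brackets. It tracks bracket depth and cuts the string each time the depth returns to zero.

```python
def split_top_level(s):
    """Split s into top-level 'count[...]' groups by tracking bracket depth.

    Hypothetical helper for illustration; assumes balanced square brackets.
    """
    parts = []
    depth = 0
    start = 0
    for i, ch in enumerate(s):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced ']' at index {i}")
            if depth == 0:
                # A top-level group just closed, so cut right after it.
                parts.append(s[start:i + 1])
                start = i + 1
    if depth != 0:
        raise ValueError("unbalanced '[' in input")
    if start < len(s):
        # Keep any trailing text that sits outside the last group.
        parts.append(s[start:])
    return parts


print(split_top_level("3[a]2[b4[F]c]"))  # ['3[a]', '2[b4[F]c]']
print(split_top_level("3[a]2[bb]"))      # ['3[a]', '2[bb]']
```

Because it only counts depth, this works for any nesting level without relying on regex recursion.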

The fourth bird · Accepted Answer · 2020-01-10 12:59:10Z

If you want to use split, you can assert that what is on the left is a ] and what is on the right is a digit:

(?<=])(?=\d)

Example code

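import re

# Zero-width split point: a "]" immediately behind, a digit immediately ahead.
# Splitting on an empty match requires Python 3.7+.
regex = r"(?<=])(?=\d)"
strings = [
    "3[a]2[b4[F]c]",
    "3[a]2[bb]",
]

for s in strings:
    print(re.split(regex, s))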

Output

['3[a]', '2[b4[F]c]']
['3[a]', '2[bb]']
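
One caveat worth checking before relying on this: the lookarounds do not track nesting, so the pattern also splits wherever a ] is directly followed by a digit inside an outer group. The sketch below uses a made-up input (not from the question) and a hypothetical wrapper name, split_on_boundary, to show that behaviour on Python 3.7+.

```python
import re


def split_on_boundary(s):
    # Hypothetical wrapper around the same lookaround split as above.
    return re.split(r"(?<=])(?=\d)", s)


# Works for the inputs in the question, where "]digit" occurs only at top level.
print(split_on_boundary("3[a]2[b4[F]c]"))  # ['3[a]', '2[b4[F]c]']

# Here "]3" occurs inside the outer group, so the group is cut in two.
print(split_on_boundary("2[a[b]3[c]]"))    # ['2[a[b]', '3[c]]']
```

If inputs like the second one can occur, a depth-aware approach is the safer choice.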

Wiktor Stribiżew · Accepted Answer · 2020-01-10 13:13:12Z

You can't do that with re.split since re does not support recursion.

You may match and extract numbers that are followed by nested square brackets using the PyPI regex module:

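import regex  # third-party module: pip install regex

s = "3[a]2[b4[F]c]"
# (?1) recurses into capture group 1, so nested brackets match as one unit.
print([m.group() for m in regex.finditer(r'\d+(\[(?:[^][]++|(?1))*])', s)])
# => ['3[a]', '2[b4[F]c]']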

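Pattern details

\d+ - one or more digits
(\[(?:[^][]++|(?1))*]) - capturing group 1:
    \[ - a literal [
    (?:[^][]++|(?1))* - zero or more repetitions of either one or more characters other than [ and ] (matched possessively), or the whole group 1 pattern, recursed
    ] - a literal ]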
