How can I separate
3[a]2[b4[F]c] into 3[a] and 2[b4[F]c]
OR
3[a]2[bb] into 3[a] and 2[bb] using re.split?

I tried the following pattern:

(\d+)\[(.*?)\]

but the output gives me 3a and 2b4[F, because the lazy .*? stops at the first ] and cuts the nested group short.

  • For the case of arbitrarily nested brackets, you should consider writing a parser. Commented Jan 10, 2020 at 12:36
  • Did you mean like this: (?<=])(?=\d)? ideone.com/podkLl Commented Jan 10, 2020 at 12:43
  • Exactly like that. Thanks! Commented Jan 10, 2020 at 12:53
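
The parser suggested in the first comment does not need to be elaborate. As a minimal sketch (the function name split_top_level is hypothetical, and the input is assumed to use only [ and ] as brackets), a single pass that tracks nesting depth is enough to cut the string at every point where a top-level group closes:

```python
def split_top_level(s):
    """Split a string such as '3[a]2[b4[F]c]' into its top-level count[...] groups."""
    parts = []
    depth = 0   # current bracket nesting depth
    start = 0   # index where the current top-level group began
    for i, ch in enumerate(s):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced ']' at index {i}")
            if depth == 0:
                # A top-level group just closed, so emit it.
                parts.append(s[start:i + 1])
                start = i + 1
    if depth != 0:
        raise ValueError("unbalanced '['")
    if start < len(s):
        parts.append(s[start:])  # trailing text outside any brackets
    return parts


print(split_top_level("3[a]2[b4[F]c]"))  # ['3[a]', '2[b4[F]c]']
print(split_top_level("3[a]2[bb]"))      # ['3[a]', '2[bb]']
```

Because it counts depth rather than looking at neighbouring characters, this approach does not depend on what follows a closing bracket.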

2 Answers

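If you want to use re.split, you can split at every position that is preceded by a ] and followed by a digit, using a lookbehind and a lookahead (splitting on an empty match requires Python 3.7 or later):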

(?<=])(?=\d)

Example code

import re

regex = r"(?<=])(?=\d)"
strings = [
    "3[a]2[b4[F]c]",
    "3[a]2[bb]"
]

for s in strings:
    # Reuse the pattern defined above instead of repeating the literal
    print(re.split(regex, s))

Output

['3[a]', '2[b4[F]c]']
['3[a]', '2[bb]']
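
One caveat worth checking against your own data: the lookarounds only inspect the two neighbouring characters, not the nesting depth. The sketch below uses a made-up input, 2[3[a]4[b]], in which a digit directly follows an inner ]. The pattern matches there as well, so the outer group is cut in two:

```python
import re

pattern = r"(?<=])(?=\d)"

# Flat sequence of groups: splits as intended.
print(re.split(pattern, "3[a]2[bb]"))    # ['3[a]', '2[bb]']

# A digit right after an inner ']' also matches, so the outer group is split.
print(re.split(pattern, "2[3[a]4[b]]"))  # ['2[3[a]', '4[b]]']
```

If inputs like the second one can occur, a depth-aware approach is needed instead of this split.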

You can't do that reliably for arbitrarily nested brackets with re.split, since re does not support recursion.

You can match and extract the numbers that are followed by nested square brackets using the PyPI regex module:

import regex
s = "3[a]2[b4[F]c]"
print( [x.group() for x in regex.finditer(r'\d+(\[(?:[^][]++|(?1))*])', s)] )
# => ['3[a]', '2[b4[F]c]']

Pattern details

  • \d+ - 1+ digits
  • (\[(?:[^][]++|(?1))*]) - Group 1:
    • \[ - a [ char
    • (?:[^][]++|(?1))* - 0 or more sequences of
      • [^][]++ - 1+ chars other than [ and ] (possessively for better performance)
      • | - or
      • (?1) - a subroutine triggering Group 1 recursion at this location
    • ] - a ] char.
