4

I need help with the following regex. I have this text:

"[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."

Using a regex, I want to get:

[Hello|Hi]
[inviting | calling]
[[junior| mid junior]|senior]

The following regex, (\[[^\[$\]\]]*\]),

gives me only the innermost groups: [Hello|Hi] [inviting | calling] [junior| mid junior]

How should I fix it to get the correct output?

3
  • The re module doesn't support regex recursion, which is needed for this kind of task. You might want to take a look at pypi.python.org/pypi/regex Commented Oct 28, 2016 at 6:42
  • Most implementations of regular expressions aren't up to the task of parsing nested expressions: stackoverflow.com/questions/6751105/… PCRE is an extension to regular expressions, which is why the PCRE "regex" solution below looks nothing like the regular expression grammar you're used to. Commented Oct 28, 2016 at 7:07
  • The solution you accepted will work only for 3 levels. It's not a generic solution. Commented Oct 28, 2016 at 7:13
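
To illustrate the first comment: a quick check, assuming a standard CPython `re`, shows that the recursion construct `(?R)` is rejected at compile time. The variable name `supports_recursion` is only for illustration, and the possessive `++` from the PCRE pattern is replaced with a plain `+` here so that only the recursion is being tested.

```python
import re

try:
    # (?R) means "recurse into the whole pattern" in PCRE and in the
    # third-party regex module; the standard re module does not know it.
    re.compile(r'\[(?:[^][]+|(?R))*]')
    supports_recursion = True
except re.error as exc:
    supports_recursion = False
    print("re rejected the pattern:", exc)
```

So with `re` alone you need either a pattern for a fixed nesting depth or a small hand-written scanner.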

3 Answers

3

Let's define your string and import re:

>>> s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
>>> import re

Now, try:

>>> re.findall(r'\[ (?:[^][]* \[ [^][]* \])* [^][]*  \]', s, re.X)
['[Hello|Hi]', '[inviting | calling]', '[[junior| mid junior]|senior]']

In more detail

Consider this script:

$ cat script.py
import re
s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."

matches = re.findall(r'''\[       # Opening bracket
        (?:[^][]* \[ [^][]* \])*  # Zero or more repeats of: non-bracket characters, then one inner [...] group with no brackets inside it
        [^][]*                    # Zero or more non-bracket characters
        \]                        # Closing bracket
        ''',
        s,
        re.X)
print('\n'.join(matches))

This produces the output:

$ python script.py
[Hello|Hi]
[inviting | calling]
[[junior| mid junior]|senior]

2 Comments

OP is asking for nested brackets; as soon as you add a third level, this won't work anymore.
The extension to three levels is obvious. If the OP needs arbitrarily deep nesting, that would be an issue. The OP may want to clarify.
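
For what it's worth, the extension to a fixed number of levels can be generated mechanically. This is a minimal sketch, not part of the answer: `bracket_pattern` is a hypothetical helper that wraps the previous level's pattern once per extra level, and `deep` is a made-up test string. With `max_depth=2` it builds the same pattern as the answer, minus the verbose-mode whitespace.

```python
import re

def bracket_pattern(max_depth):
    """Return a pattern for bracket groups nested at most max_depth deep.

    Hypothetical helper for illustration; not part of the answer above.
    """
    plain = r'[^][]*'                    # run of non-bracket characters
    pattern = r'\[' + plain + r'\]'      # depth 1: no inner brackets
    for _ in range(max_depth - 1):
        # Wrap the previous level: any mix of plain text and inner groups.
        pattern = r'\[(?:' + plain + pattern + r')*' + plain + r'\]'
    return pattern

s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
deep = "[sd[sd[sd][sd]]]"  # three levels of nesting

print(re.findall(bracket_pattern(2), s))
# ['[Hello|Hi]', '[inviting | calling]', '[[junior| mid junior]|senior]']
print(re.findall(bracket_pattern(2), deep))  # ['[sd[sd][sd]]'] (outer level lost)
print(re.findall(bracket_pattern(3), deep))  # ['[sd[sd[sd][sd]]]']
```

The pattern grows with every level, so this only helps when the maximum depth is known in advance; for arbitrary depth, a recursive regex or a stack is the better fit.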
2

You can use a simple stack to do this instead of a recursive regex:

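x = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer.[sd[sd[sd][sd]]]"
matches = []  # completed top-level bracket groups
stack = []    # holds one '[' per currently open bracket
start = None  # index where the current top-level group opened
for i, ch in enumerate(x):
    if ch == '[':
        if not stack:  # opening a new top-level group
            start = i
        stack.append(ch)
    elif ch == ']' and stack:  # skip unmatched closing brackets
        stack.pop()
        if not stack:  # the top-level group just closed
            matches.append(x[start:i + 1])
print(matches)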

Output: ['[Hello|Hi]', '[inviting | calling]', '[[junior| mid junior]|senior]', '[sd[sd[sd][sd]]]']

1

You may use the following code with the PyPI regex module and a PCRE-like regex, r'\[(?:[^][]++|(?R))*]':

>>> import regex
>>> s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
>>> r = regex.compile(r'\[(?:[^][]++|(?R))*]')
>>> print(r.findall(s))
['[Hello|Hi]', '[inviting | calling]', '[[junior| mid junior]|senior]']

See the regex demo.

The pattern \[(?:[^][]++|(?R))*] matches a [, then zero or more repetitions of either a run of one or more characters other than ] and [ (matched possessively by ++) or, via the recursion (?R), a complete nested [...] group, and finally a closing ].
