1

Basically what I am trying to accomplish is whenever a number precedes a string within square brackets, the string within the square brackets should repeat as many times. I am learning regular expression and I posted a question how I can come up with a regular expression for this scenario and folks were kind enough to get me what I wanted. Basically the code below parses the string based on the regular expression and get me a list with all square brackets removed so that I can iterate through the list and get myself the output I want.
Given a string s, the code I have is below. But this does not work in all scenarios. This however works when there are no nested square brackets.

import re
s = "abc3[cd]xyz"
s_final = ""
res = re.findall(r'\d+|[^\W\d_]+', s)
print(res)
for i in range(len(res)):
    if res[i].isnumeric():
        s_final+=int(res[i])*res[i+1]
    else:
        if res[i] not in s_final:
            s_final += res[i]
print(s_final)

if I change the string to

s = "3[a2[c]]" the expected output is accaccacc as the inner square brackets gets evaluated first and then the outer square brackets. But what I get is aaacc which is not wrong based on the code I have. But anyone have any insights on how I can evaluate the string correctly?

Thanks!

3
  • 1
    I guess you would need recursion, or at least recursion might be helpful. Commented Jul 4, 2021 at 19:23
  • regex is modelled after regular expressions/regular languages, which aren't suitable for nested parentheses / recursion. While the various practical implementations of regex today do offer extensions, it is still going to be cumbersome. Why not use a proper parser, e.g. PyParsing? Commented Jul 4, 2021 at 19:23
  • Regex isn't recursive (well, this flavor anyway), but the problem is. A stack (or explicit recursion) is the easiest way to solve this sort of problem. If you hit [, push and increment the repeater value by the preceding number, if you hit ] pop back to the parent level's repeater value. See solutions in leetcode.com/problems/decode-string Commented Jul 4, 2021 at 19:24

3 Answers 3

1

You can use recursion to build the full string, after first parsing the characters:

import re
s = "3[a2[c]]"
def build_string(d):
  while (n:=next(d, None)) is not None and n != ']':
     yield n if not (j:=re.findall('\d+$', n)) else n[:-1*len(j[0])]
     if j:
        _ = next(d)
        yield ''.join(build_string(d))*int(j[0])

def get_result(s):
   return ''.join(build_string(iter(re.findall('\w+|\[|\]', s))))
      
print(get_result(s))

Output:

'accaccacc'
Sign up to request clarification or add additional context in comments.

Comments

1

Regardless of what parsing tools are used, I always recommend that developers start by writing a BNF (Backus-Naur Form) for their parser, as a roadmap for the development to follow. Here is what I've gathered from your problem description (where '+' means "one or more"):

string_expr := (letter | repetition)+
repetition := integer '[' string_expr ']'
letter := character a-z
integer := (character 0-9)+

At this point, walk through the BNF with your test samples, and make sure things look correct.

You can see that this is a recursive grammar, since string_expr uses repetition in its definition, but repetition uses string_expr.

Pyparsing is written to make it easy to map these BNF definitions to pyparsing objects. Follow this annotated code: (need to work bottom-up in implementing your grammar)

import pyparsing as pp

# define []'s, but we'll use pyparsing's Suppress since they won't be
# needed in the actual expression after parsing is done
LBRACK, RBRACK = map(pp.Suppress, "[]")

# a letter will be a single character from the list of lowercase letters
# defined in the Python string module
letter = pp.Char(string.ascii_lowercase)

# an integer will be one or more numeric digits - attach a parse
# action to convert parsed numeric strings into actual Python ints
integer = pp.Word("0123456789").addParseAction(lambda t: int(t[0]))

# we would like to write 'repetition' now, but we haven't defined
# 'string_expr' yet. We can't define that yet because 'repetition' isn't
# defined yet. So let's define 'string_expr' using a Pyparsing 
# placeholder - a Forward
string_expr = pp.Forward()
repetition = (integer("multiplier") 
              + LBRACK 
              + string_expr("content") 
              + RBRACK)

# now that repetition is defined, we can "insert" the definition of 
# 'string_expr' into the placeholder we created earlier
string_expr <<= (letter | repetition)[...]

# two more parse actions needed:
# - one to do the multiplication of multiplier*content in repetition
# - one to join all the pieces together in string_expr
repetition.addParseAction(lambda t: t.multiplier * list(t.content))
string_expr.addParseAction("".join)

Now test it out using the runTests() method pyparsing defines on parser elements:

string_expr.runTests("""\
    abc3[cd]xyz
    3[a2[c]]
    3[a2[cb]]
    """)

Gives:

abc3[cd]xyz
['abccdcdcdxyz']

3[a2[c]]
['accaccacc']

3[a2[cb]]
['acbcbacbcbacbcb']

Comments

0

You can use

import re
s = "3[a2[c]]"
s_final = s
rx = r'(\d+)\[([^][]*)]'
n = 1 
while n:
    s_final, n = re.subn(rx, lambda x: x.group(2) * int(x.group(1)), s_final)

print(s_final) # => accaccacc

See the Python demo.

The (\d+)\[([^][]*)] regex matches and captures into Group 1 any one or more digits, and then matches a [ char, captures into Group 2 any text other than [ and ] chars up to the closest ] char.

The n = 1 is a flag that makes re.subn run at least once.

Then, the while block is executed to find all the occurrences of the pattern and replace with Group 1 value repeated Group 2 times.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.