Python regex string expansion

Question

Suppose I have the following string:

trend  = '(A|B|C)_STRING'

I want to expand this to:

A_STRING
B_STRING
C_STRING

The OR condition can appear anywhere in the string. For example, STRING_(A|B)_STRING_(C|D)

would expand to:

STRING_A_STRING_C
STRING_B_STRING_C
STRING_A_STRING_D
STRING_B_STRING_D

I also want to cover the case of an empty alternative:

(|A_)STRING would expand to:

A_STRING
STRING

Here's what I've tried so far:

def expandOr(trend):
    parenBegin = trend.index('(') + 1
    parenEnd = trend.index(')')
    orExpression = trend[parenBegin:parenEnd]
    originalTrend = trend[0:parenBegin - 1]
    expandedOrList = []

    for oe in orExpression.split("|"):
        expandedOrList.append(originalTrend + oe)

    return expandedOrList

But this is obviously not working.

Is there any easy way to do this using regex?

You realize you're discarding everything after the closing parenthesis, right? Do you not see a way to fix that? — jwodder, commented Nov 19, 2013 at 1:17
Not sure what you mean. The code works for the case where the parentheses come at the end of the string, e.g. STRING_(A|B). — Mark Kennedy, commented Nov 19, 2013 at 1:33
Right, the code works there because there's nothing after the parentheses to discard, but if you input FOO_(A|B)_BAR, you get FOO_A and FOO_B, with the _BAR being discarded. Do you not realize that this is what's wrong with your code? Do you not see how you forgot to handle the substring after the )? — jwodder, commented Nov 19, 2013 at 1:38
More answers to this question here: stackoverflow.com/questions/492716/… — PaulMcG, commented Nov 19, 2013 at 2:52
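To make jwodder's point concrete, here is a minimal sketch (not from the thread) of the question's own approach with the text after ')' kept. The function name expand_or is hypothetical. It splits off the first group, recursively expands whatever follows the ')', and combines the two. It assumes no nested parentheses and that '(', ')' and '|' are used only as alternation syntax.

```python
def expand_or(trend):
    """Hypothetical recursive variant of expandOr that keeps the suffix.

    Assumes groups are not nested and that '(', ')' and '|' only
    ever express alternation.
    """
    if '(' not in trend:
        return [trend]                      # nothing left to expand
    paren_begin = trend.index('(')
    paren_end = trend.index(')', paren_begin)
    prefix = trend[:paren_begin]            # text before the group
    alternatives = trend[paren_begin + 1:paren_end].split('|')
    suffix = trend[paren_end + 1:]          # text after ')', which the original dropped
    # Expand the rest of the string, then put each alternative in front of it.
    expanded_suffixes = expand_or(suffix)
    return [prefix + alt + rest
            for alt in alternatives
            for rest in expanded_suffixes]


for s in ('(A|B|C)_STRING', '(|A_)STRING', 'STRING_(A|B)_STRING_(C|D)'):
    print(s, '->', expand_or(s))
```

With FOO_(A|B)_BAR this returns FOO_A_BAR and FOO_B_BAR, so the _BAR is no longer lost.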

Tim Peters · Accepted Answer · 2013-11-19 02:30:07Z

6

Here's a pretty clean way. You'll have fun figuring out how it works :-)

def expander(s):
    import re
    from itertools import product
    pat = r"\(([^)]*)\)"
    pieces = re.split(pat, s)
    pieces = [piece.split("|") for piece in pieces]
    for p in product(*pieces):
        yield "".join(p)

Then:

for s in ('(A|B|C)_STRING',
          '(|A_)STRING',
          'STRING_(A|B)_STRING_(C|D)'):
    print(s, "->")
    for t in expander(s):
        print("   ", t)

displays:

(A|B|C)_STRING ->
    A_STRING
    B_STRING
    C_STRING
(|A_)STRING ->
    STRING
    A_STRING
STRING_(A|B)_STRING_(C|D) ->
    STRING_A_STRING_C
    STRING_A_STRING_D
    STRING_B_STRING_C
    STRING_B_STRING_D
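For anyone who would rather see the trick than figure it out, this sketch runs the steps of expander one at a time on the third example. The key is that re.split keeps the text matched by a capturing group, so literal text and group contents alternate in the result.

```python
import re
from itertools import product

s = 'STRING_(A|B)_STRING_(C|D)'
pat = r"\(([^)]*)\)"

# Step 1: the pattern has a capturing group, so re.split keeps the
# group contents; literal text and alternatives alternate.
pieces = re.split(pat, s)
print(pieces)    # ['STRING_', 'A|B', '_STRING_', 'C|D', '']

# Step 2: split every piece on '|'; literal pieces become one-element lists.
choices = [piece.split("|") for piece in pieces]
print(choices)   # [['STRING_'], ['A', 'B'], ['_STRING_'], ['C', 'D'], ['']]

# Step 3: product picks one entry from each list; join glues them together.
expanded = ["".join(p) for p in product(*choices)]
print(expanded)
```

An empty alternative such as (|A_) simply contributes an empty string from split("|"), which is why that case needs no special handling.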


2 Comments

user1346466 Over a year ago

Try print " ".join(expander('(A|B|C)_STR|ING')) to find the error in the code.

Tim Peters Over a year ago

The code I posted implicitly assumes that parentheses and vertical bars are metacharacters, used only to express the alternation patterns the OP was interested in. That leads to the simple code shown. If you want to make other assumptions, that's fine, but then you should spell them out in a new answer of your own. To me, they would complicate the code in ways that merely obscure the real points.
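To illustrate the trade-off discussed above: with expander as posted, '(A|B|C)_STR|ING' yields A_STR, AING, B_STR, BING, C_STR, CING, because the literal '_STR|ING' is split on '|' too. Below is a minimal sketch (not from the thread) under a different assumption: only text inside parentheses is treated as alternation. The name expander_literal_bars is hypothetical. It still assumes parentheses are never nested or escaped.

```python
import re
from itertools import product


def expander_literal_bars(s):
    """Hypothetical variant of expander(): '|' outside parentheses stays literal."""
    pieces = re.split(r"\(([^)]*)\)", s)
    # With one capturing group, re.split puts group contents at odd indices;
    # only those are split into alternatives.
    choices = [piece.split("|") if i % 2 else [piece]
               for i, piece in enumerate(pieces)]
    for p in product(*choices):
        yield "".join(p)


print(list(expander_literal_bars('(A|B|C)_STR|ING')))
```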

Seçkin Savaşçı · Accepted Answer · 2013-11-19 02:27:36Z

4

Using the third-party exrex package, which generates the strings matched by a regular expression:

>>> import exrex
>>> trend  = '(A|B|C)_STRING'
>>> trend2 = 'STRING_(A|B)_STRING_(C|D)'
>>> list(exrex.generate(trend))
[u'A_STRING', u'B_STRING', u'C_STRING']
>>> list(exrex.generate(trend2))
[u'STRING_A_STRING_C', u'STRING_A_STRING_D', u'STRING_B_STRING_C', u'STRING_B_STRING_D']


1 Comment

Mark Kennedy Over a year ago

Thanks for taking the time to write this out. This requires an external module, though.

roippi · Accepted Answer · 2013-11-19 02:07:19Z

2

I would do this to extract the groups:

def extract_groups(trend):
    l_parens = [i for i,c in enumerate(trend) if c == '(']
    r_parens = [i for i,c in enumerate(trend) if c == ')']
    assert len(l_parens) == len(r_parens)
    return [trend[l+1:r].split('|') for l,r in zip(l_parens,r_parens)]

And then you can evaluate the product of those extracted groups using itertools.product:

>>> from itertools import product
>>> expr = 'STRING_(A|B)_STRING_(C|D)'
>>> list(product(*extract_groups(expr)))
[('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]

Now it's just a question of splicing those back onto your original expression. I'll use re for that :)

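# Python 3.3+ (for "yield from")
import re
from itertools import product

def _gen(it):
    # Yield the items of it one at a time (same effect as iter(it) here)
    yield from it

p = re.compile(r'\(.*?\)')  # raw string, so '\(' is not treated as a string escape

for tup in product(*extract_groups(expr)):
    gen = _gen(tup)
    # re.sub calls the lambda once per match, left to right,
    # so each group is replaced by the next element of tup
    print(p.sub(lambda x: next(gen), expr))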

STRING_A_STRING_C
STRING_A_STRING_D
STRING_B_STRING_C
STRING_B_STRING_D

There's probably a more readable way to get re.sub to sequentially substitute things from an iterable, but this is what came off the top of my head.
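One possibly more readable variant, offered as a sketch rather than as roippi's code: the built-in iter() does the same job as _gen, and re.findall with a capturing group (swapped in here for extract_groups) returns the group contents directly. The name group_pat is illustrative, and like the answer it assumes the parentheses are not nested.

```python
import re
from itertools import product

expr = 'STRING_(A|B)_STRING_(C|D)'
group_pat = re.compile(r'\(([^)]*)\)')

# Alternatives of each group, e.g. [['A', 'B'], ['C', 'D']].
groups = [g.split('|') for g in group_pat.findall(expr)]

results = []
for tup in product(*groups):
    # iter() hands out one chosen alternative per match, left to right,
    # which is what the _gen helper does.
    choices = iter(tup)
    results.append(group_pat.sub(lambda m: next(choices), expr))

print('\n'.join(results))
```

Because re.sub uses a function's return value literally, alternatives containing backslashes are inserted unchanged.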


1 Comment

Mark Kennedy Over a year ago

Thanks for taking the time to write this out.

Ryszard Czech · Accepted Answer · 2021-05-08 20:33:51Z

2

This is easy to achieve with the sre_yield module:

>>> import sre_yield
>>> trend  = '(A|B|C)_STRING'
>>> strings = list(sre_yield.AllStrings(trend))
>>> print(strings)
['A_STRING', 'B_STRING', 'C_STRING']

The goal of sre_yield is to efficiently generate all values that can match a given regular expression, or to count possible matches efficiently. It does this by walking the tree constructed by sre_parse (the same parser used internally by the re module) and constructing chained/repeating iterators as appropriate. Depending on your input string, there may be duplicate results; these are cases that sre_parse did not optimize.

