7

Suppose I have the following string:

trend  = '(A|B|C)_STRING'

I want to expand this to:

A_STRING
B_STRING
C_STRING

The OR condition can be anywhere in the string, e.g. STRING_(A|B)_STRING_(C|D)

would expand to

STRING_A_STRING_C
STRING_B_STRING_C
STRING_A_STRING_D
STRING_B_STRING_D

I also want to cover the case of an empty conditional:

(|A_)STRING would expand to:

A_STRING
STRING

Here's what I've tried so far:

def expandOr(trend):
    parenBegin = trend.index('(') + 1
    parenEnd = trend.index(')')
    orExpression = trend[parenBegin:parenEnd]
    originalTrend = trend[0:parenBegin - 1]
    expandedOrList = []

    for oe in orExpression.split("|"):
        expandedOrList.append(originalTrend + oe)

But this is obviously not working.

Is there any easy way to do this using regex?

5
  • You realize you're discarding everything after the closing parenthesis, right? Do you not see a way to fix that? Commented Nov 19, 2013 at 1:17
  • Not sure what you mean. The code works for the case where the parentheses come at the end of the string, i.e. STRING_(A|B) Commented Nov 19, 2013 at 1:33
  • Right, the code works there because there's nothing after the parentheses to discard, but if you input FOO_(A|B)_BAR, you get FOO_A and FOO_B, with the _BAR being discarded. Do you not realize that this is what's wrong with your code? Do you not see how you forgot to handle the substring after the )? Commented Nov 19, 2013 at 1:38
  • More answers to this question here: stackoverflow.com/questions/492716/… Commented Nov 19, 2013 at 2:52
  • @jwodder Yes, I saw the fix. Thanks Commented Nov 19, 2013 at 17:43
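The fix discussed in these comments (keeping the text after the closing parenthesis) could look roughly like the sketch below. The name expand_or and the recursive structure are illustrative assumptions, not the asker's final code, and nested parentheses are assumed not to occur.

```python
def expand_or(trend):
    """Expand the first (a|b|...) group, then recurse on the text after it."""
    start = trend.find('(')
    if start == -1:
        # No group left: the string expands to itself.
        return [trend]
    end = trend.index(')', start)
    prefix = trend[:start]
    options = trend[start + 1:end].split('|')
    # Keep the text after ')' and expand any further groups it contains.
    suffixes = expand_or(trend[end + 1:])
    return [prefix + option + suffix
            for option in options
            for suffix in suffixes]
```

With this version, expand_or('FOO_(A|B)_BAR') keeps the trailing _BAR on both results.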

4 Answers

6

Here's a pretty clean way. You'll have fun figuring out how it works :-)

def expander(s):
    import re
    from itertools import product
    pat = r"\(([^)]*)\)"
    pieces = re.split(pat, s)
    pieces = [piece.split("|") for piece in pieces]
    for p in product(*pieces):
        yield "".join(p)

Then:

for s in ('(A|B|C)_STRING',
          '(|A_)STRING',
          'STRING_(A|B)_STRING_(C|D)'):
    print(s, "->")
    for t in expander(s):
        print("   ", t)

displays:

(A|B|C)_STRING ->
    A_STRING
    B_STRING
    C_STRING
(|A_)STRING ->
    STRING
    A_STRING
STRING_(A|B)_STRING_(C|D) ->
    STRING_A_STRING_C
    STRING_A_STRING_D
    STRING_B_STRING_C
    STRING_B_STRING_D
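
For anyone who would rather not puzzle it out, here is a step-by-step sketch of what expander does to one input. The variable names choices and expanded are only illustrative; the intermediate values in the comments are what the three steps should produce.

```python
import re
from itertools import product

s = 'STRING_(A|B)_STRING_(C|D)'

# The capturing group makes re.split keep the text found inside each "(...)".
pieces = re.split(r"\(([^)]*)\)", s)
# pieces == ['STRING_', 'A|B', '_STRING_', 'C|D', '']

# Splitting every piece on "|" turns it into a list of alternatives;
# literal text simply becomes a one-element list.
choices = [piece.split("|") for piece in pieces]
# choices == [['STRING_'], ['A', 'B'], ['_STRING_'], ['C', 'D'], ['']]

# product picks one alternative from each list, keeping the original order.
expanded = ["".join(combo) for combo in product(*choices)]
```

The trailing empty string in pieces is harmless: it contributes a single empty alternative to every combination.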

2 Comments

Try print " ".join(expander('(A|B|C)_STR|ING')) to find the error in the code.
The code I posted implicitly assumes that parentheses and vertical bars are metacharacters, used only to express the alternation patterns the OP was interested in. That leads to the simple code shown. If you want to make other assumptions, that's fine, but then you should spell them out in a new answer of your own. To me, they would complicate the code in ways that merely obscure the real points.
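
To make the trade-off in this exchange concrete: with the answer's code, '(A|B|C)_STR|ING' also splits on the bar outside the parentheses and yields A_STR, AING, B_STR, and so on. If a "|" outside parentheses should stay literal, one possible variant is sketched below. The name expander_strict is hypothetical, and the sketch still assumes that parentheses are never nested and always delimit alternations.

```python
import re
from itertools import product

def expander_strict(s):
    """Like expander, but a "|" outside parentheses is kept as literal text."""
    pieces = re.split(r"\(([^)]*)\)", s)
    # With one capturing group, re.split alternates literal text (even
    # indices) with group contents (odd indices), so only odd pieces are split.
    choices = [piece.split("|") if i % 2 else [piece]
               for i, piece in enumerate(pieces)]
    for combo in product(*choices):
        yield "".join(combo)
```

The only change from the original is the index check, which is the kind of extra assumption the answer's author chose to leave out.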
4
import exrex
trend  = '(A|B|C)_STRING'
trend2 = 'STRING_(A|B)_STRING_(C|D)'

>>> list(exrex.generate(trend))
[u'A_STRING', u'B_STRING', u'C_STRING']

>>> list(exrex.generate(trend2))
[u'STRING_A_STRING_C', u'STRING_A_STRING_D', u'STRING_B_STRING_C', u'STRING_B_STRING_D']

1 Comment

Thanks for taking the time to write this out. This requires an external module
2

I would do this to extract the groups:

def extract_groups(trend):
    l_parens = [i for i,c in enumerate(trend) if c == '(']
    r_parens = [i for i,c in enumerate(trend) if c == ')']
    assert len(l_parens) == len(r_parens)
    return [trend[l+1:r].split('|') for l,r in zip(l_parens,r_parens)]

And then you can evaluate the product of those extracted groups using itertools.product:

expr = 'STRING_(A|B)_STRING_(C|D)'
from itertools import product
list(product(*extract_groups(expr)))
Out[92]: [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]

Now it's just a question of splicing those back onto your original expression. I'll use re for that :)

#python3.3+
import re

def _gen(it):
    yield from it

p = re.compile(r'\(.*?\)')

for tup in product(*extract_groups(expr)):
    gen = _gen(tup)
    print(p.sub(lambda x: next(gen), expr))

STRING_A_STRING_C
STRING_A_STRING_D
STRING_B_STRING_C
STRING_B_STRING_D

There's probably a more readable way to get re.sub to sequentially substitute things from an iterable, but this is what came off the top of my head.
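
One possibly more readable variant, offered as a sketch rather than a definitive version: a plain iter() over each combination can stand in for the generator helper, and the same compiled pattern can be used both to extract the groups and to substitute them. The names group, alternatives, and results are illustrative.

```python
import re
from itertools import product

expr = 'STRING_(A|B)_STRING_(C|D)'
group = re.compile(r'\(([^)]*)\)')

# One list of alternatives per parenthesised group, in order of appearance.
alternatives = [m.group(1).split('|') for m in group.finditer(expr)]

results = []
for combo in product(*alternatives):
    replacements = iter(combo)
    # re.sub calls the function once per match, left to right, so each
    # group receives the next item of the current combination.
    results.append(group.sub(lambda match: next(replacements), expr))
```

This relies on re.sub visiting matches in left-to-right order, which is the same order finditer used to build alternatives.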

1 Comment

Thanks for taking the time to write this out
2

This is easy to achieve with the sre_yield module:

>>> import sre_yield
>>> trend  = '(A|B|C)_STRING'
>>> strings = list(sre_yield.AllStrings(trend))
>>> print(strings)
['A_STRING', 'B_STRING', 'C_STRING']

The goal of sre_yield is to efficiently generate all values that can match a given regular expression, or count possible matches efficiently... It does this by walking the tree as constructed by sre_parse (same thing used internally by the re module), and constructing chained/repeating iterators as appropriate. There may be duplicate results, depending on your input string though -- these are cases that sre_parse did not optimize.
