python regular expression grouping

Question

My regular expression goal:

"If the sentence has a '#' in it, group all the stuff to the left of the '#' and group all the stuff to the right of the '#'. If the sentence doesn't have a '#', then just return the entire sentence as one group."

Examples of the two cases:

A) '120x4#Words' -> ('120x4', 'Words')
B) '[email protected]' -> ('[email protected]')

I made a regular expression that parses case A correctly

(.*)(?:#(.*))

# List the groups found
>>> r.groups()
(u'120x4', u'words')

But of course this won't work for case B -- I need to make "# and everything to the right of it" optional

So I tried to use the '?' "zero or one" operator on that second grouping to indicate it's optional.
(.*)(?:#(.*))?

But it gives me bad results. The first grouping eats up the entire string.

# List the groups found
>>> r.groups()
(u'120x4#words', None)

I guess I'm either misunderstanding how the zero-or-one '?' operator works on groupings, or misunderstanding how the first group acts greedy and grabs the entire string. I did try to make the first group 'reluctant', but then the first group came back empty and the second group didn't match at all.

(.*?)(?:#(.*))?


# List the groups found
>>> r.groups()
(u'', None)

hjpotter92 · Accepted Answer · 2014-09-07 14:32:02Z

3

Simply use the standard str.split function:

s = '120x4#Words'
x = s.split( '#' )

If you still want a regex solution, use the following pattern:

([^#]+)(?:#(.*))?
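
As a rough sketch of how this pattern behaves on the two cases from the question, the snippet below wraps it in a small helper. The helper name split_on_hash and the sample input 'no hash here' are made up for illustration; only the pattern itself comes from the answer.

```python
import re

# Pattern from the answer: one or more non-'#' characters,
# optionally followed by '#' and the rest of the string.
PATTERN = re.compile(r'([^#]+)(?:#(.*))?')


def split_on_hash(text):
    """Hypothetical helper: return the groups that took part in the match."""
    match = PATTERN.match(text)
    if match is None:
        return None
    # The second group is None when the optional part was skipped.
    return tuple(g for g in match.groups() if g is not None)


print(split_on_hash('120x4#Words'))   # ('120x4', 'Words')
print(split_on_hash('no hash here'))  # ('no hash here',)
```

Because [^#]+ needs at least one character, a string that starts with '#' (or an empty string) does not match at all with re.match, so this helper returns None for such input.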


1 Comment

Aprillion Over a year ago

+1 for str.split, though the regex is only equivalent for up to 1 occurrence of # in the string...
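
To illustrate the difference this comment points at, here is a small comparison on an input with two '#' characters. The input 'a#b#c' and the variable names are assumptions chosen for the example.

```python
import re

s = 'a#b#c'  # hypothetical input with two '#' characters

all_parts = s.split('#')      # splits at every '#'
first_only = s.split('#', 1)  # splits at the first '#' only
regex_groups = re.match(r'([^#]+)(?:#(.*))?', s).groups()

print(all_parts)     # ['a', 'b', 'c']
print(first_only)    # ['a', 'b#c']
print(regex_groups)  # ('a', 'b#c')
```

So the regex lines up with a split limited to one cut, not with the plain s.split('#').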

vks · Accepted Answer · 2014-09-07 14:30:04Z

1

(.*?)#(.*)|(.+)

This should work. See demo:

http://regex101.com/r/oC3nN4/14
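
One thing worth knowing about this alternation is that it always reports three groups, and which ones are filled depends on the branch that matched. A minimal sketch, assuming re.match is used and with a hypothetical helper groups_that_matched to drop the empty slots:

```python
import re

PATTERN = re.compile(r'(.*?)#(.*)|(.+)')


def groups_that_matched(text):
    """Hypothetical helper: keep only the groups from the branch that matched."""
    match = PATTERN.match(text)
    if match is None:
        return None
    return tuple(g for g in match.groups() if g is not None)


# Raw groups: the branch that did not take part is reported as None.
print(PATTERN.match('120x4#Words').groups())   # ('120x4', 'Words', None)
print(PATTERN.match('no hash here').groups())  # (None, None, 'no hash here')

# After filtering, the shape matches what the question asks for.
print(groups_that_matched('120x4#Words'))   # ('120x4', 'Words')
print(groups_that_matched('no hash here'))  # ('no hash here',)
```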


1 Comment

user3556757 Over a year ago

Wow, and that's a great site to test regular expressions too -- thanks a lot

Kasravnd · Accepted Answer · 2014-09-07 14:29:54Z

1

Use re.split:

>>> import re
>>> a='120x4#Words'
>>> re.split('#',a)
['120x4', 'Words']
>>> b='[email protected]'
>>> re.split('#',b)
['[email protected]']
>>>


Peter Sutton · Accepted Answer · 2014-09-07 14:42:25Z

1

Here's a verbose re solution. But, you're better off using str.split.

import re

REGEX = re.compile(r'''
    \A
    (?P<left>.*?)
    (?:
        [#]
        (?P<right>.*)
    )?
    \Z
''', re.VERBOSE)


def parse(text):
    match = REGEX.match(text)
    if match:
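        # Drop only the unmatched (None) group; keep an empty string such as the right side of '120x4#'.
        return tuple(g for g in match.groups() if g is not None)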

print(parse('120x4#Words'))
print(parse('[email protected]'))
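
The \A and \Z anchors in the pattern above are what make the lazy first group work. The sketch below replays the question's two failing attempts next to an anchored variant; the variable names are made up for illustration, and the results are shown as Python 3 strings.

```python
import re

text = '120x4#Words'

# Greedy first group: '.*' takes the whole string, and the optional
# group is then free to match nothing.
greedy = re.match(r'(.*)(?:#(.*))?', text).groups()

# Lazy first group, no end anchor: an empty first group plus a skipped
# optional group already satisfies the pattern, so matching stops there.
lazy = re.match(r'(.*?)(?:#(.*))?', text).groups()

# Lazy first group with an end anchor: the match must reach the end of
# the string, so the first group grows until the '#' part can match.
anchored = re.match(r'(.*?)(?:#(.*))?\Z', text).groups()

print(greedy)    # ('120x4#Words', None)
print(lazy)      # ('', None)
print(anchored)  # ('120x4', 'Words')
```

With the end anchor in place, a string without '#' still matches: the first group expands to the whole string and the second group stays None.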

Better solution

def parse(text):
    return text.split('#', maxsplit=1)

print(parse('120x4#Words'))
print(parse('[email protected]'))


1 Comment

user3556757 Over a year ago

yeah, agree the split is tidier. Just tonight I'm experimenting with regular expressions... the VERBOSE modifier looks like it can save me some debugging headache in the future....
