
My regular expression goal:

"If the sentence has a '#' in it, group all the stuff to the left of the '#' and group all the stuff to the right of the '#'. If the sentence doesn't have a '#', then just return the entire sentence as one group"

Examples of the two cases:

A) '120x4#Words' -> ('120x4', 'Words')
B) '[email protected]' -> ('[email protected]')

I made a regular expression that parses case A correctly

(.*)(?:#(.*))

# List the groups found
>>> r.groups()
(u'120x4', u'words')

But of course this won't work for case B -- I need to make "# and everything to the right of it" optional

So I tried to use the '?' "zero or one" operator on that second grouping to indicate it's optional.
(.*)(?:#(.*))?

But it gives me bad results. The first grouping eats up the entire string.

# List the groups found
>>> r.groups()
(u'120x4#words', None)

I guess I'm misunderstanding either how the zero-or-one '?' operator works on groupings, or how the first group acts greedy and grabs the entire string. I did try to make the first group 'reluctant', but then it captured nothing at all.

(.*?)(?:#(.*))?


# List the groups found
>>> r.groups()
(u'', None)

4 Answers

3

Simply use the standard str.split function:

s = '120x4#Words'
x = s.split('#')  # ['120x4', 'Words']

If you still want a regex solution, use the following pattern:

([^#]+)(?:#(.*))?
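As a rough sketch of how that pattern behaves on both cases, the snippet below wraps it in a small helper. The name `split_hash` and the sample inputs are made up for illustration and are not part of the answer. Note that `[^#]+` needs at least one character before the '#', so a string that starts with '#' does not match at all, and a second '#' simply ends up in the right-hand group.

```python
import re

# Pattern from the answer: one or more non-'#' characters,
# optionally followed by '#' and the rest of the string.
PATTERN = re.compile(r'([^#]+)(?:#(.*))?')


def split_hash(text):
    """Hypothetical helper: return the captured groups, or None if no match."""
    match = PATTERN.match(text)
    return match.groups() if match else None


print(split_hash('120x4#Words'))  # ('120x4', 'Words')
print(split_hash('plain text'))   # ('plain text', None)
print(split_hash('a#b#c'))        # ('a', 'b#c')
print(split_hash('#Words'))       # None, nothing before the '#'
print('a#b#c'.split('#'))         # ['a', 'b', 'c'], splits on every '#'
```

The unmatched optional group comes back as None, so case B still yields a two-element tuple unless you drop the None yourself.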

1 Comment

+1 for str.split, though the regex is only equivalent for up to 1 occurrence of # in the string...
1
(.*?)#(.*)|(.+)

This should work. See demo.

http://regex101.com/r/oC3nN4/14
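With an alternation like this, only one branch fills its groups, so the caller has to check which one matched. Here is a minimal sketch of that unpacking. The helper name `split_hash_alt` and the sample inputs are assumptions for illustration only.

```python
import re

# Alternation from the answer: either "left#right" (groups 1 and 2)
# or, failing that, the whole string (group 3).
ALT = re.compile(r'(.*?)#(.*)|(.+)')


def split_hash_alt(text):
    """Hypothetical helper: return a 1- or 2-tuple, or None if no match."""
    match = ALT.match(text)
    if match is None:
        return None
    left, right, whole = match.groups()
    if whole is not None:
        # The second branch matched: there was no '#' in the string.
        return (whole,)
    return (left, right)


print(split_hash_alt('120x4#Words'))  # ('120x4', 'Words')
print(split_hash_alt('plain text'))   # ('plain text',)
print(split_hash_alt('a#b#c'))        # ('a', 'b#c')
```

An empty string matches neither branch, since `.+` needs at least one character, so the helper returns None for it.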

1 Comment

Wow, and that's a great site to test regular expressions too -- thanks a lot
1

Use re.split:

>>> import re
>>> a='120x4#Words'
>>> re.split('#',a)
['120x4', 'Words']
>>> b='[email protected]'
>>> re.split('#',b)
['[email protected]']

1

Here's a verbose re solution. But, you're better off using str.split.

import re

REGEX = re.compile(r'''
    \A
    (?P<left>.*?)
    (?:
        [#]
        (?P<right>.*)
    )?
    \Z
''', re.VERBOSE)


def parse(text):
    match = REGEX.match(text)
    if match:
        return tuple(filter(None, match.groups()))

print(parse('120x4#Words'))
print(parse('[email protected]'))
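The end anchor is what makes the reluctant `.*?` work here. As a small sketch (the variable names are mine, and the patterns are the question's attempts written on one line), compare the unanchored attempts with an anchored one. Without `\Z`, the regex is satisfied as soon as the optional group is allowed to match nothing, so the greedy version never gives characters back and the lazy version never takes any.

```python
import re

greedy = re.compile(r'(.*)(?:#(.*))?')       # second attempt from the question
lazy = re.compile(r'(.*?)(?:#(.*))?')        # third attempt from the question
anchored = re.compile(r'(.*?)(?:#(.*))?\Z')  # same, but must reach the end

text = '120x4#words'
print(greedy.match(text).groups())    # ('120x4#words', None)
print(lazy.match(text).groups())      # ('', None)
print(anchored.match(text).groups())  # ('120x4', 'words')
print(anchored.match('plain text').groups())  # ('plain text', None)
```

With the anchor in place, the lazy group is forced to keep growing until either the '#' branch or the end of the string lets the whole pattern succeed.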

Better solution

def parse(text):
    return text.split('#', maxsplit=1)

print(parse('120x4#Words'))
print(parse('[email protected]'))
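If the tuple shape from the question matters, one possible variation is to wrap the result in `tuple()`. This is only a sketch: the name `parse_tuple` and the sample inputs are hypothetical, and the split limit is passed positionally here because Python 2's `str.split` does not accept it as a keyword.

```python
def parse_tuple(text):
    # Split on the first '#' only, so any later '#' stays in the right part.
    return tuple(text.split('#', 1))


print(parse_tuple('120x4#Words'))  # ('120x4', 'Words')
print(parse_tuple('plain text'))   # ('plain text',)
print(parse_tuple('a#b#c'))        # ('a', 'b#c')
```

Limiting the split to one keeps this in line with the regex versions when the string contains more than one '#'.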

1 Comment

Yeah, agreed, the split is tidier. Just tonight I'm experimenting with regular expressions... the VERBOSE modifier looks like it can save me some debugging headaches in the future.
