Python regex capturing groups not working for simple expression

Question

I would like to get 2 captured groups for a pair of consecutive words. I use this regular expression:

r'\b(hello)\b(world)\b'

However, searching "hello world" with this regular expression yields no results:

import re

regex = re.compile(r'\b(hello)\b(world)\b')
m = regex.match('hello world')  # m evaluates to None.

This is because you're looking for the string <boundary>hello<boundary>world<boundary>, but you're trying to match on the string <boundary>hello<boundary><space><boundary>world<boundary>.
– Joel Cornett, Commented Apr 3, 2015 at 0:05
For your use case, I would recommend using re.findall or re.finditer instead.
– Joel Cornett, Commented Apr 3, 2015 at 0:06
Makes sense, word boundaries shouldn't be equated with spaces.
– minch, Commented Apr 3, 2015 at 0:07
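
A rough sketch of the re.findall / re.finditer suggestion: both scan the whole string, whereas re.match only tries at the start. The pattern below assumes whitespace is allowed between the two words, and the sample text and variable names are made up for illustration.

```python
import re

# Assumption: one or more whitespace characters separate the two words.
pair = re.compile(r'\b(hello)\s+(world)\b')
text = 'say hello world, then hello   world again'  # made-up sample text

# re.match anchors at position 0, so it finds nothing here.
at_start = pair.match(text)

# findall returns one tuple of captured groups per match.
found = pair.findall(text)
print(found)  # [('hello', 'world'), ('hello', 'world')]

# finditer yields match objects, which also expose the match positions.
spans = [m.span() for m in pair.finditer(text)]
```

With finditer, each match object still gives access to the individual groups through m.group(1) and m.group(2).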

John1024 · Accepted Answer · 2015-04-03 00:06:02Z

You need to allow for space between the words:

>>> import re
>>> regex = re.compile(r'\b(hello)\s*\b(world)\b')
>>> regex.match('hello world')
<_sre.SRE_Match object at 0x7f6fcc249140>
>>>

Discussion

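The regex \b(hello)\b(world)\b requires that the word hello end exactly where the word world begins, with a word boundary \b between them. That cannot happen: \b matches only at a position between a word character and a non-word character (or at the start or end of the string), and the o of hello and the w of world are both word characters. Allowing whitespace, \s*, between the words fixes this.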
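
To make the difference concrete, here is a minimal sketch comparing the two patterns and showing that the corrected one yields both captured groups. The variable names are illustrative only.

```python
import re

# Original pattern: the middle \b would have to sit between 'o' and 'w',
# two word characters, so the pattern can never match.
strict = re.compile(r'\b(hello)\b(world)\b')
# Corrected pattern from the answer: optional whitespace between the words.
relaxed = re.compile(r'\b(hello)\s*\b(world)\b')

no_match = strict.match('hello world')  # None: a space sits where 'w' is expected
joined = strict.match('helloworld')     # None: no boundary between 'o' and 'w'

m = relaxed.match('hello world')
print(m.groups())  # ('hello', 'world')
```

The groups are also available individually as m.group(1) and m.group(2).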

If you meant to allow punctuation or other separators between hello and world, then that possibility should be added to the regex.
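
One possible way to allow such separators, as a sketch only: replace \s* with \W+, which requires one or more non-word characters between the words. The choice of \W+ and the sample strings are assumptions for illustration, not part of the original answer.

```python
import re

# Assumption: any run of non-word characters (spaces, commas, hyphens, ...)
# counts as a separator between the two words.
sep_regex = re.compile(r'\b(hello)\W+(world)\b')

samples = ['hello world', 'hello, world', 'hello-world', 'helloworld']

results = {}
for text in samples:
    m = sep_regex.match(text)
    # Store the captured groups, or None when the pattern does not match.
    results[text] = m.groups() if m else None

for text, groups in results.items():
    print(repr(text), '->', groups)
```

Note that the underscore counts as a word character in Python's re module, so 'hello_world' would not match this pattern.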
