I would like to capture two consecutive words as two separate groups. I use this regular expression:

r'\b(hello)\b(world)\b'

However, searching "hello world" with this regular expression yields no results:

import re

regex = re.compile(r'\b(hello)\b(world)\b')
m = regex.match('hello world')  # m evaluates to None.
  • This is because you're looking for the string <boundary>hello<boundary>world<boundary>, but you're trying to match on the string <boundary>hello<boundary><space><boundary>world<boundary>. Commented Apr 3, 2015 at 0:05
  • For your use case, I would recommend using re.findall or re.finditer instead. Commented Apr 3, 2015 at 0:06
  • Makes sense, word boundaries shouldn't be equated with spaces. Commented Apr 3, 2015 at 0:07

1 Answer

You need to allow for space between the words:

>>> import re
>>> regex = re.compile(r'\b(hello)\s*\b(world)\b')
>>> regex.match('hello world')
<_sre.SRE_Match object at 0x7f6fcc249140>
>>> 

Discussion

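The regex \b(hello)\b(world)\b requires that the word hello end exactly where the word world begins, with a word boundary \b between them. That cannot happen: \b is a zero-width assertion that consumes no characters, and there is no word boundary between two adjacent word characters such as the o of hello and the w of world. Allowing whitespace between the words with \s* fixes this, because the space in the input is then consumed before world is matched.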
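
To make the difference concrete, here is a small sketch that runs both patterns against the same input and reads back the two captured groups. The variable names broken and fixed are just illustrative labels, not anything from the question.

```python
import re

# The pattern from the question: nothing in it can consume the space.
broken = re.compile(r'\b(hello)\b(world)\b')

# The pattern from this answer: \s* consumes the whitespace between the words.
fixed = re.compile(r'\b(hello)\s*\b(world)\b')

text = 'hello world'

print(broken.match(text))   # None
m = fixed.match(text)
print(m.groups())           # ('hello', 'world')

# With no separator at all, the inner \b still fails, so this does not match.
print(fixed.match('helloworld'))  # None
```
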

If you meant to allow punctuation or other separators between hello and world, then that possibility should be added to the regex.
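
As one possible way to do that, the sketch below assumes that "separator" means one or more non-word characters and uses \W+ for it. That choice is an assumption made for illustration; a narrower character class such as [\s,;-]+ may suit your data better.

```python
import re

# Assumption: any run of non-word characters (spaces, commas, hyphens, ...)
# counts as a separator between the two words.
loose = re.compile(r'\b(hello)\W+(world)\b')

for text in ('hello world', 'hello, world', 'hello-world', 'helloworld'):
    m = loose.match(text)
    # Show the captured groups, or None when the pattern does not match.
    print(repr(text), '->', m.groups() if m else None)
```

Because \W+ requires at least one separator character, 'helloworld' is still rejected, while the other three inputs each yield the two groups.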
