
What is a word boundary in a Python regex? Can someone please explain it using these examples?

Example 1

>>> import re
>>> x = '456one two three123'
>>> y=re.search(r"\btwo\b",x)
>>> y
<_sre.SRE_Match object at 0x2aaaaab47d30>

Example 2

>>> y=re.search(r"two",x)
>>> y
<_sre.SRE_Match object at 0x2aaaaab47d30>

Example 3

>>> ip="192.168.254.1234"
>>> if re.search(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b",ip):
...    print(ip)
...

Example 4

>>> ip="192.168.254.1234"
>>> if re.search(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",ip):
...    print(ip)
192.168.254.1234
  • The documentation has the answer: docs.python.org/library/re.html#regular-expression-syntax Commented Apr 13, 2012 at 8:58
  • Instead of us explaining how four examples work, why don't you ask about what you don't understand? For example, what output were you expecting, and what came out instead? Commented Apr 13, 2012 at 8:58
  • I want to know why \b is required. If I don't give examples, everyone comments that I haven't tried anything; if I do give examples, someone asks "why don't you ask about what you don't understand?" :) Distributed set of people looking at the posts :) Commented Apr 13, 2012 at 9:08
  • If I put regex \b into Google, I get regular-expressions.info/wordboundaries.html as the first result. Commented Apr 13, 2012 at 9:16

2 Answers


"word boundary" means exactly what it says: the boundary of a word, i.e. either the beginning or the end.

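It does not match any actual character in the input; it only succeeds if the current match position is at the beginning or end of a word. In Python's terms, a word is a run of \w characters (letters, digits and the underscore), so \b matches between a \w and a \W character, or between a \w character and the start or end of the string.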
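A quick way to see that \b is zero-width is to list every position where it matches. This is a small sketch using re.finditer; the sample string "foo bar" is arbitrary:

```python
import re

text = "foo bar"

# \b consumes no characters, so each match is an empty string;
# m.start() is the index in `text` where a boundary sits.
positions = [m.start() for m in re.finditer(r"\b", text)]
print(positions)  # [0, 3, 4, 7]

# Every boundary match has empty text.
print({m.group() for m in re.finditer(r"\b", text)})  # {''}
```

Position 0 is the start of the string, 3 and 4 are on either side of the space, and 7 is the end of the string.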

This is important because, unlike matching whitespace, it will also match at the very beginning or end of the entire input.
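A small comparison sketch of \s versus \b in front of foo (the sample strings are made up for illustration):

```python
import re

# \s needs an actual whitespace character before foo...
ws_start = re.search(r"\sfoo", "foobar")
# ...while \b also matches at the very start of the input
wb_start = re.search(r"\bfoo", "foobar")
# and next to punctuation, where there is no whitespace at all.
wb_punct = re.search(r"\bfoo", "(foo)")

print(ws_start)          # None
print(wb_start.group())  # foo
print(wb_punct.span())   # (1, 4)
```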

So '\bfoo' will match 'foobar' and 'foo bar' and 'bar foo', but not 'barfoo'.

'foo\b' will match 'foo bar' and 'bar foo' and 'barfoo', but not 'foobar'.
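The four cases above can be checked with re.search (re.match would only try the start of the string). A minimal sketch; the dictionary layout is just for illustration:

```python
import re

samples = ["foobar", "foo bar", "bar foo", "barfoo"]

# Each value is (\bfoo found, foo\b found); re.search scans the whole string.
results = {
    s: (re.search(r"\bfoo", s) is not None, re.search(r"foo\b", s) is not None)
    for s in samples
}
for s, (starts, ends) in results.items():
    print(f"{s!r:<10} \\bfoo={starts}  foo\\b={ends}")

# The matched text is only "foo"; \b contributes no characters.
m = re.search(r"\bfoo", "bar foo")
print(m.group(), m.span())  # foo (4, 7)
```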


4 Comments

Please note that in these examples the matched text is always just 'foo' (e.g. from 'foo bar'), never the surrounding characters. Just to make this clear.
Yes. Also, "match" is actually imprecise, as you'd have to use re.search to get a positive result for the strings not starting with foo.
What characters are considered for word boundaries? Would foo\b match foo-bar, foo_bar, foo=bar, or foo.bar?
@Stevoisiak I'm not sure that I knew that confidently in 2012, although I certainly could have researched and tested it. That said, your comment drew my attention to the fact that this question is a duplicate. The canonical, which I have now used to close this question as a duplicate, includes answers that explain the matter very well.
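On which characters count: for str patterns in Python 3, \w matches Unicode word characters, which include digits and the underscore. So an underscore does not create a boundary, while -, = and . do. A quick check, as a sketch:

```python
import re

samples = ["foo-bar", "foo_bar", "foo=bar", "foo.bar"]

# True if there is a word boundary right after "foo".
results = {s: re.search(r"foo\b", s) is not None for s in samples}
for s, ok in results.items():
    print(s, ok)

# Of these separators, only "_" is a word character.
print(re.findall(r"\w", "_-=."))  # ['_']
```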

Try this:

import re

ip = "192.168.254.1234"
res = re.findall(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", ip)
print(res)

Notice how I correctly escaped the dots. A match is found because the regex doesn't care what comes after the last 1-3 digits: res is ['192.168.254.123'], and the trailing 4 is simply left out.
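To see exactly what was matched, re.search is used here instead of findall, only so that the match object's end position is available:

```python
import re

ip = "192.168.254.1234"
m = re.search(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", ip)

print(m.group())     # 192.168.254.123 (the last \d{1,3} stops after 3 digits)
print(ip[m.end():])  # 4 (left over after the match)
```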

Now:

ip = "192.168.254.1234"
res = re.findall(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b", ip)
print(res)

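This returns an empty list, since the last 1-3 digits are NOT ENDING AT A BOUNDARY: the final \d{1,3} can consume at most 123, and the next character, 4, is also a word character, so \b cannot match there (backtracking to 12 or 1 fails the same way).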
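A sketch comparing a few inputs against the pattern from the question's Example 3, which has \b on both ends. It also shows why the pattern must be a raw string: in a normal string literal, "\b" is the backspace character, so the regex would look for a literal backspace instead of a word boundary.

```python
import re

# Pattern from Example 3: \b on both ends of the dotted quad.
pattern = r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"

too_long = re.findall(pattern, "192.168.254.1234")  # last group has 4 digits
valid = re.findall(pattern, "192.168.254.123")
embedded = re.findall(pattern, "ip=10.0.0.1;")      # "=" and ";" are non-word chars

print(too_long)  # []
print(valid)     # ['192.168.254.123']
print(embedded)  # ['10.0.0.1']

# Without the r prefix, "\b" is a backspace character, not a boundary.
print("\b" == "\x08")              # True
print(re.search("\b", "foo bar"))  # None: there is no backspace in the text
```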

2 Comments

The dot was an editing mistake; please don't mind it. I have corrected it now.
This answer doesn't address the revised question by OP, suggest you delete it.
