1

I have the following regular expression, which I am compiling with the Pattern class.

\bIntegrated\s+Health\s+System\s+\(IHS\)\b

Why is this not matching this string?

"test pattern case Integrated Health System (IHS)."

If I try \bpattern\b, it seems to work, but for the above phrase it does not. I have the parentheses in the pattern escaped, so I'm not sure why it doesn't work. It does match if I remove the parenthesized portion of the pattern, but I want to match the whole thing.

4
  • I did escape it, but Stack Overflow un-escaped it :). My expression reads like this: <code>\bIntegrated\s+Health\s+System\s+\(IHS\)\b</code> Commented Jan 5, 2010 at 23:30
  • You should edit your question rather than adding a comment. Commented Jan 5, 2010 at 23:31
  • SO doesn't know <code> tags. Just indent with 4 spaces, or select the text and press the 010101 button or Ctrl+K. Also see the Markdown FAQ on the right-hand side of the message editor. Commented Jan 5, 2010 at 23:33
  • Got it (indenting 4 spaces for code)! Thanks! Commented Jan 5, 2010 at 23:37

3 Answers

1

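1) Escape the parentheses; otherwise they are grouping metacharacters that create a capturing group, not literal parentheses: \( \)

2) Remove the final \b. A word boundary does not work after a literal ) here: ) is not a word character, so \b after it matches only when a word character follows, and in your string the next character is a period.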

\bIntegrated\s+Health\s+System\s+\(IHS\)\W
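To make the difference concrete, here is a minimal sketch that runs both variants against the string from the question. The class name `BoundaryDemo` and the helper `found` are made up for illustration; only `Pattern` and `Matcher` come from `java.util.regex`.

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // The phrase from the question, without any trailing boundary.
    static final String PHRASE = "\\bIntegrated\\s+Health\\s+System\\s+\\(IHS\\)";

    // Returns true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "test pattern case Integrated Health System (IHS).";

        // ")" and "." are both non-word characters, so there is no
        // word boundary between them and the trailing \b fails.
        System.out.println(found(PHRASE + "\\b", input)); // false

        // \W consumes the "." that follows the closing parenthesis.
        System.out.println(found(PHRASE + "\\W", input)); // true

        // A trailing \b actually succeeds when a word character follows ")",
        // which is exactly the case the asker wants to reject.
        System.out.println(found(PHRASE + "\\b",
                "Integrated Health System (IHS)testing")); // true
    }
}
```

Note that `\W` has to consume a character, so this variant does not match when the phrase is the very last thing in the input.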

6 Comments

Okay, how do I indicate the trailing boundary then, so it does not match something like \bIntegrated\s+Health\s+System\s+\(IHS\)testing I need to make sure it only matches the whole phrase and not some string that starts with this phrase.
You could use \W, which is the same as [^\w] or [^a-zA-Z0-9_] (I'm not sure exactly what it includes in Java), or you could create your own character class (or negated class) to specify what does or does not indicate a match. I've updated the example with \W, which will likely work pretty well.
Thanks, \W seems to work pretty well so far combined with grouping to extract the matched phrase minus the non-word character that follows.
If you want to allow the match at the end of the string you would have to say ($|\W). I'm not sure it's so important though, are you likely to have strings like Integrated Health Systems (IHS)foo? The close bracket is almost invariably followed by space or punctuation.
Okay, here is my final regex pattern: "(\\b|\\W)(" + phrase + ")($|\\W)". I use group 2 to get the matched phrase.
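The pattern from the last comment can be sketched as follows. This assumes `phrase` is already a valid regex with its parentheses escaped and contains no capturing groups of its own (otherwise the group numbers shift); the class name `PhraseMatcher` and the method `findPhrase` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhraseMatcher {
    // Wraps the phrase in a leading (\b|\W) and a trailing ($|\W),
    // then returns group 2, i.e. the phrase without its delimiters.
    // Returns null when the phrase is not found as a whole.
    static String findPhrase(String phraseRegex, String input) {
        Pattern p = Pattern.compile("(\\b|\\W)(" + phraseRegex + ")($|\\W)");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String phrase = "Integrated\\s+Health\\s+System\\s+\\(IHS\\)";

        // Followed by punctuation: matches.
        System.out.println(findPhrase(phrase,
                "test pattern case Integrated Health System (IHS)."));
        // At the very end of the input: matches via the $ alternative.
        System.out.println(findPhrase(phrase,
                "Integrated Health System (IHS)"));
        // Followed directly by a word character: no match, prints null.
        System.out.println(findPhrase(phrase,
                "Integrated Health System (IHS)testing"));
    }
}
```

The first two calls print `Integrated Health System (IHS)`; the third prints `null`.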
0

You've got (IHS) - a group - where you want \(IHS\) as the literal brackets.


0

You need to escape the parentheses:

\bIntegrated\s+Health\s+System\s+\(IHS\)\b

Parentheses delimit a capture group. To match a literal set of parentheses, you can escape them like this: \( \)
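As a small illustration of the difference, the sketch below compares unescaped and escaped parentheses on a shortened version of the phrase. `ParenDemo` and `firstMatch` are hypothetical names; `Pattern.quote` is the standard library method that turns a string into a literal pattern, shown here as one alternative to escaping by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenDemo {
    // Returns the first substring matched by the regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "Health System (IHS).";

        // Unescaped: (IHS) is a capture group that matches the text "IHS",
        // so the literal "(" in the input makes the match fail.
        System.out.println(firstMatch("System\\s+(IHS)", input)); // null

        // The same unescaped pattern matches text without parentheses.
        System.out.println(firstMatch("System\\s+(IHS)", "Health System IHS")); // System IHS

        // Escaped: \( and \) match the literal characters.
        System.out.println(firstMatch("System\\s+\\(IHS\\)", input)); // System (IHS)

        // Pattern.quote produces a literal pattern for the whole fragment.
        System.out.println(firstMatch("System\\s+" + Pattern.quote("(IHS)"), input)); // System (IHS)
    }
}
```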

1 Comment

It isn’t safe to use \b in Java. It doesn’t mean what you think it does. See here for why.
