
My pattern is [a-z][\\*\\+\\-_\\.\\,\\|\\s]?\\b

My Result:

a__
not matched
a_.
pattern matched = a_
a._
pattern matched = a.
a..
pattern matched = a

Why is only my first input not matched? Thanks in advance.

[ PS: I got the same result with [a-z][\\*\\+\\-\\_\\.\\,\\|\\s]?\\b ]

  • why all those backslashes? Commented Nov 18, 2014 at 7:21
  • Escapes have escapes. Commented Nov 18, 2014 at 7:21
  • What exactly do you want to match? Can you explain in words? Commented Nov 18, 2014 at 7:23
  • You don't need to escape metacharacters in character classes (between [])! Commented Nov 18, 2014 at 7:28
  • @isnot2bad is right; no need to escape metacharacters in character classes. Your regex could be rewritten as "[a-z][*+\\-_.,|\\\\s]?\b"; also, not sure about the last term: didn't you mean to match a space character? Commented Nov 18, 2014 at 7:32

2 Answers


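Because, unlike the period ., the underscore _ is considered a word character. So a_ is a single word, while a. is a word followed by punctuation.

So, for a__, the regex matches a, then matches the first _, then fails to find a word boundary (since the next _ is part of the same word). Backtracking so that the optional character class matches nothing does not help either: there is no word boundary between a and _.

For a.., the regex matches a and first lets the character class take the following period, but there is no word boundary between the two periods. It then backtracks, lets the optional character class match nothing, and finds the word boundary between the word a and the punctuation that follows.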
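The behaviour described above can be reproduced with a short Java sketch. The pattern is the one from the question; the class name WordBoundaryDemo and the helper firstMatch are hypothetical names chosen only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // The pattern from the question, written as a Java string literal.
    static final Pattern PATTERN =
            Pattern.compile("[a-z][\\*\\+\\-_\\.\\,\\|\\s]?\\b");

    // Returns the first substring found by find(), or null if nothing matches.
    static String firstMatch(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"a__", "a_.", "a._", "a.."}) {
            String match = firstMatch(input);
            System.out.println(input + " -> "
                    + (match == null ? "not matched" : "pattern matched = " + match));
        }
        // Prints:
        // a__ -> not matched
        // a_. -> pattern matched = a_
        // a._ -> pattern matched = a.
        // a.. -> pattern matched = a
    }
}
```

Only a__ fails, because every position where \b is tried in that input sits between two word characters.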


2 Comments

Hi, does the printed match include the word boundary as well? If so, how come the result for my second input is a_? If not, how come the result for my third input is a.?
The word boundary is zero-width, so it is never part of the printed match. Your second example detects a word boundary between the word character _ and the non-word character . (period). The third example is the same, but in reverse: the third example has two words separated by punctuation, while the second has a single word a_ followed by punctuation.

With the regex rewritten in a "proper way", that is:

"[a-z][*+\\-_.,|\\s]?\\b"

Or, in an "unquoted", canonical way:

[a-z][*+\-_.,|\s]?\b

the fact that your first input does not match is expected: a character class only ever matches a single character. After it matches the first underscore, the engine looks for a word boundary but cannot find one, because for the Java regex engine _ is a character that can be part of a word. Hence the result.
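As a rough check of the two claims above, the following Java sketch compares the question's pattern with the rewritten one and probes individual positions for a word boundary. The class name RewrittenPatternCheck and the helpers firstMatch and boundaryAt are hypothetical names introduced only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RewrittenPatternCheck {
    // Pattern from the question, with every metacharacter escaped.
    static final Pattern ORIGINAL =
            Pattern.compile("[a-z][\\*\\+\\-_\\.\\,\\|\\s]?\\b");
    // Rewritten pattern from this answer; only the hyphen stays escaped.
    static final Pattern REWRITTEN =
            Pattern.compile("[a-z][*+\\-_.,|\\s]?\\b");
    static final Pattern BOUNDARY = Pattern.compile("\\b");

    // Returns the first substring found by find(), or null if nothing matches.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    // True if \b matches exactly at the given index of input.
    static boolean boundaryAt(String input, int index) {
        Matcher m = BOUNDARY.matcher(input);
        return m.find(index) && m.start() == index;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"a__", "a_.", "a._", "a.."}) {
            System.out.println(input + ": original=" + firstMatch(ORIGINAL, input)
                    + ", rewritten=" + firstMatch(REWRITTEN, input));
        }
        // In "a__" there is no boundary after "a" (index 1) or after "a_" (index 2).
        System.out.println(boundaryAt("a__", 1)); // false
        System.out.println(boundaryAt("a__", 2)); // false
        // In "a_." the boundary sits between "_" and "." (index 2).
        System.out.println(boundaryAt("a_.", 2)); // true
    }
}
```

Both patterns give the same result for each of the four inputs, which suggests the extra escapes in the question are harmless but unnecessary.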

2 Comments

Your rewritten regex messed up the \\s part. The original regex is "[a-z][\\*\\+\\-_\\.\\,\\|\\s]?\\b", in raw form [a-z][\*\+\-_\.\,\|\s]?\b and should be rewritten as [a-z][*+_.,|\s-]?\b, and as string literal "[a-z][*+_.,|\\s-]?\\b"
Sorry, but the post has been edited since then; and it was indeed \\\\s in the original
