In Java, I was unable to get a regex to behave the way I wanted, and wrote this little JUnit test to demonstrate the problem:

public void testLookahead() throws Exception {
    Pattern p = Pattern.compile("ABC(?!!)");
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find());

    p = Pattern.compile("[A-Z]{3}(?!!)");
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find());

    p = Pattern.compile("[A-Z]{3}(?!!)", Pattern.CASE_INSENSITIVE);
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find()); //fails, why?

    p = Pattern.compile("[A-Za-z]{3}(?!!)");
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find());  //fails, why?
}

Every assertion passes except the two marked with the comment. The four groups of assertions are identical apart from the pattern. Why would adding case-insensitivity break the matcher?

2 Answers

Your tests fail because both patterns, [A-Z]{3}(?!!) with CASE_INSENSITIVE and [A-Za-z]{3}(?!!), find at least one match in "blah/ABC!/blah": each one matches bla twice, once in each blah.

A simple test shows this:

Pattern p = Pattern.compile("[A-Z]{3}(?!!)", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("blah/ABC!/blah");
while(m.find()) {
    System.out.println(m.group());
}

prints:

bla
bla
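To see where each match sits in the input, the same loop can also record m.start(). This is only a minimal sketch: the class name LookaheadDebug and the helper findAll are made up for illustration, while Pattern.compile, Matcher.find, Matcher.group and Matcher.start are the standard java.util.regex calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDebug {
    // Hypothetical helper: returns every match as "text@startIndex".
    public static List<String> findAll(String regex, int flags, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            hits.add(m.group() + "@" + m.start());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Both matches come from the two "blah" segments, not from "ABC!".
        System.out.println(findAll("[A-Z]{3}(?!!)", Pattern.CASE_INSENSITIVE, "blah/ABC!/blah"));
        // prints [bla@0, bla@10]
    }
}
```

The offsets 0 and 10 show that the lookahead does reject ABC!; find() succeeds anyway because it is free to match elsewhere in the string.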

1 Comment

DUH! All answers are correct, and I'm feeling sheepish for not seeing it myself. This answer gets the win for also telling me how I should have found it on my own...

Those two assertions fail because the full string contains substrings that match the pattern. Specifically, bla (the first three letters of blah) matches the regular expression: three letters not followed by an exclamation mark. The case-sensitive patterns correctly find nothing because blah isn't upper-case.
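A side-by-side comparison on the same input makes the difference visible. This is a rough sketch rather than anything from the question: CaseCompare and firstMatch are names invented here, and only the standard java.util.regex API is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseCompare {
    // Hypothetical helper: the first substring the pattern finds, or null if none.
    public static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "blah/ABC!/blah";
        Pattern sensitive = Pattern.compile("[A-Z]{3}(?!!)");
        Pattern insensitive = Pattern.compile("[A-Z]{3}(?!!)", Pattern.CASE_INSENSITIVE);
        System.out.println(firstMatch(sensitive, input));   // prints null
        System.out.println(firstMatch(insensitive, input)); // prints bla
    }
}
```

With case sensitivity on, ABC is the only run of three upper-case letters and the lookahead rejects it, so nothing is found. With it off, the lower-case letters in blah become candidates too.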
