Java regexp match pattern

Question

I need to check a pattern against some text (I have to check whether my pattern occurs in many texts).

This is my example:

String pattern = "^[a-zA-Z ]*toto win(\\W)*[a-zA-Z ]*$";    
if("toto win because of".matches(pattern))
 System.out.println("we have a winner");
else
 System.out.println("we DON'T have a winner");

In my tests these strings must match, but with my regexp they do not. Must match:

" toto win bla bla"

"toto win because of"
"toto win. bla bla"


"here. toto win. bla bla"
"here? toto win. bla bla"

"here %dfddfd . toto win. bla bla"

Must not match:

" -toto win bla bla"
" pretoto win bla bla"

I tried to do it with my regexp, but it does not work.

Can you point out what I'm doing wrong?
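For reference, here is a small harness that runs the posted pattern against every example from the lists above. It is a sketch and not part of the original post; the class name `OriginalPatternCheck` and the helper `matchesOriginal` are made up for illustration.

```java
import java.util.regex.Pattern;

class OriginalPatternCheck {
    // The pattern exactly as posted in the question.
    static final Pattern ORIGINAL =
            Pattern.compile("^[a-zA-Z ]*toto win(\\W)*[a-zA-Z ]*$");

    // Hypothetical helper: true when the whole text matches, like String.matches().
    static boolean matchesOriginal(String text) {
        return ORIGINAL.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            " toto win bla bla",
            "toto win because of",
            "toto win. bla bla",
            "here. toto win. bla bla",
            "here? toto win. bla bla",
            "here %dfddfd . toto win. bla bla",
            " -toto win bla bla",
            " pretoto win bla bla"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + matchesOriginal(s));
        }
    }
}
```

Run this way, the first three samples match. The three "here" samples fail because the prefix class [a-zA-Z ] does not admit ".", "?" or "%". " pretoto win bla bla" is accepted because nothing prevents letters directly in front of "toto".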

Please don't add signatures and taglines to your posts. Also, you've been misspelling "a lot" a lot. There's a space between the "a" and "lot".
– user229044 ♦, Commented Jun 12, 2012 at 12:49

Cylian · Accepted Answer · 2012-06-12 14:02:24Z

1

This would work

(?im)^[?.\s%a-z]*?\btoto win\b.+$

Explanation

"(?im)" +         // Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m)
"^" +             // Assert position at the beginning of a line (at beginning of the string or after a line break character)
"[?.\\s%a-z]" +    // Match a single character present in the list below
                     // One of the characters “?.”
                     // A whitespace character (spaces, tabs, and line breaks)
                     // The character “%”
                     // A character in the range between “a” and “z”
   "*?" +            // Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
"\\b" +            // Assert position at a word boundary
"toto\\ win" +     // Match the characters “toto win” literally
"\\b" +            // Assert position at a word boundary
"." +             // Match any single character that is not a line break character
   "+" +             // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"$"               // Assert position at the end of a line (at the end of the string or before a line break character)

UPDATE 1

(?im)^[?~`'!@#$%^&*+.\s%a-z]*? toto win\b.*$

UPDATE 2

(?im)^[^-]*?\btoto win\b.*$

UPDATE 3

(?im)^.*?(?<!-)toto win\b.*$

Explanation

"(?im)" +       // Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m)
"^" +           // Assert position at the beginning of a line (at beginning of the string or after a line break character)
"." +           // Match any single character that is not a line break character
   "*?" +          // Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
"(?<!" +        // Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
   "-" +           // Match the character “-” literally
")" +
"toto\\ win" +   // Match the characters “toto win” literally
"\\b" +          // Assert position at a word boundary
"." +           // Match any single character that is not a line break character
   "*" +           // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"$"             // Assert position at the end of a line (at the end of the string or before a line break character)

The regex needs to be escaped (each backslash doubled) before it is used inside a Java string literal.
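As a sketch of that escaping step, the class below compiles UPDATE 3 as a Java string literal and checks a few of the strings from the question and comments. The class name `TotoWinCheck`, the helper `accepts`, and the second pattern `WITH_BOUNDARY` are assumptions added for illustration. `WITH_BOUNDARY` is UPDATE 3 with the `\b` from UPDATE 2 put back in front of "toto", which is not something the answer itself proposes.

```java
import java.util.regex.Pattern;

class TotoWinCheck {
    // UPDATE 3 from the answer; the only change is \b written as \\b for Java.
    static final Pattern UPDATE_3 =
            Pattern.compile("(?im)^.*?(?<!-)toto win\\b.*$");

    // Assumed variant (not from the answer): also require a word boundary before "toto".
    static final Pattern WITH_BOUNDARY =
            Pattern.compile("(?im)^.*?(?<!-)\\btoto win\\b.*$");

    // Hypothetical helper: true when the whole text matches the given pattern.
    static boolean accepts(Pattern p, String text) {
        return p.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "here %dfddfd . toto win. bla bla",
            "- blabla toto win",
            " -toto win bla bla",
            " pretoto win bla bla"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" UPDATE 3: " + accepts(UPDATE_3, s)
                    + ", with boundary: " + accepts(WITH_BOUNDARY, s));
        }
    }
}
```

Both patterns accept the first two samples and reject " -toto win bla bla". They differ on " pretoto win bla bla": the lookbehind in UPDATE 3 only excludes "-", so that string is accepted, while the variant with the extra word boundary rejects it.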

edited Jun 12, 2012 at 14:02

answered Jun 12, 2012 at 9:44

Cylian



5 Comments

CC. Over a year ago

This string does not match: "here ! toto win dfddfd "

CC. Over a year ago

Actually, there can be any character. Imagine text pulled off a website: we can have anything. What I cannot have is some text/characters (except "-") glued before, like "blatoto win" or "-toto win".

CC. Over a year ago

Great. It does what I want. Thanks a lot.

CC. Over a year ago

It's me again. I just noticed a bug in this regexp. If there is a "-" anywhere before my text, even when it is not glued to it, as in "- blabla toto win", the string is still rejected. How do I make the "-" condition reject only when the "-" is glued to my text? Thanks again.

CC. Over a year ago

It works like a charm. Thank you very much. Is it possible to explain this pattern to me? I don't really get it.

Keppil · Accepted Answer · 2012-06-12 10:19:34Z

1

Just change your code to String pattern = "\\s*toto win[\\w\\s]*";

\W means non-word character; \w means word character ([a-zA-Z_0-9]).

[\\w\\s]* will match any run of word characters and whitespace after "toto win".

UPDATE

To reflect your new requirements, this expression would work:

"((.*\\s)+|^)toto win[\\w\\s\\p{Punct}]*"

((.*\\s)+|^) matches either anything followed by at least one space OR beginning of line.

[\\w\\s\\p{Punct}]* matches any combination of word characters (letters, digits, underscore), whitespace and punctuation.
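A quick way to try the updated expression is with String.matches, which requires the whole string to match. This is a minimal sketch; the class name `WhitespaceAnchorCheck` and the helper `accepts` are illustrative, and the last sample ("toto winabc") is an extra case not taken from the question.

```java
class WhitespaceAnchorCheck {
    // The updated expression from this answer, unchanged.
    static final String PATTERN = "((.*\\s)+|^)toto win[\\w\\s\\p{Punct}]*";

    // Hypothetical helper: whole-string match via String.matches().
    static boolean accepts(String text) {
        return text.matches(PATTERN);
    }

    public static void main(String[] args) {
        String[] samples = {
            "toto win because of",
            "here %dfddfd . toto win. bla bla",
            " -toto win bla bla",
            " pretoto win bla bla",
            "toto winabc"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + accepts(s));
        }
    }
}
```

The first two samples match and the two "must not match" samples are rejected, because "toto" has to follow either whitespace or the start of the string. Note that "toto winabc" also matches, since \w in the trailing class lets letters follow "win" directly; add \\b after "win" if that is not wanted.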

edited Jun 12, 2012 at 10:19

answered Jun 12, 2012 at 9:07

Keppil


Comments

buckley · Accepted Answer · 2012-06-12 09:09:44Z

0

The following regex

^[a-zA-Z. ]*toto win[a-zA-Z. ]*$

will match

 toto win bla bla
toto win because of
toto win. bla bla

and doesn't match

-toto win bla bla
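To see where this character-class approach stops working, the sketch below runs the regex against more of the question's examples. The class name `CharClassCheck` and the helper `accepts` are made up for illustration.

```java
import java.util.regex.Pattern;

class CharClassCheck {
    // The regex from this answer, unchanged.
    static final Pattern P = Pattern.compile("^[a-zA-Z. ]*toto win[a-zA-Z. ]*$");

    // Hypothetical helper: true when the whole text matches.
    static boolean accepts(String text) {
        return P.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "toto win. bla bla",
            "here. toto win. bla bla",
            "here? toto win. bla bla",
            "-toto win bla bla",
            " pretoto win bla bla"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + accepts(s));
        }
    }
}
```

The first two samples match and "-toto win bla bla" is rejected. "here? toto win. bla bla" is rejected as well, because "?" is not in the class, so every expected character has to be listed by hand. " pretoto win bla bla" is accepted, because letters are allowed immediately before "toto".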

edited Jun 12, 2012 at 9:09

answered Jun 12, 2012 at 9:00

buckley


3 Comments

CC. Over a year ago

This seems great but a string like "toto win. bla bla" does not work. Any ideas ?

buckley Over a year ago

Updated my answer. In your question you mention "special" characters. I added the dot (.) to what you consider special by putting it in the character class. Do you see it? Just add more characters as needed.

CC. Over a year ago

I see it. I just updated my question. It still does not fully work. I don't know how to require that no character comes directly before my pattern.

dantuch · Accepted Answer · 2012-06-12 09:11:08Z

0

You are missing a space between "win" and the next word in your pattern.

Try this: \\stoto\\swin\\s\\w

You can try your regexes here: http://gskinner.com/RegExr/
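The suggested expression has no anchors, so it only makes sense with Matcher.find(), which looks for the pattern anywhere in the text. That choice is an assumption of this sketch, as are the class name `FindCheck` and the helper `found`.

```java
import java.util.regex.Pattern;

class FindCheck {
    // The suggestion above: whitespace, "toto", whitespace, "win", whitespace, one word character.
    static final Pattern P = Pattern.compile("\\stoto\\swin\\s\\w");

    // Hypothetical helper: true when the pattern occurs somewhere in the text.
    static boolean found(String text) {
        return P.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            " toto win bla bla",
            "toto win because of",
            "here. toto win. bla bla",
            " -toto win bla bla",
            " pretoto win bla bla"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + found(s));
        }
    }
}
```

Only " toto win bla bla" is found. The leading \\s demands whitespace before "toto", so a string that starts with "toto win" is missed, and the trailing \\s\\w rejects "toto win." followed by punctuation.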

edited Jun 12, 2012 at 9:11

answered Jun 12, 2012 at 8:58

dantuch


1 Comment

CC. Over a year ago

You mean that I have to have String pattern = "(\\s)*toto win(\\s)*(\\W)*"; ?

pafau k. · Accepted Answer · 2012-06-12 10:56:07Z

0

It would be easier if you included the actual requirements, not a list of things to (not) match. I strongly suspect that "toto winabc" should not match, but I am not sure, as you haven't included such an example or explained the requirements. Anyway, this works for all your current examples:

import java.util.regex.Pattern;

// The class name is arbitrary; it only wraps the snippet so that it compiles.
public class TotoWinFinder {

    static String[] matchThese = new String[] {
            " toto win bla bla",
            "toto win because of",
            "toto win. bla bla",
            "here. toto win. bla bla",
            "here? toto win. bla bla",
            "here %dfddfd . toto win. bla bla"
    };

    static String[] dontMatchThese = new String[] {
            " -toto win bla bla",
            " pretoto win bla bla"
    };

    public static void main(String[] args) {
        // either the beginning of the input or a whitespace character, followed by "toto win";
        // find() looks for this anywhere in the text instead of matching the whole string
        Pattern p = Pattern.compile("(^|\\s)toto win");

        System.out.println("Should match:");
        for (String s : matchThese) {
            System.out.println(p.matcher(s).find());
        }

        System.out.println("Shouldn't match:");
        for (String s : dontMatchThese) {
            System.out.println(p.matcher(s).find());
        }
    }
}

answered Jun 12, 2012 at 10:56

pafau k.


1 Comment

CC. Over a year ago

I gave the examples to show what kind of text should match. The text can be anything, so I cannot use your method. Thanks anyway.
