
I need to check a pattern against some text (I have to check whether my pattern occurs in many texts).

This is my example:

String pattern = "^[a-zA-Z ]*toto win(\\W)*[a-zA-Z ]*$";    
if("toto win because of".matches(pattern))
 System.out.println("we have a winner");
else
 System.out.println("we DON'T have a winner");

In my tests the pattern should match the strings below, but my regexp does not match all of them. These must match:

" toto win bla bla"

"toto win because of"
"toto win. bla bla"


"here. toto win. bla bla"
"here? toto win. bla bla"

"here %dfddfd . toto win. bla bla"

Must not match:

" -toto win bla bla"
" pretoto win bla bla"

I tried to do this with my regexp, but it does not work.

Can you point out what I'm doing wrong?

3
  • Would quotes be present in the input string? Commented Jun 12, 2012 at 9:39
  • It can be anything. It's an ordinary text Commented Jun 12, 2012 at 9:44
  • Please don't add signatures and taglines to your posts. Also you've been misspelling "a lot" a lot. There's a space between the "a" and "lot". Commented Jun 12, 2012 at 12:49

5 Answers

1

This would work

(?im)^[?.\s%a-z]*?\btoto win\b.+$

Explanation

"(?im)" +         // Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m)
"^" +             // Assert position at the beginning of a line (at beginning of the string or after a line break character)
"[?.\\s%a-z]" +    // Match a single character present in the list below
                     // One of the characters “?.”
                     // A whitespace character (spaces, tabs, and line breaks)
                     // The character “%”
                     // A character in the range between “a” and “z”
   "*?" +            // Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
"\\b" +            // Assert position at a word boundary
"toto\\ win" +     // Match the characters “toto win” literally
"\\b" +            // Assert position at a word boundary
"." +             // Match any single character that is not a line break character
   "+" +             // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"$"               // Assert position at the end of a line (at the end of the string or before a line break character)

UPDATE 1

(?im)^[?~`'!@#$%^&*+.\s%a-z]*? toto win\b.*$

UPDATE 2

(?im)^[^-]*?\btoto win\b.*$

UPDATE 3

(?im)^.*?(?<!-)toto win\b.*$

Explanation

"(?im)" +       // Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m)
"^" +           // Assert position at the beginning of a line (at beginning of the string or after a line break character)
"." +           // Match any single character that is not a line break character
   "*?" +          // Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
"(?<!" +        // Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
   "-" +           // Match the character “-” literally
")" +
"toto\\ win" +   // Match the characters “toto win” literally
"\\b" +          // Assert position at a word boundary
"." +           // Match any single character that is not a line break character
   "*" +           // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"$"             // Assert position at the end of a line (at the end of the string or before a line break character)

The regex needs to be escaped when used within Java code (each backslash must be doubled in a string literal).
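As a rough sketch of what that escaping looks like, the UPDATE 3 pattern is written below as a Java string literal and run against the question's samples. The class and method names are made up for illustration. The second pattern, `STRICT`, is an assumed variant that is not part of this answer: it widens the lookbehind so that a letter, digit, or underscore glued before "toto win" is rejected as well as a hyphen.

```java
import java.util.regex.Pattern;

class TotoWinCheck {
    // UPDATE 3 from the answer, with each backslash doubled for a Java string literal.
    static final Pattern UPDATE_3 = Pattern.compile("(?im)^.*?(?<!-)toto win\\b.*$");

    // Assumed variant (not from the answer): also reject a word character
    // (letter, digit, underscore) glued directly before "toto win".
    static final Pattern STRICT = Pattern.compile("(?i)(?<![\\w-])toto win\\b");

    static boolean update3(String text) {
        return UPDATE_3.matcher(text).find();
    }

    static boolean strict(String text) {
        return STRICT.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            " toto win bla bla",
            "toto win because of",
            "toto win. bla bla",
            "here. toto win. bla bla",
            "here? toto win. bla bla",
            "here %dfddfd . toto win. bla bla",
            "- blabla toto win",
            " -toto win bla bla",
            " pretoto win bla bla"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" update3=" + update3(s) + " strict=" + strict(s));
        }
    }
}
```

Both patterns accept the first seven samples and reject " -toto win bla bla". They differ on " pretoto win bla bla": the UPDATE 3 lookbehind only excludes a hyphen, so that string is still accepted, while the assumed `STRICT` variant rejects it.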


5 Comments

This string does not match: "here ! toto win dfddfd "
Actually, there can be any character. Imagine text pulled from a website; we can have anything. What I cannot have is text or characters (including "-") glued directly before it, like "blatoto win" or "-toto win".
Great. It does what I want. Thanks a lot.
Me again. I just noticed a bug in this regexp. If there is a "-" anywhere before my text, even when the "-" is not glued to it, as in "- blabla toto win", the string is still rejected. How do I make the "-" condition reject only when it is glued to my text? Thanks again.
It works like a charm. Thank you very much. Could you explain this pattern to me? I don't really get it.
1

Just change your code to String pattern = "\\s*toto win[\\w\\s]*";

\W means a non-word character; \w means a word character ([a-zA-Z_0-9]).

[\\w\\s]* will match any number of word characters and whitespace characters after "toto win".
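The difference between `\w` and `\W` can be checked with a few single characters. This is only an illustrative sketch; the class and helper names are invented for the example.

```java
class WordClassDemo {
    // True if the single character is matched by \w (letter, digit, or underscore).
    static boolean isWordChar(char c) {
        return String.valueOf(c).matches("\\w");
    }

    // True if the single character is matched by \W (anything that \w does not match).
    static boolean isNonWordChar(char c) {
        return String.valueOf(c).matches("\\W");
    }

    public static void main(String[] args) {
        char[] samples = {'a', '_', '7', '.', ' ', '-'};
        for (char c : samples) {
            System.out.println("'" + c + "' \\w=" + isWordChar(c) + " \\W=" + isNonWordChar(c));
        }
    }
}
```

The letter, underscore, and digit are word characters; the period, space, and hyphen are not. That is why `[\\w\\s]*` on its own stops at a period such as the one in "toto win. bla bla".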

UPDATE

To reflect your new requirements, this expression would work:

"((.*\\s)+|^)toto win[\\w\\s\\p{Punct}]*"

((.*\\s)+|^) matches either anything followed by at least one space OR beginning of line.

[\\w\\s\\p{Punct}]* matches any combination of word characters (letters, digits, underscore), whitespace, and punctuation.
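A minimal sketch of trying the updated expression with `String.matches`, as in the question's code. It assumes the whole string must match, which is how `matches` behaves; the class and method names are hypothetical.

```java
class UpdatedPatternCheck {
    // The updated expression from this answer, copied as-is.
    static final String PATTERN = "((.*\\s)+|^)toto win[\\w\\s\\p{Punct}]*";

    // String.matches requires the entire input to match the pattern.
    static boolean matchesUpdated(String text) {
        return text.matches(PATTERN);
    }

    public static void main(String[] args) {
        String[] samples = {
            " toto win bla bla",
            "toto win because of",
            "toto win. bla bla",
            "here. toto win. bla bla",
            "here? toto win. bla bla",
            "here %dfddfd . toto win. bla bla",
            " -toto win bla bla",
            " pretoto win bla bla"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + matchesUpdated(s));
        }
    }
}
```

With the question's samples, the first six strings match and the last two do not, because "toto win" must either start the string or follow a whitespace character.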

Comments

0

The following regex

^[a-zA-Z. ]*toto win[a-zA-Z. ]*$

Will match

 toto win bla bla
toto win because of
toto win. bla bla

And doesn't match

-toto win bla bla

3 Comments

This seems great, but a string like "toto win. bla bla" does not work. Any ideas?
I updated my answer. In your question you mention "special" characters. I added the period (.) to what you consider special by adding it to the character class. Do you see it? Just add more characters as needed.
I see it. I just updated my question. It still does not fully work. I don't know how to forbid any character glued before my pattern.
0

You are missing a space between "win" and the next word in your pattern.

Try this: \\stoto\\swin\\s\\w

You can try out your regexes at http://gskinner.com/RegExr/

1 Comment

You mean that I have to have String pattern = "(\\s)*toto win(\\s)*(\\W)*"; ?
0

It would be easier if you included the actual requirements, not just a list of strings to match or not match. I strongly suspect "toto winabc" should not match, but I am not sure, as you haven't included such an example or explained the requirements. Anyway, this works for all your current examples:

import java.util.regex.Pattern;

public class TotoWinTest {

    static String[] matchThese = new String[] {
            " toto win bla bla",
            "toto win because of",
            "toto win. bla bla",
            "here. toto win. bla bla",
            "here? toto win. bla bla",
            "here %dfddfd . toto win. bla bla"
    };

    static String[] dontMatchThese = new String[] {
            " -toto win bla bla",
            " pretoto win bla bla"
    };

    public static void main(String[] args) {
        // either the beginning of the input or a whitespace character, followed by "toto win"
        Pattern p = Pattern.compile("(^|\\s)toto win");

        // find() looks for the pattern anywhere in the string, unlike String.matches()
        System.out.println("Should match:");
        for (String s : matchThese) {
            System.out.println(p.matcher(s).find());
        }

        System.out.println("Shouldn't match:");
        for (String s : dontMatchThese) {
            System.out.println(p.matcher(s).find());
        }
    }
}

1 Comment

I gave the examples to show what kind of text should match. The text can be anything, so I cannot use your method. Thanks anyway.
