I need help with this matter. Look at the following regex:

Pattern pattern = Pattern.compile("[A-Za-z]+(\\-[A-Za-z]+)");
Matcher matcher = pattern.matcher(s1);

I want to match words like "home-made" and "aaaa-bbb", but not "aaa - bbb" or "aaa--aa--aaa". Basically, I want the following:

word - hyphen - word.

It works for everything, except that "aaa--aaa--aaa" passes and it shouldn't. What regex will work for this pattern?

2 Answers

You can remove the backslash from your expression:

"[A-Za-z]+-[A-Za-z]+"

The following code should then work:

Pattern pattern = Pattern.compile("[A-Za-z]+-[A-Za-z]+");
Matcher matcher = pattern.matcher("aaa-bbb");
boolean match = matcher.matches();    // true only if the whole string matches

Note that you can use Matcher.matches() instead of Matcher.find() to check whether the complete string matches, rather than just some part of it.
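To make the difference concrete, here is a minimal sketch that runs the same pattern through both methods. The class and method names (MatchesVersusFind, matchesWhole, findFirst) are made up for illustration and are not part of the question or the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVersusFind {

    // Same expression as above: word, single hyphen, word.
    private static final Pattern HYPHENATED = Pattern.compile("[A-Za-z]+-[A-Za-z]+");

    // True only if the entire input is word-hyphen-word.
    static boolean matchesWhole(String input) {
        return HYPHENATED.matcher(input).matches();
    }

    // Returns the first substring that fits the pattern, or null if there is none.
    static String findFirst(String input) {
        Matcher matcher = HYPHENATED.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "home--home-home";
        System.out.println(matchesWhole(input)); // false: the whole string does not fit
        System.out.println(findFirst(input));    // home-home: find() picks out a fitting part
    }
}
```

So a string can be rejected by matches() and still "pass" with find(), because find() is satisfied by any substring that fits.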

If instead you want to look inside a string using Matcher.find() you can use the expression

"(^|\\s)[A-Za-z]+-[A-Za-z]+(\\s|$)"

but note that this only finds words delimited by whitespace or by the start or end of the string (so a word followed by punctuation, as in "aaa-bbb.", is not found). To cover that case as well, you can use lookbehinds and lookaheads:

"(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])"

which reads as follows:

(?<![A-Za-z-])        // before the match there must not be a letter or -
[A-Za-z]+             // the match itself starts with one or more letters
-                     // followed by a -
[A-Za-z]+             // followed by one or more letters
(?![A-Za-z-])         // and afterwards there must not be a letter or -

An example:

Pattern pattern = Pattern.compile("(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])");
Matcher matcher = pattern.matcher("It is home-made.");
if (matcher.find()) {
    System.out.println(matcher.group());    // => home-made
}
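
If the input can contain several candidates, the same expression can be applied in a find() loop. The following is only a sketch: the class name HyphenatedWordFinder, the method findAll and the sample sentence are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HyphenatedWordFinder {

    // The lookaround expression from above.
    private static final Pattern HYPHENATED =
            Pattern.compile("(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])");

    // Collects every word-hyphen-word occurrence in the text, in order.
    static List<String> findAll(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = HYPHENATED.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "It is home-made, not aaa--aa--aaa or aaa-bbb-ccc, but aaaa-bbb is fine.";
        System.out.println(findAll(text)); // [home-made, aaaa-bbb]
    }
}
```

Neither "aaa--aa--aaa" nor any part of "aaa-bbb-ccc" is reported, because every candidate there has a letter or a hyphen directly before or after it.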
8 Comments

Hm, OK, thank you. If possible, tell me what the backslash did. I'll test now.
It is working, but there is a problem. aaa-bbb-ccc shouldn't be matched, but it actually gets me bbb-ccc.
@user974594 Actually, the backslash shouldn't do anything bad in your case. Your original expression will work as well.
@user974594 Also, aaa-bbb-ccc will not match.
@Howard - Using your lookarounds, wouldn't (?<!-)[A-Za-z]+-[A-Za-z]+(?!-) be the same thing?
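
As far as I can tell, the shorter lookarounds from the last comment are not equivalent: with (?!-) the engine may backtrack by one letter so that the next character is a letter rather than a hyphen, and it then reports a partial word. A small sketch to check this; the class name LookaroundComparison and the helper findAll are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundComparison {

    // Returns all substrings of text that the given regex finds, in order.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "aaa-bbb-ccc";

        // Lookarounds that only exclude a hyphen: partial words slip through.
        System.out.println(findAll("(?<!-)[A-Za-z]+-[A-Za-z]+(?!-)", text));
        // prints [aaa-bb, b-ccc]

        // Lookarounds that exclude letters as well: nothing is found.
        System.out.println(findAll("(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])", text));
        // prints []
    }
}
```

Including the letters in the lookarounds is what prevents the match from starting or ending in the middle of a word.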

Actually, I can't reproduce the problem with your expression when the String holds a single word. As the discussion in the comments cleared up, though, the String s contains a whole sentence, which first has to be tokenised into words that are then matched one by one.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {

        private static void match(String s) {
                Pattern pattern = Pattern.compile("[A-Za-z]+(\\-[A-Za-z]+)");
                Matcher matcher = pattern.matcher(s);
                if (matcher.matches()) {
                        System.out.println("'" + s + "' match");
                } else {
                        System.out.println("'" + s + "' doesn't match");
                }
        }

        /**
        * @param args
        */
        public static void main(String[] args) {
                match(" -home-made");
                match("home-made");
                match("aaaa-bbb");
                match("aaa - bbb");
                match("aaa--aa--aaa");
                match("home--home-home");
        }

}

The output is:

' -home-made' doesn't match
'home-made' match
'aaaa-bbb' match
'aaa - bbb' doesn't match
'aaa--aa--aaa' doesn't match
'home--home-home' doesn't match
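
For the whole-sentence case, one possible approach is to split on whitespace first and apply matches() to each token. This is only a sketch under that assumption: the class name SentenceTokenMatcher and the method hyphenatedTokens are invented, and a token with punctuation attached (such as "home-made.") would be rejected by it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SentenceTokenMatcher {

    // Same idea as the expression above, without the unneeded group and escape.
    private static final Pattern HYPHENATED = Pattern.compile("[A-Za-z]+-[A-Za-z]+");

    // Splits the sentence on whitespace and keeps the tokens that are
    // exactly word-hyphen-word.
    static List<String> hyphenatedTokens(String sentence) {
        List<String> result = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            if (HYPHENATED.matcher(token).matches()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(hyphenatedTokens("a home-made cake and home--home-home"));
        // prints [home-made]
    }
}
```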

4 Comments

@Howard Yes. This one passes for me: home--home-home
I added it to my source and it DOESN'T match. Maybe you're using find() instead of matches(). Try my source.
Matcher matcher = pattern.matcher(sentence); for(int i=0; matcher.find(); i++){ ...
Now I understand why, I thought the string was representing a single word already.