1

I have a file with huge if statements like this:

if ((Pattern.compile("string1|String2|String3").matcher(text_str).find()) 
    && (Pattern.compile("String4|String5").matcher(text_str).find())
    && (Pattern.compile("String6|String7|String8").matcher(text_str).find())
    && (Pattern.compile("String9|String10").matcher(text_str).find())
    && (Pattern.compile("String11|String12").matcher(text_str).find())
    && (Pattern.compile("String13|String14").matcher(text_str).find())
    && (Pattern.compile("String15|String16").matcher(text_str).find())
    && (Pattern.compile("String17|String18").matcher(text_str).find())
    && (Pattern.compile("String19|String19|String20").matcher(text_str).find())
    ) {
    return true;

}

I basically need to check strings against conditions like this (pseudocode):

String contains? (I have a) AND (cat OR dog OR fish) AND (and it) AND (eats OR drinks OR smells) AND (funny OR a lot OR nothing)

How would I make this more maintainable and efficient with a very large number of checks?

6
  • If your code works, it may be better to go to codereview.stackexchange.com Commented May 22, 2014 at 13:14
  • Are string1, string2 ... literal strings? Commented May 22, 2014 at 13:18
  • So when you say "String1", "String2", etc, these are just placeholders for the real strings, right? And are the actual strings just plain old strings or do they contain any real regexes (e.g., things like \d, etc)? Commented May 22, 2014 at 13:20
  • You match, OK, but what are you doing with these matches? Commented May 22, 2014 at 13:21
  • The String1 etc. are just plain old words. I want to check whether a sentence fulfills certain patterns, like the example in the pseudocode. Commented May 22, 2014 at 13:22

2 Answers

2

You can do that with one regex using a series of look-aheads:

return text_str.matches("(?s)^(?=.*(string1|String2|String3))(?=.*(String4|String5))(?=.*(String6|String7|String8))(?=.*(String9|String10))(?=.*(String11|String12))(?=.*(String13|String14))(?=.*(String15|String16))(?=.*(String17|String18))(?=.*(String19|String19|String20)).*");
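If the word groups change often, the regex can be generated instead of typed by hand. The following is a minimal sketch, assuming the terms are plain words as stated in the question's comments; the class `LookaheadRegex` and its methods are hypothetical names, not part of any library. It quotes each word, builds one look-ahead per group, and compiles the result once so it can be reused.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: builds one look-ahead regex from groups of plain words.
// Each group is an OR of its words; all groups must be present (AND).
final class LookaheadRegex {

    // Produces a pattern of the form (?s)^(?=.*(?:a|b))(?=.*(?:c|d))...
    static Pattern build(List<List<String>> groups) {
        StringBuilder regex = new StringBuilder("(?s)^");
        for (List<String> group : groups) {
            // Pattern.quote keeps characters such as '.' or '+' literal.
            String alternation = group.stream()
                    .map(Pattern::quote)
                    .collect(Collectors.joining("|"));
            regex.append("(?=.*(?:").append(alternation).append("))");
        }
        return Pattern.compile(regex.toString());
    }

    // find() is sufficient: the look-aheads are zero-width, so nothing
    // has to be consumed after the ^ anchor.
    static boolean containsAll(Pattern compiled, String text) {
        return compiled.matcher(text).find();
    }
}
```

Usage would look like `Pattern p = LookaheadRegex.build(groups);` once at startup, followed by `LookaheadRegex.containsAll(p, text_str)` for each input. As in the original code, a word also matches inside a longer word (for example "cat" inside "category").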

11 Comments

+1 simplest answer, maybe it would be best to define the words on variables for better visibility and maintainability.
Yes, I thought about that option, but I found it too hard to maintain. What I might do is write a function that generates this regex from an array of words. Would this expression work for a string that has many lines in it, like an HTML page?
@Aboca You make it lazy (as opposed to greedy) by adding a ? after it, so .*? is lazy. It tries to match the fewest characters possible.
@FarhadAliNoo There is no possibility of catastrophic backtracking with this regex. Making the quantifier reluctant (or "lazy" as you put it) makes absolutely no difference (and the reason has nothing to do with backtracking).
@FarhadAliNoo It's a look-ahead. It will stop consuming input as soon as it matches, so in effect it's reluctant - it behaves like there's a ? after the .* anyway.
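The two points raised in these comments (multi-line input, and greedy versus reluctant quantifiers) can be checked with a small sketch. The class name and the sample words are made up for illustration. The three patterns differ only in the quantifier and in the (?s) flag, which lets '.' match line terminators.

```java
import java.util.regex.Pattern;

// Illustrative check only; the words are sample data, not from a real input.
final class LookaheadQuantifierDemo {
    // Greedy .* inside each look-ahead, with DOTALL enabled by (?s).
    static final Pattern GREEDY =
            Pattern.compile("(?s)^(?=.*(?:cat|dog))(?=.*(?:eats|drinks))");
    // Reluctant .*? inside each look-ahead, otherwise identical.
    static final Pattern RELUCTANT =
            Pattern.compile("(?s)^(?=.*?(?:cat|dog))(?=.*?(?:eats|drinks))");
    // Same as GREEDY but without (?s), so '.' stops at the first line break.
    static final Pattern NO_DOTALL =
            Pattern.compile("^(?=.*(?:cat|dog))(?=.*(?:eats|drinks))");

    static boolean hit(Pattern p, String text) {
        return p.matcher(text).find();
    }
}
```

For an input whose terms are spread over several lines, GREEDY and RELUCTANT give the same boolean answer, while NO_DOTALL only sees the first line. For an HTML page, the (?s) flag is therefore the part that matters.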
1

Well you could have a List<List<String>> which you can compile into List<Pattern>:

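List<Pattern> patterns = new ArrayList<>();
for (List<String> terms : listOfTerms) {
    // StringUtils is from Apache Commons Lang; String.join("|", terms) also works on Java 8+
    String pattern = StringUtils.join(terms, "|");
    patterns.add(Pattern.compile(pattern));
}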

and then check:

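for (Pattern p : patterns)
    // Pattern has no instance matches(String); find() looks for the terms anywhere in the string
    if (!p.matcher(string).find())
        return false;

return true;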

This should make the checking easier. For defining the initial list of terms, a plain array might work better. Something like this:

String[][] terms = {{"cat", "dog"}, {"a", "b"}...};

This can be formatted to look nice and can contain comments.
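Putting the pieces of this answer together, a self-contained sketch might look like the following. The class and method names are hypothetical, the term groups are taken from the pseudocode in the question, and the terms are assumed to be plain words (so they are quoted). The patterns are compiled once rather than on every check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch of the list-of-patterns approach; names are made up for illustration.
final class TermGroups {
    // Each inner array is one OR-group; all groups must match (AND).
    static final String[][] TERMS = {
        {"I have a"},
        {"cat", "dog", "fish"},
        {"and it"},
        {"eats", "drinks", "smells"},
        {"funny", "a lot", "nothing"},
    };

    // Compiled once and reused for every check.
    static final List<Pattern> PATTERNS = compile(TERMS);

    static List<Pattern> compile(String[][] groups) {
        List<Pattern> patterns = new ArrayList<>();
        for (String[] group : groups) {
            List<String> quoted = new ArrayList<>();
            for (String term : group) {
                quoted.add(Pattern.quote(term)); // treat each term as a literal
            }
            patterns.add(Pattern.compile(String.join("|", quoted)));
        }
        return patterns;
    }

    static boolean matchesAll(List<Pattern> patterns, String text) {
        for (Pattern p : patterns) {
            if (!p.matcher(text).find()) {
                return false; // one missing group is enough to fail
            }
        }
        return true;
    }
}
```

A call such as `TermGroups.matchesAll(TermGroups.PATTERNS, text_str)` then replaces the long if statement, and adding a check means adding one line to `TERMS`.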
