What is the best way to dynamically create a regular expression from a variable number of parameters?

E.g. if my regular expression is of the form:

String REGEX = "\\b(?:word1(?:(\\s+)word2(?:(\\s+)word3)?)?)";  

I would like to build the regular expression string dynamically, substituting the wordX placeholders, and I want to be able to pass a variable number of words, e.g. just 2 or perhaps 7.

I.e. to end up with:

REGEX = "\\b(?:cat(?:(\\s+)mouse(?:(\\s+)rain)?)?)";  

in one call, and in another:

REGEX = "\\b(?:cat(?:(\\s+)mouse(?:(\\s+)rain(?:(\\s+)blue(?:(\\s+)sky)?)?)?)?)";

An answer that regular expressions are not suited for these constructs could be accepted provided that it is well backed.

  • Your third expression is not of the same form as the first two (groups are not nested the same way - rain)(?:sky). So I don't understand what you're trying to do. Commented Jan 22, 2012 at 12:21
  • Oh. This was a copy-paste error. It is the same as the previous one but with 2 extra arguments. The expression is supposed to find the terms in that order. If it doesn't match all the terms, it tries with fewer. Commented Jan 22, 2012 at 12:27

1 Answer

You can write a recursive function which will generate regex strings in the form of the first example you gave:

String generateRegex(List<String> words)
{
   if(words.isEmpty()) return "";
   String word = words.get(0);
   return "\\b(?:" + word + generateInnerRegex(words.subList(1, words.size())) + ")";
}

String generateInnerRegex(List<String> words)
{
   if(words.isEmpty()) return "";
   String word = words.get(0);
   return "(?:(\\s+)" + word + generateInnerRegex(words.subList(1, words.size())) + ")?";
}

You will have to test and debug this code yourself, but it should give you the idea. (If you do find bugs, please edit this post for any others who come later.)
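For anyone who prefers a loop to recursion, here is a minimal sketch of the same idea. The class name SequenceRegex and the method build are made up for this example. Wrapping each word in Pattern.quote is an extra assumption (that the words should be matched literally), which the question does not state.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SequenceRegex {

    // Hypothetical helper: builds \b(?:w1(?:(\s+)w2(?:(\s+)w3)?)?) for any
    // number of words. The input list is only read, never modified.
    static String build(List<String> words) {
        if (words.isEmpty()) {
            return "";
        }
        // Assumption: words are literals, so regex metacharacters are quoted.
        StringBuilder regex = new StringBuilder("\\b(?:").append(Pattern.quote(words.get(0)));
        // Open one optional group per remaining word...
        for (String word : words.subList(1, words.size())) {
            regex.append("(?:(\\s+)").append(Pattern.quote(word));
        }
        // ...then close those groups from the innermost outwards.
        for (int i = 1; i < words.size(); i++) {
            regex.append(")?");
        }
        return regex.append(")").toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(build(Arrays.asList("cat", "mouse", "rain")));
        Matcher m = p.matcher("the cat mouse ran");
        if (m.find()) {
            // Prints the longest leading part of the word sequence found here.
            System.out.println(m.group());
        }
    }
}
```

Because the list is never mutated, the same list can be reused for several calls, and fixed-size or immutable lists work as input.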

2 Comments

+1. It seems to work. But I am also concerned that if I end up doing things like this, perhaps regular expressions were not suitable in the first place.
Why not? The code is only 8 lines (not including curly braces). Could you make it more concise by hacking something together using indexOf() and other low-level String methods? If you did, would it be easier to get it working (without bugs)? Would it run faster? I think the answer to all 3 questions is no. Matching text with regexps is fast, probably faster than hand-hacked code would be. One improvement on my code would be to add a couple comments explaining what the code is doing and why.
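To make the comparison with hand-written string code more concrete, here is a rough sketch of how the nested pattern can also report how many of the words were found. It relies on each optional group containing exactly one capturing (\\s+), as in the question's pattern. The class and method names are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchedWordCount {

    // Hypothetical helper: returns how many leading words of the sequence
    // matched at the first hit, or 0 if even the first word is absent.
    // Assumes the regex has the nested form from the question, with one
    // capturing (\s+) group per optional word.
    static int countMatchedWords(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return 0;
        }
        int count = 1; // the first word is mandatory
        for (int g = 1; g <= m.groupCount(); g++) {
            // A group that did not take part in the match is null.
            if (m.group(g) != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String regex = "\\b(?:cat(?:(\\s+)mouse(?:(\\s+)rain)?)?)";
        System.out.println(countMatchedWords(regex, "a cat mouse here"));
        System.out.println(countMatchedWords(regex, "cat   mouse rain"));
    }
}
```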
