
I'm trying to implement a search function. The user types a phrase, and I want to match any word of the phrase, or the phrase itself, against an array of strings. The problem is that the phrase comes from a variable, so I need Pattern.compile not to interpret any special characters it contains.

I'm using the following flags for the compile method:

    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.LITERAL | Pattern.MULTILINE

How could I achieve the desired result?

Thanks in advance.

Edit: For example, the phrase

"Dog cats donuts"

should result in the pattern

Dog|cats|donuts|Dog cats donuts

Comment: Edited to add an example. (Aug 1, 2013 at 5:54)

2 Answers

Answer 1
  1. Split the user-specified phrase by \s+ into, say, arr.
  2. Build the following pattern:

    "\\b(?:" + Pattern.quote(arr[0]) + "|" + Pattern.quote(arr[1]) + "|" + Pattern.quote(arr[2]) + ... + ")\\b"
  3. Compile without the Pattern.LITERAL option.
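A minimal Java sketch of these steps, assuming a non-blank phrase and a simple "contains" search over a String[]. The class and method names (PhraseSearch, buildPattern, search) are hypothetical. It also adds the whole phrase as the first alternative, since the question asks to match the phrase itself too, and it drops MULTILINE because the pattern uses no ^ or $.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PhraseSearch {

    // Builds \b(?:<phrase>|<word1>|<word2>|...)\b with every piece passed
    // through Pattern.quote, so regex metacharacters typed by the user are
    // matched literally while '|' and \b keep their regex meaning.
    static Pattern buildPattern(String phrase) {
        String trimmed = phrase.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("phrase must not be blank");
        }
        StringBuilder sb = new StringBuilder("\\b(?:");
        sb.append(Pattern.quote(trimmed)); // whole phrase first
        for (String word : trimmed.split("\\s+")) {
            sb.append('|').append(Pattern.quote(word));
        }
        sb.append(")\\b");
        // No Pattern.LITERAL here, otherwise '|' would be literal too.
        return Pattern.compile(sb.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // Returns the entries of items containing the phrase or any of its words.
    static List<String> search(String phrase, String[] items) {
        Pattern pattern = buildPattern(phrase);
        List<String> hits = new ArrayList<>();
        for (String item : items) {
            if (pattern.matcher(item).find()) {
                hits.add(item);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] items = {
            "My dog sleeps", "Donuts for breakfast", "A barrage of birds", "nothing here"
        };
        // Prints [My dog sleeps, Donuts for breakfast]
        System.out.println(search("Dog cats donuts", items));
    }
}
```

One caveat of the \b approach: a word boundary needs a word character on one side, so a user word that starts or ends with a non-word character (for example "c++") will not match when followed by a space or the end of the text.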

In other words, if you want your pattern to match any word of a user-specified phrase, you have to use alternation (the pipes) so that any one of those words counts as a match. However, the Pattern.LITERAL option makes the alternation operators literal as well, so instead you have to "literalize" just the words themselves, using the Pattern.quote(...) method. The \\b are word boundaries, so that a word in the user's phrase like "bar" does not match inside text like "barrage".
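A small Java demonstration of the three points in that paragraph: LITERAL turns the pipe into an ordinary character, quoting only the words keeps alternation working, and \b stops "bar" from matching inside "barrage". The helper name found is hypothetical.

```java
import java.util.regex.Pattern;

public class LiteralVsQuote {

    // True if regex, compiled with the given flags, is found anywhere in text.
    static boolean found(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        // With Pattern.LITERAL the pipe is just a character, so "cats|dogs"
        // only matches the literal text "cats|dogs".
        System.out.println(found("cats|dogs", Pattern.LITERAL, "I like cats")); // false
        System.out.println(found("cats|dogs", Pattern.LITERAL, "cats|dogs"));   // true

        // Quoting only the words keeps '|' as an alternation operator.
        String alt = Pattern.quote("cats") + "|" + Pattern.quote("dogs");
        System.out.println(found(alt, 0, "I like cats"));                       // true

        // Without \b, "bar" is found inside "barrage"; with \b it is not.
        String bar = Pattern.quote("bar");
        System.out.println(found(bar, 0, "barrage"));                           // true
        System.out.println(found("\\b" + bar + "\\b", 0, "barrage"));           // false
    }
}
```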


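Edit: In response to your edit, if you want the longest possible match (e.g. "Dog cats donuts" rather than the separate words "Dog", "cats" and "donuts"), place the complete phrase at the beginning of the alternation series. Java's regex engine tries the alternatives from left to right and keeps the first one that leads to a match, so the full phrase has to come before its individual words: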

    \\b(Dog cats donuts|Dog|cats|donuts)\\b
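A quick Java check of why the order matters: with the phrase last, the engine stops at "Dog"; with the phrase first, it returns the whole phrase. The helper firstMatch is a hypothetical name used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrder {

    // Returns the text of the first match of regex in text, or null if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "Dog cats donuts";
        // Phrase last: "Dog" is tried first at position 0 and already succeeds.
        System.out.println(firstMatch("\\b(Dog|cats|donuts|Dog cats donuts)\\b", text)); // Dog
        // Phrase first: the whole phrase is tried first and wins.
        System.out.println(firstMatch("\\b(Dog cats donuts|Dog|cats|donuts)\\b", text)); // Dog cats donuts
    }
}
```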

Comment: Missing one ')' in the last part of the regex string.
Answer 2

Try this:

String regex = "\\b(" + phrase + "|" + phrase.replaceAll("\\s+", "|") + ")\\b"; 

In action:

String phrase = "Dog cats donuts";
String regex = "\\b(" + phrase + "|" + phrase.replaceAll("\\s+", "|") + ")\\b"; 
System.out.println(regex);

Output:

\b(Dog cats donuts|Dog|cats|donuts)\b
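The regex above inserts the user's text unescaped, so characters such as '.', '+' or '(' in the phrase keep their regex meaning, which the question wanted to avoid. A hedged variant with the same shape wraps each piece in Pattern.quote; QuotedPhraseRegex and buildRegex are hypothetical names, and the phrase is assumed to be non-blank.

```java
import java.util.regex.Pattern;

public class QuotedPhraseRegex {

    // Same shape as the answer's regex, but every piece goes through
    // Pattern.quote so metacharacters in the phrase are matched literally.
    static String buildRegex(String phrase) {
        String trimmed = phrase.trim();
        StringBuilder sb = new StringBuilder("\\b(");
        sb.append(Pattern.quote(trimmed));
        for (String word : trimmed.split("\\s+")) {
            sb.append('|').append(Pattern.quote(word));
        }
        return sb.append(")\\b").toString();
    }

    public static void main(String[] args) {
        // Prints \b(\QDog cats donuts\E|\QDog\E|\Qcats\E|\Qdonuts\E)\b
        System.out.println(buildRegex("Dog cats donuts"));

        // Unquoted, the '.' in "3.5" is a wildcard and also matches "3x5".
        System.out.println(Pattern.compile("\\b(3.5)\\b").matcher("3x5").find());     // true
        System.out.println(Pattern.compile(buildRegex("3.5")).matcher("3x5").find());  // false
        System.out.println(Pattern.compile(buildRegex("3.5")).matcher("v 3.5").find()); // true
    }
}
```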
