1

I'm reading in a list of strings from a List<String>. The strings look like this:

blah1
blah2
blah3
blah4

In Java, I'd like to build a regex that starts with a prefix pattern like (myString/|yourString.) and concatenates it with each of the strings in the list above, then match the result against the lines of a file.

So I do this (the code below is just snippets):

String pattern = "(myString/|yourString.)";
private String listAsString;

private void createListAsStrings() {
   StringBuilder sb = new StringBuilder();

   for(String string : stringList) {
      sb.append(string + "|");  // using the pipe, hoping it will do an OR in the regex
   }

   listAsString = sb.toString();
}

To build the pattern, I'm trying to do the following:

Pattern p = Pattern.compile(pattern + listAsString);

But when I run the matcher, it doesn't try each of the strings that I appended to the StringBuilder. A second problem is that the combined string ends with a trailing |.

Is there a way to match myString/blah1 or yourString.blah1 or myString/blah2, etc., using a regex against each line in a file?

There is a lot of code, so I just posted what seemed relevant.

2 Answers

2

The expression that you are looking to build should be as follows:

myString/(?:\Qblah1\E|\Qblah2\E)

You need to wrap the strings blah1, blah2, etc. in \Q...\E in case they contain regex metacharacters. To avoid a stray | at either end, use a boolean variable that indicates whether this is the first iteration through the loop, and append the | before every word except the first:

StringBuilder sb = new StringBuilder();
boolean isFirst = true;
for(String word : stringList) {
    if (!isFirst) {
        sb.append('|');
    } else {
        isFirst = false;
    }
    sb.append("\\Q");
    sb.append(word);
    sb.append("\\E");
}
String regex = "myString/" + "(?:" + sb + ")";
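
As a rough sketch of the same idea extended to both prefixes from the question, the loop can also be written with Pattern.quote, which wraps a string in \Q...\E, and Collectors.joining, which only puts the | between elements. The class name PrefixedWordMatcher and the method buildPattern are made up for illustration, and the sketch assumes that the dot after yourString is meant as a literal dot:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original code.
public class PrefixedWordMatcher {

    // Builds (?:myString/|yourString\.)(?:\Qblah1\E|\Qblah2\E|...)
    static Pattern buildPattern(List<String> words) {
        String alternatives = words.stream()
                .map(Pattern::quote)               // wraps each word in \Q...\E
                .collect(Collectors.joining("|")); // separator only between words
        // Assumption: the '.' after yourString is a literal dot, so it is escaped.
        return Pattern.compile("(?:myString/|yourString\\.)(?:" + alternatives + ")");
    }

    public static void main(String[] args) {
        Pattern p = buildPattern(Arrays.asList("blah1", "blah2", "blah3", "blah4"));
        List<String> lines = Arrays.asList(
                "myString/blah1", "yourString.blah3", "blah2", "yourStringXblah1");
        for (String line : lines) {
            // find() looks for the pattern anywhere in the line
            System.out.println(line + " -> " + p.matcher(line).find());
        }
    }
}
```

With this pattern the first two sample lines match and the last two do not, because a bare blah2 has no prefix and yourStringXblah1 has no literal dot.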

4 Comments

Should be noted that \Q...\E is Java 6+ only
@fge That's strange: Oracle's docs mention \Q...\E in Javadocs for Pattern of 1.4.2.
@dasblinkenlight can you explain the ?: in the regex?
@nkon (?:regex) makes regex a non-capturing group. You need a group to avoid adding myString/ multiple times. Unless you want to retrieve the value matched by regex, you can use ?: to make the group non-capturing.
0

I think the basic problem is that your pattern (ignoring the trailing | problem) is something like

(myString/|yourString.)blah1|blah2|blah3 

which will match one of these

myString/blah1
yourString.blah1
blah2
blah3

That's how operator precedence works in regexes: alternation binds more loosely than concatenation. You need an extra set of parentheses around the alternatives built from the list (plus see the other answer about \Q...\E and avoiding the bar at the end of the string).
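
The difference can be checked with a small sketch like the one below. The class name AlternationPrecedence and its helper methods are invented for the demonstration, and the patterns keep the question's unescaped dot so that only the grouping changes:

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
public class AlternationPrecedence {

    // The pattern as the question effectively builds it (trailing | ignored).
    static final Pattern UNGROUPED =
            Pattern.compile("(myString/|yourString.)blah1|blah2|blah3");

    // Same pattern with the list alternatives wrapped in their own group.
    static final Pattern GROUPED =
            Pattern.compile("(myString/|yourString.)(blah1|blah2|blah3)");

    static boolean ungroupedMatches(String s) {
        return UNGROUPED.matcher(s).matches(); // whole string must match
    }

    static boolean groupedMatches(String s) {
        return GROUPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"myString/blah1", "blah2", "myString/blah2"}) {
            System.out.println(s + ": ungrouped=" + ungroupedMatches(s)
                    + ", grouped=" + groupedMatches(s));
        }
    }
}
```

Without the extra group, a bare blah2 is accepted and myString/blah2 is rejected; with the group, it is the other way around.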
