
I'm using a commercial closed-source Java application that, among other things, lets you filter text fields by providing a regex pattern string. I use that filter functionality quite extensively.

The issue I'm having is that I often find myself repeating the exact same subpatterns in the regex. For example:

^(
    ( # pattern foo
        foo_([^_]+)_(windows|linux|osx)
    )
    |
    ( # pattern bar
        ([^_]+)_bar_(windows|linux|osx)_foo_(windows|linux|osx)
    )
)$
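
The multi-line layout and the # comments are not themselves a problem for Java: the (?x) inline flag (equivalent to Pattern.COMMENTS) makes the engine ignore whitespace and #-to-end-of-line comments, so a pattern string that starts with (?x) can keep this layout. A minimal sketch, assuming (not confirmed) that the application hands the string to Pattern.compile and matches the whole field; the class name FreeSpacingDemo and the sample inputs are made up for illustration.

```java
import java.util.regex.Pattern;

class FreeSpacingDemo {
    // The pattern from the question; the leading "(?x)" turns on COMMENTS mode,
    // so the line breaks, indentation and "# ..." comments are ignored.
    static final String REGEX =
        "(?x)^(\n" +
        "    ( # pattern foo\n" +
        "        foo_([^_]+)_(windows|linux|osx)\n" +
        "    )\n" +
        "    |\n" +
        "    ( # pattern bar\n" +
        "        ([^_]+)_bar_(windows|linux|osx)_foo_(windows|linux|osx)\n" +
        "    )\n" +
        ")$";

    static final Pattern PATTERN = Pattern.compile(REGEX);

    // Hypothetical helper: true if the whole input matches the pattern.
    static boolean accepts(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("foo_myapp_linux"));
        System.out.println(accepts("myapp_bar_osx_foo_windows"));
        System.out.println(accepts("foo_myapp_android"));
    }
}
```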

The ([^_]+) and (windows|linux|osx) parts repeat quite often.

That's just a made-up example. The original regex is more complex, about 20 times larger, and has many different repeats. It gets harder to read as the repeated subpatterns keep growing in both size and number, and modifying a repeated subpattern means modifying every copy of it too.

So I played with regex101 and came up with this:

^(
    ( # a dummy option, defines some frequently used capture groups
        (?!x)x # always false, so nothing matches this and the following groups ever
        (?'name'[^_]+) # group "name"
        (?'os'windows|linux|osx) # group "os"
    )
    |
    ( # pattern foo
        foo_\g'name'_\g'os'
    )
    |
    ( # pattern bar
        \g'name'_bar_\g'os'_foo_\g'os'
    )
)$


Now all of the subpatterns are named, and whenever I reference a name, the named subpattern is used in its place (e.g. \g'os' behaves as if (windows|linux|osx) were written there). The names are much shorter than the subpatterns they stand for, they are descriptive, and a subpattern only has to be modified once for the change to apply everywhere in the regex.

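The issue with this improved version is that while it's a valid PHP (PCRE) regex, it's not a valid Java regex. Comments and line breaks in the regex aside, Java doesn't support \g, as stated in the "Comparison to Perl 5" section of the java.util.regex.Pattern documentation, and it doesn't accept the (?'name'...) group syntax either (only (?<name>...)).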
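
The rejection is easy to confirm: Pattern.compile throws a PatternSyntaxException for the PCRE-style subroutine call and for the quoted group name, while a named group with a \k backreference compiles. A small sketch; the class and method names are illustrative only.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SubroutineCallDemo {
    // Returns true if java.util.regex accepts the pattern, false if it throws.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("Rejected " + regex + ": " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        // PCRE subroutine call: unsupported escape sequence in Java.
        System.out.println(compiles("(?<os>windows|linux|osx)_\\g'os'"));
        // PCRE/Oniguruma quoted group name: not Java syntax.
        System.out.println(compiles("(?'os'windows|linux|osx)"));
        // Java named group plus \k backreference: accepted.
        System.out.println(compiles("(?<os>windows|linux|osx)_\\k<os>"));
    }
}
```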

Is there any way I can "factor out" repeated regex patterns like that in Java regex? Keep in mind that all I can do is provide a pattern string; I have no access to the code.

  • stackoverflow.com/a/415635/460557 Commented Aug 14, 2015 at 1:55
  • It doesn't answer my question in the slightest. It says that named groups and \k are supported, but \g, which is what I need, is still unsupported. Commented Aug 14, 2015 at 2:17
  • @CookieCat: What you want to do can be achieved by string concatenation in Java. An example: stackoverflow.com/questions/26507391/… (scroll down to bottom) Commented Aug 14, 2015 at 6:23
  • @nhahtdh That is correct, except that I mentioned at the very beginning of the question that I'm a user of a commercial closed-source Java application, and restated at the very end that I don't have access to its source code. I need everything to be done entirely in Java's regex. Other flavors of regex, such as Perl's, Python's, JavaScript's, PHP's and many others, support the \g escape sequence for referencing named groups, which is what would solve my issue, but Java doesn't support it. My question was whether what I want is possible in Java's regex. Commented Aug 14, 2015 at 7:47
  • @nhahtdh I see. I hoped there might be some clever workaround. It would have been much more desirable to keep it regex-only, but since there is no way around it, I will have to resort to writing a program that prints the regex I want to stdout, using variables for the substitutions. Commented Aug 14, 2015 at 7:57
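
Why \k does not help here: a backreference must repeat the exact text the group captured, whereas a subroutine call re-runs the group's pattern. The sketch below compares a \k backreference with the pattern written out twice by hand (what a subroutine call would amount to). Class and field names are illustrative.

```java
import java.util.regex.Pattern;

class BackreferenceDemo {
    // \k<os> must match the same text that the "os" group captured.
    static final Pattern BACKREF = Pattern.compile("(?<os>windows|linux|osx)_\\k<os>");
    // Subroutine-like behavior, spelled out manually: the subpattern is repeated.
    static final Pattern REPEATED = Pattern.compile("(windows|linux|osx)_(windows|linux|osx)");

    public static void main(String[] args) {
        for (String s : new String[] {"linux_linux", "linux_osx"}) {
            System.out.println(s
                + ": backref=" + BACKREF.matcher(s).matches()
                + ", repeated=" + REPEATED.matcher(s).matches());
        }
    }
}
```

Both accept "linux_linux", but only the repeated pattern accepts "linux_osx", which is the behavior the question needs.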

3 Answers


As of Java 8, a pure regular-expression solution doesn't exist. \g may be supported in a future version.

As already mentioned, the only solution is the string concatenation technique. However, it is not an option in your case.
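
A minimal sketch of that technique in the form the asker describes in the comments: a small standalone program whose only job is to assemble the pattern from named constants and print it, so the output can be pasted into the application. The constant names (NAME, OS, FOO, BAR) are illustrative, not part of any API.

```java
import java.util.regex.Pattern;

class RegexBuilder {
    // Each repeated subpattern is defined once; changing it here changes every use.
    static final String NAME = "([^_]+)";
    static final String OS = "(windows|linux|osx)";

    static final String FOO = "foo_" + NAME + "_" + OS;
    static final String BAR = NAME + "_bar_" + OS + "_foo_" + OS;

    static final String REGEX = "^((" + FOO + ")|(" + BAR + "))$";

    public static void main(String[] args) {
        Pattern.compile(REGEX);     // fail fast if the assembled pattern is invalid
        System.out.println(REGEX);  // paste this output into the application
    }
}
```

The printed pattern is the question's first regex without the whitespace and comments, so it needs no (?x) flag.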

If you tell us the name of the commercial closed-source Java application, maybe we can help you more.


If you can run some of your own Java code before submitting the pattern, you could use StrSubstitutor from Apache Commons Lang:

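import java.util.HashMap;
import java.util.Map;
// Deprecated since Commons Lang 3.6 in favor of Commons Text's StringSubstitutor
import org.apache.commons.lang3.text.StrSubstitutor;

Map<String, String> valuesMap = new HashMap<>();
valuesMap.put("os", "(windows|linux|osx)");
valuesMap.put("name", "([^_]+)");  // plain capturing group; "(?[" is not valid syntax
StrSubstitutor sub = new StrSubstitutor(valuesMap);

// "(?x)" enables comments mode, so Java ignores the whitespace and # comments
String template = "(?x)^(\n" +
        "    ( # pattern foo\n" +
        "        foo_${name}_${os}\n" +
        "    )\n" +
        "    |\n" +
        "    ( # pattern bar\n" +
        "        ${name}_bar_${os}_foo_${os}\n" +
        "    )\n" +
        ")$";
String regex = sub.replace(template);  // each ${key} is replaced by its map value
System.out.println(regex);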
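
If adding an Apache Commons dependency is not wanted, the same placeholder idea can be sketched with the JDK alone, using literal String.replace for each ${key}. The helper names expand and buildRegex are hypothetical; this is a simplified substitute for StrSubstitutor (no escaping, no nested variables), not a reimplementation of it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class TemplateExpander {
    // Hypothetical helper: literally replaces every ${key} with its value.
    static String expand(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        // Crude check for a typo in a placeholder name.
        if (result.contains("${")) {
            throw new IllegalArgumentException("Unresolved placeholder in: " + result);
        }
        return result;
    }

    static String buildRegex() {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("name", "([^_]+)");
        values.put("os", "(windows|linux|osx)");

        String template = "(?x)^(\n"
            + "    ( # pattern foo\n"
            + "        foo_${name}_${os}\n"
            + "    )\n"
            + "    |\n"
            + "    ( # pattern bar\n"
            + "        ${name}_bar_${os}_foo_${os}\n"
            + "    )\n"
            + ")$";
        return expand(template, values);
    }

    public static void main(String[] args) {
        String regex = buildRegex();
        Pattern.compile(regex);  // throws PatternSyntaxException if the result is invalid
        System.out.println(regex);
    }
}
```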


If you don't need the capturing groups, your regex reduces to ^(?:foo_[^_]+|[^_]+_bar_(?:windows|(?:linu|os)x)_foo)_(?:windows|(?:linu|os)x)$

^ 
(?:
  foo_ [^_]+ 
| [^_]+ _bar_
  (?:
    windows
  | (?: linu | os )
    x
  )
  _foo
)
_
(?:
  windows
| (?: linu | os )
  x
)
$
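
A hand reduction like this can be spot-checked in Java by running the original and reduced patterns over a few inputs and comparing the results. The sample strings are invented, so agreement on them is evidence rather than proof of equivalence. The spaced-out layout only compiles as intended in Java with Pattern.COMMENTS (or a leading (?x)); the sketch uses a one-line version of it. The reduced form drops the capturing groups, which matters only if the application reads them.

```java
import java.util.regex.Pattern;

class ReductionCheck {
    static final Pattern ORIGINAL = Pattern.compile(
        "^((foo_([^_]+)_(windows|linux|osx))|(([^_]+)_bar_(windows|linux|osx)_foo_(windows|linux|osx)))$");
    static final Pattern REDUCED = Pattern.compile(
        "^(?:foo_[^_]+|[^_]+_bar_(?:windows|(?:linu|os)x)_foo)_(?:windows|(?:linu|os)x)$");
    // One-line version of the spaced-out layout; COMMENTS mode ignores the spaces.
    static final Pattern LAYOUT = Pattern.compile(
        "^ (?: foo_ [^_]+ | [^_]+ _bar_ (?: windows | (?: linu | os ) x ) _foo )"
            + " _ (?: windows | (?: linu | os ) x ) $",
        Pattern.COMMENTS);

    // Made-up inputs covering both branches and a few near misses.
    static final String[] SAMPLES = {
        "foo_app_linux", "foo_app_osx", "app_bar_windows_foo_osx",
        "foo_bar_linux_foo_osx", "foo_app_android", "app_bar_linux_foo",
        "foo__linux", "app_bar_osx_foo_linux_x"
    };

    public static void main(String[] args) {
        for (String s : SAMPLES) {
            System.out.println(s
                + ": original=" + ORIGINAL.matcher(s).matches()
                + ", reduced=" + REDUCED.matcher(s).matches()
                + ", layout=" + LAYOUT.matcher(s).matches());
        }
    }
}
```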
