
I have a regex pattern created on regex101.com: https://regex101.com/r/cMvHlm/7/codegen?language=java

However, that regex does not work in my Java program (I use Spring Tool Suite as my IDE):

@Test
    public void testRegex() {
        //Pattern referenceCodePattern = Pattern.compile("((\\h|\\:)+)(([\u00DFA-Za-z0-9-_#\\\\\\/])+)(([[:punct:]])?)");
        Pattern pattern = Pattern.compile(""
                + "(?:\\s+|chiffre|job-id|job-nr[.]|job-nr|\\bjob id\\b|job nr[.]|jobnummer|jobnr[.]|jobid|jobcode|job nr.|ziffer|kennziffer|kennz.|referenz code|referenz-code|"
                + "referenzcode|ref[.] nr[.]|ref[.] id|ref id|ref[.]id|ref[.]-nr[.]|ref[.]- nr[.]|"
                + "referenz nummer|referenznummer|referenz nr[.]|stellenreferenz| referenz-nr[.]|referenznr[.]|referenz|referenznummer der stelle|id#|id #|stellenausschreibungen|" 
                + "stellenausschreibungs\\s?nr[.]|stellenausschreibungs-nr[.]|stellenausschreibungsnr[.]|stellenangebots id|stellenangebots-id|stellenangebotsid|stellen id|stellen-id|stellenid|stellenreferenz|"
                + "stellen-referenz|ref[.]st[.]nr[.]|stellennumer|\\bst[.]-nr[.]\\b|\\bst[.] nr[.]\\b|kenn-nr[.]|positionsnummer|kennwort|stellenkey|stellencode|job-referenzcode|stellenausschreibung|"
                + "bewerbungskennziffer|projekt id|projekt-id|reference number|reference no[.]|reference code|job code|job id|job vacancy no[.]|job-ad-number|auto req id|job ref|\\bstellenausschreibung nr[.]\\b)"
                + ":?(?:\\w*)(?:\\s*)([A-Z]*\\s*)([!\"#$%&'()*+,\\-.\\/:;<=>?@[\\]^_`{|}~]*\\w*[!\"#$%&'()*+,\\-.\\/:;<=>?@[\\]^_`{|}~]*\\w*[!\"#$%&'()*+,\\-.\\/:;<=>?@[\\]^_`{|}~]*\\w*[!\"#$%&'()*+,\\-.\\/:;<=>?@[\\]^_`{|}~]*)?");

        String line = "Referenznummer: INDUSTRY Kontakt: ZAsdfsdfS Herr Andrafgdh Neue Str. 7 21244 Buchholz +42341 22322 [email protected] Stellenanzeige teilen: Jetzt online bewerben! oder bewerben Sie sich mit\n" +
            "Geben Sie bei Ihrer Bewerbung die Stellenreferenz und die Stellenbezeichnung an! \n" +
            "Stellenreferenz:   21533448-JOtest\n\n" +
            "Stellenausschreibung Nr. PD-666/19";


          // Create a Pattern object
          //Pattern r = Pattern.compile(pattern);
          Matcher m = pattern.matcher(line);
          if (m.find( )) {
             System.out.println("Found value: " + m.group(0) );
             System.out.println("Found value: " + m.group(1) );
             System.out.println("Found value: " + m.group(2) );
          }else {
             System.out.println("NO MATCH");
          }                 
    }

I get the following error:

    java.util.regex.PatternSyntaxException: Unclosed character class near index 1337

    at java.util.regex.Pattern.error(Pattern.java:1957)
    at java.util.regex.Pattern.clazz(Pattern.java:2550)
    at java.util.regex.Pattern.clazz(Pattern.java:2506)
    at java.util.regex.Pattern.clazz(Pattern.java:2506)
    at java.util.regex.Pattern.clazz(Pattern.java:2506)
    at java.util.regex.Pattern.sequence(Pattern.java:2065)
    at java.util.regex.Pattern.expr(Pattern.java:1998)
    at java.util.regex.Pattern.group0(Pattern.java:2907)
    at java.util.regex.Pattern.sequence(Pattern.java:2053)
    at java.util.regex.Pattern.expr(Pattern.java:1998)
    at java.util.regex.Pattern.compile(Pattern.java:1698)
    at java.util.regex.Pattern.<init>(Pattern.java:1351)
    at java.util.regex.Pattern.compile(Pattern.java:1028)

Is there a way to find out where index 1337 is?

  • Good luck to the person reading this code half a year from now, even if it is you. Commented May 23, 2019 at 9:35
  • First tip: your regexp is way too complex to find the error in it easily. Given the number of alternatives in it, can you split it into more reasonable regexps and test them one by one (using JUnit, for example)? Once you are convinced that all the parts are good, it is time to combine them. Commented May 23, 2019 at 9:36
  • You did not escape [ inside a character class. Mind that [ and ] inside a Java regex character class must be escaped. There are other issues here, too: 1) nr[.]\\b, 2) no need to escape /, 3) redundant non-capturing groups (?:\\w*). Commented May 23, 2019 at 9:42

2 Answers


The main problem with the regex is that both [ and ] must be escaped inside a character class in a Java regex. They are "special" there because they are used to form character class unions and intersections.
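As a minimal sketch of the point above, the following compiles a short character class with and without the escaped brackets and prints what PatternSyntaxException reports. The class name BracketEscapeDemo, the helper names, and the shortened patterns are illustrative assumptions, not part of the original code.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class BracketEscapeDemo {
    // Returns true if the regex compiles, false if it throws PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns the pattern text around a reported index (the window size is arbitrary).
    static String context(String regex, int index) {
        int from = Math.max(0, index - 10);
        int to = Math.min(regex.length(), Math.max(from, index + 10));
        return regex.substring(from, to);
    }

    public static void main(String[] args) {
        String broken = "[@[\\]^_]";   // regex text: [@[\]^_]   the inner '[' opens a nested class
        String fixed  = "[@\\[\\]^_]"; // regex text: [@\[\]^_]  '[' and ']' are literals

        try {
            Pattern.compile(broken);
        } catch (PatternSyntaxException e) {
            // getIndex() refers to the regex text after Java string unescaping,
            // not to a column in the source file.
            System.out.println(e.getDescription() + " at index " + e.getIndex());
            System.out.println("near: " + context(e.getPattern(), e.getIndex()));
        }
        System.out.println("fixed compiles: " + compiles(fixed));
        System.out.println("fixed matches '[': " + Pattern.matches(fixed, "["));
    }
}
```

The same getIndex() and getPattern() calls can be used on the full pattern to see where index 1337 falls. For an unclosed class the compiler usually only notices the problem once it runs out of pattern, so the index tends to point at the end of the regex rather than at the offending [.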

Another issue is that the [.]\b patterns won't work as expected: a word boundary after a non-word char such as . requires a word char immediately to the right of the current position. You need \B there, not \b.
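The difference can be checked in isolation. This is a small sketch with a made-up sample string; the class name BoundaryAfterDotDemo and the helper found are assumptions for illustration only.

```java
import java.util.regex.Pattern;

class BoundaryAfterDotDemo {
    // Returns true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample: the dot is followed by a space, i.e. a non-word char.
        String input = "st. nr. PD-666/19";

        // '.' and ' ' are both non-word chars, so there is no word boundary between them.
        System.out.println(found("nr[.]\\b", input)); // false
        // \B matches exactly where \b does not.
        System.out.println(found("nr[.]\\B", input)); // true
        // \b after the dot only succeeds when a word char follows directly.
        System.out.println(found("nr[.]\\b", "nr.5")); // true
    }
}
```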

You do not need to escape the / char in a Java regex pattern.

You do not have to repeat the pattern at the end of the regex: wrap the repeated part in a non-capturing group, (?:...), and "repeat" it with a limiting {0,3} quantifier.
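A rough sketch of that refactoring, assuming the punctuation class is pulled out into a constant so it is written only once. The names RepeatedTailDemo, PUNCT, LONG_TAIL and SHORT_TAIL are hypothetical, and the sample inputs are taken from or modelled on the test string in the question.

```java
import java.util.regex.Pattern;

class RepeatedTailDemo {
    // The punctuation class from the pattern, with '[' and ']' escaped.
    static final String PUNCT = "[!\"#$%&'()*+,\\-./:;<=>?@\\[\\]^_`{|}~]";

    // Long form: the "word chars, then punctuation" fragment written out three times.
    static final String LONG_TAIL =
            PUNCT + "*\\w*" + PUNCT + "*\\w*" + PUNCT + "*\\w*" + PUNCT + "*";

    // Short form: the fragment wrapped in a non-capturing group with a {0,3} quantifier.
    static final String SHORT_TAIL = PUNCT + "*(?:\\w*" + PUNCT + "*){0,3}";

    // Returns true if both forms agree on whether they match the whole input.
    static boolean sameResult(String input) {
        return Pattern.matches(LONG_TAIL, input) == Pattern.matches(SHORT_TAIL, input);
    }

    public static void main(String[] args) {
        String[] samples = {"PD-666/19", "21533448-JOtest", "a-b-c-d-e", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" short form matches: "
                    + Pattern.matches(SHORT_TAIL, s) + ", same as long form: " + sameResult(s));
        }
    }
}
```

Because every piece of the fragment is optional, both forms accept at most three word runs separated by punctuation, which is why "a-b-c-d-e" is rejected as a whole by either one.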

Consider a while loop to get all matches. You may use a boolean flag to track whether there were any matches.

Also, you probably want to make the \\s+ alternative the last one in the first group because it is too generic, but I will leave it at the start for the time being.
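To illustrate why the position of \\s+ matters: Java tries alternatives from left to right and takes the first one that succeeds at the current position. The sketch below uses a cut-down two-alternative group and an invented input string; AlternationOrderDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderDemo {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Invented sample; the leading space mirrors " referenz-nr[.]" in the pattern.
        String input = "siehe referenz-nr. 4711";

        // Generic alternative first: only the space is consumed.
        System.out.println("[" + firstMatch("(?:\\s+| referenz-nr[.])", input) + "]");
        // Specific alternative first: the whole keyword is consumed.
        System.out.println("[" + firstMatch("(?: referenz-nr[.]|\\s+)", input) + "]");
    }
}
```

In the full pattern the trailing \\w* may hide part of this effect, so treat the sketch as a demonstration of the ordering rule only.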

Use

Pattern pattern = Pattern.compile(""
                + "(?:\\s+|chiffre|job-id|job-nr[.]|job-nr|\\bjob id\\b|job nr[.]|jobnummer|jobnr[.]|jobid|jobcode|job nr\\.|ziffer|kennziffer|kennz\\.|referenz code|referenz-code|"
                + "referenzcode|ref[.] nr[.]|ref[.] id|ref id|ref[.]id|ref[.]-nr[.]|ref[.]- nr[.]|"
                + "referenz nummer|referenznummer|referenz nr[.]|stellenreferenz| referenz-nr[.]|referenznr[.]|referenz|referenznummer der stelle|id#|id #|stellenausschreibungen|" 
                + "stellenausschreibungs\\s?nr[.]|stellenausschreibungs-nr[.]|stellenausschreibungsnr[.]|stellenangebots id|stellenangebots-id|stellenangebotsid|stellen id|stellen-id|stellenid|stellenreferenz|"
                + "stellen-referenz|ref[.]st[.]nr[.]|stellennumer|\\bst[.]-nr[.]\\B|\\bst[.] nr[.]\\B|kenn-nr[.]|positionsnummer|kennwort|stellenkey|stellencode|job-referenzcode|stellenausschreibung|"
                + "bewerbungskennziffer|projekt id|projekt-id|reference number|reference no[.]|reference code|job code|job id|job vacancy no[.]|job-ad-number|auto req id|job ref|\\bstellenausschreibung nr[.]\\B)"
                + ":?\\w*\\s*([A-Z]*\\s*)([!\"#$%&'()*+,\\-./:;<=>?@\\[\\]^_`{|}~]*(?:\\w*[!\"#$%&'()*+,\\-./:;<=>?@\\[\\]^_`{|}~]*){0,3})?");

String line = "Referenznummer: INDUSTRY Kontakt: ZAsdfsdfS Herr Andrafgdh Neue Str. 7 21244 Buchholz +42341 22322 [email protected] Stellenanzeige teilen: Jetzt online bewerben! oder bewerben Sie sich mit\n" +
            "Geben Sie bei Ihrer Bewerbung die Stellenreferenz und die Stellenbezeichnung an! \n" +
            "Stellenreferenz:   21533448-JOtest\n\n" +
            "Stellenausschreibung Nr. PD-666/19";


Matcher m = pattern.matcher(line);
boolean found = false;
while (m.find()) {
     found = true;
     System.out.println("Found value: " + m.group(0) );
     System.out.println("Found value: " + m.group(1) );
     System.out.println("Found value: " + m.group(2) );
     System.out.println(" ----------------------- " );
}
if (!found) {
     System.out.println("NO MATCH");
}                 

See this Java demo.


In Java, an unescaped [ inside a character class is always treated as the start of a nested class, never as a literal. This is why some recommend always escaping the literal class metacharacters [ and ], a habit that carries over to almost all regex engines.

Converting

    [!"#$%&'()*+,\-.\/:;<=>?@[\]^_`{|}~] 
to  [!-/:-@\[\]-`{-~]  

then refactoring the regex.
(Note there may be usability problems with the regex as well.)

Before refactor :

(?:\s+|chiffre|job-id|job-nr[.]|job-nr|\bjob[ ]id\b|job[ ]nr[.]|jobnummer|jobnr[.]|jobid|jobcode|job[ ]nr.|ziffer|kennziffer|kennz.|referenz[ ]code|referenz-code|referenzcode|ref[.][ ]nr[.]|ref[.][ ]id|ref[ ]id|ref[.]id|ref[.]-nr[.]|ref[.]-[ ]nr[.]|referenz[ ]nummer|referenznummer|referenz[ ]nr[.]|stellenreferenz|[ ]referenz-nr[.]|referenznr[.]|referenz|referenznummer[ ]der[ ]stelle|id\#|id[ ]\#|stellenausschreibungen|stellenausschreibungs\s?nr[.]|stellenausschreibungs-nr[.]|stellenausschreibungsnr[.]|stellenangebots[ ]id|stellenangebots-id|stellenangebotsid|stellen[ ]id|stellen-id|stellenid|stellenreferenz|stellen-referenz|ref[.]st[.]nr[.]|stellennumer|\bst[.]-nr[.]\b|\bst[.][ ]nr[.]\b|kenn-nr[.]|positionsnummer|kennwort|stellenkey|stellencode|job-referenzcode|stellenausschreibung|bewerbungskennziffer|projekt[ ]id|projekt-id|reference[ ]number|reference[ ]no[.]|reference[ ]code|job[ ]code|job[ ]id|job[ ]vacancy[ ]no[.]|job-ad-number|auto[ ]req[ ]id|job[ ]ref|\bstellenausschreibung[ ]nr[.]\b):?(?:\w*)(?:\s*)([A-Z]*\s*)([!"#$%&'()*+,\-.\/:;<=>?@\[\]^_`{|}~]*(?:\w*[!"#$%&'()*+,\-.\/:;<=>?@\[\]^_`{|}~]*){3})?

After refactor :

(?:\s+|chiffre|job(?:-(?:id|nr[.]?|referenzcode|ad-number)|[ ](?:(?:nr|vacancy[ ]no)[.]|code|id|ref)|n(?:ummer|r[.])|id|code)|\b(?:job[ ]id|st(?:[.][ \-]|ellenausschreibung[ ])nr[.])\b|(?:bewerbungskenn)?ziffer|kenn(?:z(?:iffer|.)|-nr[.]|wort)|ref(?:eren(?:z(?:[ ](?:code|n(?:ummer|r[.]))|-?code|n(?:ummer|r[.]|ummer[ ]der[ ]stelle))?|ce[ ](?:n(?:umber|o[.])|code))|[.](?:[ ](?:nr[.]|id)|id|(?:-[ ]?|st[.])nr[.])|[ ]id)|stellen(?:referenz|a(?:usschreibung(?:en|s(?:\s?|-)?nr[.])?|ngebots[ \-]?id)|[ ]?id|-(?:id|referenz)|numer|key|code)|[ ]referenz-nr[.]|id[ ]?\#|p(?:ositionsnummer|rojekt[ \-]id)|auto[ ]req[ ]id):?\w*\s*[A-Z]*\s*(?:[!-/:-@\[\]-`{-~]*(?:\w*[!-/:-@\[\]-`{-~]*){3})?
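The character class conversion above can be sanity-checked by comparing the explicit class and the range-based class over the ASCII range. This is only a sketch; PunctClassEquivalence and mismatches are hypothetical names.

```java
import java.util.regex.Pattern;

class PunctClassEquivalence {
    // Explicit class with '[' and ']' escaped.
    static final Pattern EXPLICIT =
            Pattern.compile("[!\"#$%&'()*+,\\-./:;<=>?@\\[\\]^_`{|}~]");
    // Range-based form from the conversion.
    static final Pattern RANGES = Pattern.compile("[!-/:-@\\[\\]-`{-~]");

    // Counts the ASCII characters on which the two classes disagree.
    static int mismatches() {
        int count = 0;
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (EXPLICIT.matcher(s).matches() != RANGES.matcher(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + mismatches());
        // Neither class lists a literal backslash.
        System.out.println("matches backslash: " + RANGES.matcher("\\").matches());
    }
}
```

Note that neither form includes the backslash, so Java's predefined \p{Punct}, which does include it, would not be an exact drop-in replacement.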
