1

This is very simple regex yet, it runs for over 30 seconds on a very short string: (i7 3970k @ 3.4ghz)

Pattern compile = Pattern.compile("^(?=[a-z0-9-]{1,63})([a-z0-9]+[-]{0,1}){1,63}[a-z0-9]{1}$");
Matcher matcher = compile.matcher("test-metareg-rw40lntknahvpseba32cßáàâåäæç.nl");
boolean matches = matcher.matches(); //Takes 30+ seconds

First part the (?=) is assertion that the string contains at max these characters

The 2nd part is assertion that the string doesn't exceed syntax for example on this case to prevent --'s and end at least in [a-z0-9]

5
  • By looking at your regexp, probably you want/need to parse the String by hand instead. Commented Nov 21, 2013 at 14:52
  • Excessive acktracking. Your regular expression is overly complex. Parse manually as @LuiggiMendoza said. Commented Nov 21, 2013 at 14:53
  • First part the (?=) is assertion that the string contains at max these characters The 2nd part is assertion that the string doesn't exceed syntax for example on this case to prevent --'s and end at least in [a-z0-9] Commented Nov 21, 2013 at 14:54
  • Backtrack limit was exhausted. Please review your expression. Commented Nov 21, 2013 at 14:59
  • Related: en.wikipedia.org/wiki/ReDoS Commented Nov 21, 2013 at 15:05

4 Answers 4

1

I tried to guess your intention but it was not easy:

(?=[a-z0-9-]{1,63}) this look-ahead seem to intend to require the next up to 63 characters to be lowercase ASCII letters or numbers, but in fact, it will succeed even if there’s only one letter followed by anything. So maybe you meant (?=[a-z0-9-]{1,63}$) to forbid anything else after the legal up to 63 characters.

You seem to want groups of at least one letter or number between the - but you made the - optional not really creating a constraint and allowing way to much possibilities which created the overhead of your expression. You can simply say: ([a-z0-9]++-){0,63}[a-z0-9]+. The groups within the braces require at least one letter or number and require the minus after that, the expression at the end requires at least one letter or number at the end of the expression but will also match the last group without a following - at the same time. This last group might also be the only one if no - is contained in your text at all.

Putting it all together you regex becomes: (?=[a-z0-9-]{1,63}$)([a-z0-9]++-){0,63}[a-z0-9]+. Note that you don’t need a leading ^ or trailing $ if you use the matches method; it already implies that the string bounds must match the expression bounds.

I hope I got your intention right…

Sign up to request clarification or add additional context in comments.

7 Comments

Testing against a few strings now :)
UPDATED pastebin.com/y1fkaZ47 is what i tested against. Seems fine now. I bow to thee :)
Note that adding a plus after the {0,63} ( → {0,63}+) will make it even more efficient but I think we are already at a point where it is not noticeable.
There is no reason for the range parameter here ([a-z0-9]++-){0,63} since it only enforces max 63 of - which is not possible. Also, ([a-z0-9]++-){0,63}[a-z0-9]+ can be more efficient by an unrolling [a-z0-9]+(?:-[a-z0-9]+)* expression.
@sln: you have rewritten (ab)*a to a(ba)* which does not change anything. But since you removed the possessive quantifiers your claim that is was more efficient is false.
|
0

I have fixed this regex replacing it as follows:

^(?=[a-z0-9-]{1,63})([a-z0-9]{0,1}|[-]{0,1}){1,63}[a-z0-9]{1}$

The section ([a-z0-9]+[-]{0,1}){1,63} became: ([a-z0-9]{0,1}|[-]{0,1}){1,63}

3 Comments

Why are you writing {0,1} instead of ? all the time? Are you aware that [a-z0-9]{0,1}|[-]{0,1} is the same as [a-z0-9-]?
[a-z0-9]{0,1}|[-]{0,1} is the same as [a-z0-9-]?. This will not prevent --.
Yea i am aware, don't know exactly why i did it :)
0
  • If you want to make sure that there is no -- in your string just use negative look ahead (?!.*--).
  • Also there is no point in writing {1}.
  • Another thing is if you want to ensure that string has max 63 characters then in your look-ahead you need to add $ at the end (?=[a-z0-9-]{1,63}$).

So maybe ^(?=[a-z0-9-]{1,63}$)(?!.*--)[a-z0-9-]+[a-z0-9]$

Comments

0

I think from what you say, your regex can be simplified to this
Edit - (For posterity) After reading @Holger's post, I am changing this to fix possible catastrophic backtracking, and to speed it up, which as my benches show is possibly the fastest way to do it.

 #  ^(?=[a-z0-9-]{1,63}$)[a-z0-9]++(?:-[a-z0-9]+)*+$

 ^                                    # BOL
 (?= [a-z0-9-]{1,63} $ )              # max 1 - 63 of these characters
 [a-z0-9]++ (?: - [a-z0-9]+ )*+       # consume the characters in this order
 $                                    # EOL

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.