1

I have this regex:

var reg = /^(\w+ ?)+&(\w+ ?)+$/;

and this string:

var string = "number number number number number number number&";

When I execute:

reg.test(string);

this causes an unresponsive script message. I think this is due to catastrophic backtracking (http://www.regular-expressions.info/catastrophic.html) but I can't find another way to write my regex. It simply must match two groups of any number of lowercase words separated by a space character, separated by a '&' character.

Some examples:

reg.test("string&number") //returns true
reg.test("string number&number") //returns true
reg.test("string number&string date") //returns true
reg.test("&string") //returns false
reg.test("string number&") //returns false
2
  • 1
    Make the space mandatory? Commented Nov 26, 2014 at 19:31
  • 1
    Yes, that's sort of on the right track. The space being optional is what is causing the ambiguity in the regular expression. The solution is to break the first subexpression into two, one of which requires the space. Commented Nov 26, 2014 at 19:35

1 Answer 1

3

If you structure it like this, it removes the catastrophic backtracking:

var reg = /^(\w+)( \w+)*&(\w+)( \w+)*$/;

The key is changing (\w+ ?)+ to (\w+)( \w+)*, so that there is only one way for the string to be parsed for that regular expression. To see the problem in your original expression, let's take a look at the string "number number" and the expression /(\w+ ?)+/. Your intention is to have the string broken into "number " and "number" by this regular expression, however, /(\w+ ?)/ can also match each letter individually, which makes your expression ambiguous. On the other hand, /(\w+)( \w+)*/ can only match "number number" in one way, as a space is required at the beginning of each iteration of the second subpattern.

Test it out here:

function testRegExp() {
  var rStr = '';
  try {
    var reg = new RegExp($('#r').val());
    rStr += reg.test( $('#t').val() );
  } catch( e ) {
     rStr = 'regex error';
  }
  $('#result').val( rStr );
}


$('input').on( 'input', testRegExp );

testRegExp();
label {
  width: 80px;
  display: inline-block;
}
input#t, input#r {
  width: 50%;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p>
<label>String: </label>
  <input id="t" type="text" value="number number number number number number number&" /></p>
<p><label>RegExp: </label>
  <input id="r" type="text" value="^(\w+)( \w+)*&(\w+)( \w+)*$" /></p>
<p>
  <label>String: </label>
  <input id="result" type="text" disabled="disabled" />
</p>

Edit: Alan Moore and nhahtdh pointed out in the comments that this expression can actually be slightly simplified like this:

var reg = /^(\w+[ &])+(\w+( |$))+$/
Sign up to request clarification or add additional context in comments.

6 Comments

Nice one! Note that you can get rid of the optional space in the last group, too: (\w+(?: |$))+$
What's the purpose of (?: |$) rather than ( |$)? Also, do you really need the $ at the end in this case?
( |&) can be changed to [ &]. Character class typically performs between than alternation in absence of optimization. And you missed $ at the very end (which is present in Alan Moore's comment). The $ is important at the end, since nothing says that the repetition would terminate with matching \w+$.
Replacing ( |&) with [ &] is a good move. Thanks for the suggestion. I'll fix that in my answer
(?:..) is a non-capturing group. You should always use those in preference to the capturing variety unless you really want to capture something. As for the final anchor ($), yes, you still need it; the one I added doesn't anchor the whole match. Without it, you could match a string like number number&number number !@#$%^. It won't consume the junk characters, but their presence should cause the match to fail, and it doesn't.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.