1

I have the use case, where I need to allow for processing an arbitrary array of strings, by an arbitrary regex, which is created either by a regex literal, or through the new RegExp() constructor.

It all works fine, until the global g flag is used with capturing groups.

I read a few answers on SO, and the proposed solution is to use regex.exec(string) in a while loop, e.g. How do you access the matched groups in a JavaScript regular expression?, JavaScript regular expressions and sub-matches

I also talked about it on IRC, and was advised against the implementation all together:

but there's regexes that will segfault your engine, unless you're using spidermonkey.

So here is a corner case, try pasting it into a fiddle or plunker, or even the console, it just breaks:

var regexString = '([^-]*)';
var flags = 'ig';
var regex = new RegExp(regexString, flags);

var arr = ['some-property-image.png', 'another-prop-video.png', 'y-no-work.bmp'];
var result = [];

arr.forEach(function(item) {
  var match;
  var inter = [];
  while (match = regex.exec(item)) {
    inter.push(match[0]);
  }
});

console.log(result);

I did tried it on regex101.com https://regex101.com/r/xG0cL4/1 It breaks even if I do it without the quantifier, i.e. /([^-])/g https://regex101.com/r/yT7sQ2/1

My question: what is the (correct|safe) way to process arbitrary regexes against arbitrary strings?

7
  • while(match = regex.exec(item)) { looks like an infinite loop Commented Apr 9, 2015 at 23:25
  • Isn't result always []? What is the expected result? Commented Apr 9, 2015 at 23:30
  • @TravisJ It's not an infinite loop. When you use the global flag, each call to regex.exec advances the pointer to the next match, and it returns false when there are no more. Commented Apr 9, 2015 at 23:32
  • Its unclear what you mean by arbitrary regexes. Do you actually need to use a regex as a variable? Commented Apr 9, 2015 at 23:35
  • 1
    I think the problem is that [^-]* matches the empty string before the -, and doesn't advance the pointer because the match is zero-length, so you get an infinite loop. Change to [^-]+ and it should work. But your question isn't about this specific regexp, it's about how to use regexps in general, right? Commented Apr 9, 2015 at 23:39

3 Answers 3

1

It's not working because when '-' is reached, exec() fails to match the '-' character, but does match 0 characters (because of *), so it doesn't skip it, and thus it gets stuck. If you use -|([^-]*), then it will skip the '-' character. You will need to then check the 'match.index' property to see if you've reach the end.

Also, you should be adding match[1] not match[0] if your intent is to save the matched text.

This works:

var regexString = '-|([^-]*)'; // or better yet: '([^-]+)' will work also
var flags = 'ig';
var regex = new RegExp(regexString, flags);

var arr = ['some-property-image.png', 'another-prop-video.png', 'y-no-work.bmp'];
var result = [];

arr.forEach(function(item) {
  var match;
  var inter = [];      
  while (match = regex.exec(item)) {
    if (match.index >= item.length) break;
    else if (match[1] !== void 0) inter.push(match[1]);
  }
});

console.log(result);

but why not use 'match()' instead?

var regexString = '[^-]+';
var flags = 'gi';
var regex = new RegExp(regexString, flags);

var arr = ['some-property-image.png', 'another-prop-video.png', 'y-no-work.bmp'];
var result = [];

arr.forEach(function(item) {
  var inter = item.match(regex);
});

console.log(result);
Sign up to request clarification or add additional context in comments.

1 Comment

In general case, you can check the if match[0].length == 0 then increment lastIndex: stackoverflow.com/a/26923233/1400768 In this case, there is no problem with your solution, since any string will match either branch.
0

The match object has an index property that contains the position of the current match. If that stays the same between two calls in the loop, it means you're stuck.

var regexString = '([^-]*)';
var flags = 'ig';
var regex = new RegExp(regexString, flags);

var arr = ['some-property-image.png', 'another-prop-video.png', 'y-no-work.bmp'];
var result = [];

arr.forEach(function(item) {
  var match;
  var inter = [];
  var lastIndex = -1;
  while (match = regex.exec(item)) {
    if (match.index == lastIndex) {
      break;
    }
    lastIndex = match.index;
    inter.push(match[0]);
  }
  result.push(inter);
});

console.log(result);

Comments

0

Why it's crashing: an infinite loop was created by using a RegExp object with a global flag and exec in a loop with a zero-width match. The loop hits the first '-' but doesn't match that character because of the negated character class. It then reverts back to the zero-length match and so doesn't advance the index value for exec. This means on the next loop it starts back at the same spot and does the exact same thing...infinitely.

That said, it's hard to tell what exactly you want but why not try match? It looks like you only care about the matching string so exec seems to be a bit of overkill.

If the desired output is a one-to-one array of results with the input array:

function foo(regexp, strings) {
  return strings.reduce(function(matches, str) {
    matches.push(str.match(regexp));
    return matches;
  }, []);
}

foo(/([^-]+)/ig, arr);
// outputs: [["some","property","image.png"],["another","prop","video.png"],["y","no","work.bmp"]]

foo(new RegExp('([^-]+)', 'ig'), arr);
// outputs: [["some","property","image.png"],["another","prop","video.png"],["y","no","work.bmp"]]

Even with the zero-width matching it won't go into an infinite loop:

foo(/([^-]*)/ig, arr));
// outputs: [["some","","property","","image.png",""],["another","","prop","","video.png",""],["y","","no","","work.bmp",""]]

If the desired output really is one array of all the matches:

function foo(regexp, strings) {
  return strings.reduce(function(matches, str) {
    return matches.concat(str.match(regexp));
  }, []);
}

foo(/([^-]+)/ig, arr);
// outputs: ["some","property","image.png","another","prop","video.png","y","no","work.bmp"]

foo(new RegExp('([^-]+)', 'ig'), arr);
// outputs:  ["some","property","image.png","another","prop","video.png","y","no","work.bmp"]

foo(/([^-]*)/ig, arr));
// outputs: ["some","","property","","image.png","","another","","prop","","video.png","","y","","no","","work.bmp",""]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.