I'm having a small problem with regex. I want to find the start and end index of each complete match in the input string.

For example, I have an array of strings like

["a", "aa"]

and I have text like "I like a problem aa".

I iterate over the array and run one regex per string:

let arr = ["a", "aa"];
let str = "I like a problem aa";
let indicesArr = [];
arr.forEach(a=>{
  const regexObj = new RegExp(a, "gi");
  let match;
  while ((match = regexObj.exec(str))) {
    let obj = { start: match.index, end: regexObj.lastIndex }
    indicesArr.push(obj);
    if(!match.index || !regexObj.lastIndex) break;
  }
});

The code above gives me this result:

[
  {start: 7, end: 8},
  {start: 17, end: 18},
  {start: 18, end: 19},
  {start: 17, end: 19}
]

The result I want is:

[
  {start: 7, end: 8},
  {start: 17, end: 19}
]

Any suggestions would be very helpful. Thanks :)

  • Use word boundaries to match a whole word, not a partial match. Commented Aug 14, 2021 at 21:12
  • Hey, thank you so much for the suggestion, but if my string is "I like a problem aa and aaaa", it doesn't match the final aaaa part. Commented Aug 14, 2021 at 21:16
  • Then you need to use an aa|a regex. Commented Aug 14, 2021 at 21:23
  • I don't follow. Could you please write a small example? It would be helpful, thanks :) Commented Aug 14, 2021 at 21:24
  • See jsfiddle.net/wiktor_stribizew/jomL13rq Commented Aug 14, 2021 at 21:26

1 Answer

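The problem is that the pattern a matches twice inside aa, once at each character, and the separate aa pattern then matches the same text again. Instead of running one regex per array item, build a single regex that tries aa before a and collect all of its matches. The regex must be /aa|a/g and not /a|aa/g, because the order of alternatives matters: at each position the engine tries them left to right and takes the first one that succeeds.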
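To see the effect of alternation order, here is a small sketch that runs both patterns against the string from the question. The helper name spans is made up for this illustration, and it uses String.prototype.matchAll instead of the exec loop, which assumes an ES2020-capable runtime.

```javascript
// Illustration only: "spans" is a hypothetical helper, not part of the question's code.
const text = "I like a problem aa";

// Collect { start, end } for every match of a global regex.
function spans(regex, input) {
  return Array.from(input.matchAll(regex), (m) => ({
    start: m.index,
    end: m.index + m[0].length,
  }));
}

// Longer alternative first: "aa" is consumed as one match.
const longestFirst = spans(/aa|a/g, text);
// Shorter alternative first: "a" always wins, so "aa" is split in two.
const shortestFirst = spans(/a|aa/g, text);

console.log(longestFirst);  // [ {start: 7, end: 8}, {start: 17, end: 19} ]
console.log(shortestFirst); // [ {start: 7, end: 8}, {start: 17, end: 18}, {start: 18, end: 19} ]
```

With /a|aa/g the second alternative can never match, since a always succeeds first wherever aa could start.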

For your case, you can use:

let arr = ["a", "aa"];
let str = "I like a problem aa";
let indicesArr = [];
arr.sort((a, b) => b.length - a.length);
const regexObj = new RegExp(arr.map(x=> x.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')).join('|'), "gi");
let match;
while (match = regexObj.exec(str)) {
    let obj = { start: match.index, end: regexObj.lastIndex }
    indicesArr.push(obj);
}
console.log(indicesArr);

Note these two lines:

  • arr.sort((a, b) => b.length - a.length); - sorts the arr items by length in descending order (to put aa before a). Note that sort reorders arr in place.
  • const regexObj = new RegExp(arr.map(x=> x.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')).join('|'), "gi"); - escapes regex special characters in each arr item so it is matched literally, and joins the items with the | alternation operator into a single regex pattern string.
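If you need this in more than one place, the two steps above can be wrapped in a function. This is only a sketch: the names escapeRegExp and findSpans are hypothetical, it uses matchAll rather than the exec loop, and it assumes that empty strings in the array should be ignored.

```javascript
// Hypothetical helpers; the names are not from the original answer.

// Escape regex special characters so a term is matched literally.
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Return { start, end } for every non-overlapping match of any term,
// preferring longer terms when several could match at the same position.
function findSpans(terms, text) {
  // Assumption: empty terms are skipped, since they would match everywhere.
  const usable = terms.filter((t) => t.length > 0);
  if (usable.length === 0) return [];

  // Sort a copy so the caller's array keeps its original order.
  const pattern = [...usable]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");

  const regex = new RegExp(pattern, "gi");
  return Array.from(text.matchAll(regex), (m) => ({
    start: m.index,
    end: m.index + m[0].length,
  }));
}

console.log(findSpans(["a", "aa"], "I like a problem aa"));
// [ {start: 7, end: 8}, {start: 17, end: 19} ]
```

Because matches never overlap, a run such as aaaa is reported as two adjacent aa spans rather than one span of length four.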