I'm trying to use String.prototype.matchAll on the following string:

const text = 'textA [aaa](bbb) textB [ccc](ddd) textC'

I want to match the following:

  • 1st: "textA [aaa](bbb)"
  • 2nd: " textB [ccc](ddd)"
  • 3rd: " textC"

NOTE: The regex already contains the capturing groups I need.

It almost works, but so far I haven't found a way to match the last part of the string, " textC", which doesn't contain the [*](*) pattern.

What am I doing wrong?

const text = 'textA [aaa](bbb) textB [ccc](ddd) textC'
const regexp = /(.*?)\[(.+?)\]\((.+?)\)/g;

const array = Array.from(text.matchAll(regexp));
console.log(JSON.stringify(array[0][0]));
console.log(JSON.stringify(array[1][0]));
console.log(JSON.stringify(array[2][0])); // throws a TypeError: there are only two matches, so array[2] is undefined

UPDATE:

Besides the good solutions provided in the answers below, this is also an option:

const text = 'textA [aaa](bbb) textB [ccc](ddd) textC'

const regexp = /(?!$)([^[]*)(?:\[(.*?)\]\((.*?)\))?/gm;

const array = Array.from(text.matchAll(regexp));

console.log(array);
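A minimal sketch of reading the capture groups out of the matchAll results for the regex above. The property names whole, prefix, label and target are illustrative assumptions, not part of any API. For the last chunk, Groups 2 and 3 are undefined because the optional [...](...) part did not take part in the match.

```javascript
// Sketch: turn each matchAll result into a plain object.
// Property names (whole, prefix, label, target) are hypothetical.
const text = 'textA [aaa](bbb) textB [ccc](ddd) textC';
const regexp = /(?!$)([^[]*)(?:\[(.*?)\]\((.*?)\))?/gm;

const parts = Array.from(text.matchAll(regexp), (m) => ({
  whole: m[0],  // full match
  prefix: m[1], // Group 1: text before the bracket pattern
  label: m[2],  // Group 2: text inside [...] (undefined if absent)
  target: m[3], // Group 3: text inside (...) (undefined if absent)
}));

console.log(parts);
```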

  • Try this: (\w+)\s*(?:\[(.+?)\]\((.+?)\))? Commented Jun 17, 2019 at 18:48
  • Anything wrong with (.+\)) (.+\)) (.+)? Commented Jun 17, 2019 at 18:51
  • My answer will work to split any string with any pattern while keeping the matched text in the left-hand split chunk. Is it working for you? Are you sure the result you need is the one you showed in the question? Is textC a placeholder that could also be something like word 1 word 2 word 3, and you need to get that text as a single item in the resulting array? Commented Jun 18, 2019 at 9:03
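To illustrate the second comment: a sketch running (.+\)) (.+\)) (.+) as a plain, non-global match. It captures the three parts of this exact sample (without the leading spaces, since the literal spaces are consumed), but it is hard-wired to exactly two links, so an assumed input with a single link returns null.

```javascript
// Sketch: the fixed-shape regex from the comments, without the g flag.
const threeParts = /(.+\)) (.+\)) (.+)/;

const m1 = 'textA [aaa](bbb) textB [ccc](ddd) textC'.match(threeParts);
console.log(m1.slice(1)); // the three captured parts

// Assumed input with only one link: the pattern cannot match at all
const m2 = 'textA [aaa](bbb) textC'.match(threeParts);
console.log(m2); // null
```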

3 Answers

There is no third match. After the first two matches, all that is left of the string is " textC", which does not contain the [...](...) part that your pattern requires:

https://regex101.com/r/H9Kn0G/1/

To fix this, make the whole second part optional. Note the initial \w instead of ., which prevents the dot from eating the whole string, and the non-capturing (grouping-only) parentheses around the optional part, which keep your match groups the same:

(\w+)(?:\s\[(.+?)\]\((.+?)\))?

https://regex101.com/r/Smo1y1/2/
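A quick sketch of the pattern above with matchAll. Because \w+ cannot match spaces, each match starts at a word and the leading spaces are dropped. The second input string ('two words [x](y)') is an assumed example showing that a multi-word prefix gets split into separate matches.

```javascript
// Sketch: the \w+-based pattern from this answer, collected with matchAll.
const regexp = /(\w+)(?:\s\[(.+?)\]\((.+?)\))?/g;

// Full match (m[0]) of every hit in a string
const fullMatches = (s) => Array.from(s.matchAll(regexp), (m) => m[0]);

const sample = fullMatches('textA [aaa](bbb) textB [ccc](ddd) textC');
console.log(sample); // leading spaces are not part of any match

// Assumed extra input: "two" and "words [x](y)" come back as separate matches
const multiWord = fullMatches('two words [x](y)');
console.log(multiWord);
```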


Comments

The word character is too restrictive for me. I want to match ANY string followed by the pattern [+](+), and if multiple [+](+) patterns are written one after the other, I want to match them one by one.
To match literally anything up to the next bracket, try this: ((?:(?!\[).)+)(?:\s?\[(.+?)\]\((.+?)\))?. regex101.com/r/HqbTpU/1 I added a tempered greedy token with that negative lookahead, which is obviously more complex.
@ScottWeaver Please never use a tempered greedy token when you restrict a . with a single char. (?:(?!\[).)+ (almost) = [^[]+. It is equal to something like [^[\n\r]+ in fact. The negated character class works much faster.
Yes, that's a little simpler and works the same. regex101.com/r/vMFKXH/1
Besides, your regex solution is easy to break if there are "standalone" brackets before a [...](...) construction.
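A sketch of the negated-class variant from the comments, ([^[]+)(?:\s?\[(.+?)\]\((.+?)\))?, run on the sample string and on an assumed input containing a stray [. On the sample it yields exactly the three requested chunks; with the stray bracket, Group 2 swallows everything from the stray [ up to the real ], which is the breakage the last comment warns about.

```javascript
// Sketch: the negated-class variant suggested in the comments.
const regexp = /([^[]+)(?:\s?\[(.+?)\]\((.+?)\))?/g;

// On the question's sample: three chunks, leading spaces kept
const chunks = Array.from(
  'textA [aaa](bbb) textB [ccc](ddd) textC'.matchAll(regexp),
  (m) => m[0]
);
console.log(chunks);

// Assumed input with a stray "[" before the real link
const [stray] = 'a [b c [x](y)'.matchAll(regexp);
console.log(stray[2]); // "b c [x", not just "x"
```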

Solution 1: Splitting through matching

You can split the string by matching the pattern and taking each substring from the end of the previous match up to the end of the current match:

const text = 'textA [aaa](bbb) textB [ccc](ddd) textC'
const regexp = /\[[^\][]*\]\([^()]*\)/g;
let m, idx = 0, result=[];
while(m=regexp.exec(text)) {
  result.push(text.substring(idx, m.index + m[0].length).trim());
  idx = m.index + m[0].length;
}
if (idx < text.length) {
  result.push(text.substring(idx, text.length).trim())
}
console.log(result);

Notes:

  • \[[^\][]*\]\([^()]*\) matches [, any 0+ chars other than [ and ] (with [^\][]*), then ](, then 0+ chars other than ( and ) (with [^()]*) and then a ) (see the regex demo)
  • The capturing groups are removed, but you may restore them and save in the resulting array separately (or in another array) if needed
  • .trim() is added to get rid of the leading/trailing whitespace (remove if not necessary).
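Following up on the second note, a sketch of Solution 1 with the two capture groups restored and stored next to each chunk. The object shape { chunk, label, target } is an assumption made for illustration; the splitting logic is the same exec loop as above.

```javascript
// Sketch: Solution 1 with the capture groups restored.
// The { chunk, label, target } shape is a hypothetical choice.
const text = 'textA [aaa](bbb) textB [ccc](ddd) textC';
// Group 1: text inside [...], Group 2: text inside (...)
const regexp = /\[([^\][]*)\]\(([^()]*)\)/g;

const result = [];
let idx = 0;
let m;
while ((m = regexp.exec(text)) !== null) {
  const end = m.index + m[0].length;
  result.push({
    chunk: text.substring(idx, end).trim(), // text since the previous link, plus this link
    label: m[1],
    target: m[2],
  });
  idx = end;
}
// Trailing text that has no [...](...) part
if (idx < text.length) {
  result.push({ chunk: text.substring(idx).trim(), label: undefined, target: undefined });
}

console.log(result);
```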

Solution 2: Matching optional pattern

The idea is to match any chars before your pattern and then match either the pattern or the end of the string:

let result = text.match(/(?!$)(.*?)(?:\[(.*?)\]\((.*?)\)|$)/g);

If the string can have line breaks, replace . with [\s\S], or consider this pattern:

let result = text.match(/(?!$)([\s\S]*?)(?:\[([^\][]*)\]\(([^()]*)\)|$)/g);

See the regex demo.
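A sketch of why the [\s\S] variant matters, using an assumed sample with line breaks. With ., the lazy group cannot cross a line break, so "textA\n" and the line break before "textC" are silently skipped; with [\s\S], every character lands in some match and joining the matches rebuilds the input.

```javascript
// Sketch: . versus [\s\S] on an assumed multiline input.
const multiline = 'textA\n[aaa](bbb) textB [ccc](ddd)\ntextC';

// Original pattern: . does not match line breaks
const dotVersion = multiline.match(/(?!$)(.*?)(?:\[(.*?)\]\((.*?)\)|$)/g);
// Line-break-safe pattern from the answer
const anyCharVersion = multiline.match(/(?!$)([\s\S]*?)(?:\[([^\][]*)\]\(([^()]*)\)|$)/g);

console.log(dotVersion);     // "textA\n" and the second "\n" are in no match
console.log(anyCharVersion); // joined together, these rebuild the input
```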

JS demo:

const text = 'textA [aaa](bbb) textB [ccc](ddd) textC'
const regexp = /(?!$)(.*?)(?:\[(.*?)\]\((.*?)\)|$)/g;

const array = Array.from(text.matchAll(regexp));
console.log(JSON.stringify(array[0][0]));
console.log(JSON.stringify(array[1][0]));
console.log(JSON.stringify(array[2][0]));

Regex details

  • (?!$) - not at the end of the string
  • (.*?) - Group 1: any 0+ chars other than line break chars, as few as possible (change to [\s\S]*? if there can be line breaks, or add the s flag, which is part of ECMAScript 2018)
  • (?:\[(.*?)\]\((.*?)\)|$) - either of the two alternatives:
    • \[(.*?)\]\((.*?)\) - [, Group 2: any 0+ chars other than line break chars as few as possible, ](, Group 3: any 0+ chars other than line break chars as few as possible, and a )
    • | - or
    • $ - end of string.
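As the Group 1 note says, the s (dotAll) flag is an alternative to [\s\S]. A sketch on an assumed multiline sample: with s added, the original .-based pattern returns the same chunks as a version that uses [\s\S] for Group 1.

```javascript
// Sketch: with the s (dotAll) flag, . also matches line breaks.
const multiline = 'textA\n[aaa](bbb) textB [ccc](ddd)\ntextC';

const withDotAll = multiline.match(/(?!$)(.*?)(?:\[(.*?)\]\((.*?)\)|$)/gs);
const withAnyChar = multiline.match(/(?!$)([\s\S]*?)(?:\[(.*?)\]\((.*?)\)|$)/g);

console.log(withDotAll);
console.log(JSON.stringify(withDotAll) === JSON.stringify(withAnyChar)); // true for this input
```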

Comments

Sorry for taking so long to give feedback. Your answer seems to work (it's missing some spaces from textB and textC), but my main problem with it is that it doesn't seem very readable to me. I'd prefer to work with a regex and the matchAll method. Thank you.
I've got this regex which basically works, but it also produces a zero-length match at the last position: /([^\[]*)?(?:\[(.+?)\]\((.+?)\))?/gm
@cbdev420 So, you want to use an unreadable regex solution? :) /(?=[\s\S])([\s\S]*?)(?:\[([^\][]*)\]\(([^()]*)\)|$)/g
I think the regex I have now, which almost works, is pretty readable: ([^\[]*)?(?:\[(.+?)\]\((.+?)\))? Basically, a group matches any characters except the left bracket [, if possible, and then I try to match the [+](+) pattern. But I agree that readability is a matter of opinion. It's really personal.
@cbdev420 (?=.) or (?!$) are synonymic. See this answer of mine.
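A sketch reproducing the zero-length match described above and the (?!$) fix. Without the guard, both parts of the pattern are optional, so the regex also matches the empty string at the very end of the input; prefixing (?!$) rules that position out.

```javascript
// Sketch: the zero-length final match and the (?!$) fix.
const text = 'textA [aaa](bbb) textB [ccc](ddd) textC';

const withoutGuard = /([^\[]*)?(?:\[(.+?)\]\((.+?)\))?/gm;
const withGuard = /(?!$)([^\[]*)?(?:\[(.+?)\]\((.+?)\))?/gm;

const a = Array.from(text.matchAll(withoutGuard), (m) => m[0]);
const b = Array.from(text.matchAll(withGuard), (m) => m[0]);

console.log(a); // last entry is "" (empty match at the end of the string)
console.log(b); // three non-empty entries
```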

This is what I ended up using:

const text = 'textA [aaa](bbb) textB [ccc](ddd) textC'

const regexp = /(?!$)([^[]*)(?:\[(.*?)\]\((.*?)\))?/gm;

const array = Array.from(text.matchAll(regexp));

console.log(array);
