
I have to extract parts of a string by splitting it on spaces. But because the parts I want to extract may themselves contain spaces, I came up with a regex that ignores spaces when they are between brackets.

Note that I don't fully understand alternation in regex. After a lot of tests, I managed to handle one bracket level (first log in the example). Brackets might also be absent, so I added the last alternative (|[^\s]+) to match things like tag1 too.

After a lot of (non-working) tests, I came up with the second regex. It consists of the first alternative of the first regex, modified to recognize a second level of nesting, followed by the whole first regex as a second alternative.

This works fine (as long as there is no third nesting level; see the example), but I have a feeling there should be an easier solution, since the pattern seems to be recursive (new nesting level + the whole regex for the previous level).

Is there a way to solve this more generally (maybe not infinite nesting, but let's say 4 or 5 levels deep)? Maybe with a recursive regex?

var str = "tag1 tag2 func(foo) func2(foo, bar) func1(func2(foo), bar, func2(bar)) func1(func2(foo, func1(foo)), bar)";

console.log( str.match(/([^\s]*\([^()]+\)[^\s]*|[^\s]+)/g) );

console.log( str.match(/([^\s]*\((?:[^()]*\([^()]+\)[^()]*)+\)[^\s]*|(?:[^\s]*\([^()]+\)[^\s]*|[^\s]+))/g) );
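JavaScript regexes cannot recurse, but for a fixed maximum depth the pattern can be generated in a loop: start from a group with no brackets inside and, for each extra level, wrap the previous group in a new pair of parentheses. This is a minimal sketch, assuming only round brackets and tokens shaped like name(...) or plain words; buildNestedParenRegex and sample are hypothetical names. Inside each group it consumes one non-bracket character at a time ([^()]) instead of [^()]+, which avoids a nested quantifier that can backtrack catastrophically on unbalanced input.

```javascript
// Hypothetical helper (not from the question): builds a global regex whose
// tokens are either name(...) with up to `depth` levels of balanced round
// brackets, or a plain run of characters that are neither whitespace nor brackets.
function buildNestedParenRegex(depth) {
  if (!Number.isInteger(depth) || depth < 1) {
    throw new RangeError("depth must be a positive integer");
  }
  // Innermost level: "(" ... ")" with no brackets inside.
  let group = "\\([^()]*\\)";
  // Each extra level: "(" then any mix of single non-bracket characters
  // or an inner group, then ")".
  for (let level = 2; level <= depth; level++) {
    group = "\\((?:[^()]|" + group + ")*\\)";
  }
  // First alternative: name followed by a bracket group.
  // Second alternative: a plain token such as tag1.
  return new RegExp("[^()\\s]+" + group + "|[^()\\s]+", "g");
}

const sample = "tag1 tag2 func(foo) func2(foo, bar) func1(func2(foo), bar, func2(bar)) func1(func2(foo, func1(foo)), bar)";
console.log(sample.match(buildNestedParenRegex(3)));
```

Passing 4 or 5 gives the deeper variants. The pattern grows linearly with depth, and anything nested deeper than the chosen level is no longer matched as a single token.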

  • Side note: sadly, the JS regex engine doesn't support regex recursion. Commented Feb 28, 2017 at 13:08
  • @sp00m Gasp, I guess that answers a crucial point of the question, but maybe someone has a better solution than mine for something like 4 levels deep... Otherwise I will probably write a function that builds the regex string depending on the level, but it will be a long string. Commented Feb 28, 2017 at 13:13
  • I was going to answer this to tell you not to try contrived solutions and to just write a really simple "parser" yourself... Anyway, it's closed, but have the code since I already wrote it. Commented Feb 28, 2017 at 13:28
  • Thanks a lot for these answers, that does help! Commented Feb 28, 2017 at 13:32
  • For up to 3 levels, something like [^)(\s]+\((?:[^)(]+|\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\))*\)|[^)(\s]+ works (you can easily add levels by breaking the relevant part apart as in this demo and rejoining it for the JS regex). Commented Feb 28, 2017 at 17:10
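Following the suggestion to just write a simple parser, a character-by-character scan that counts bracket depth handles any nesting level without a regex. This is a minimal sketch, assuming only round brackets matter; splitTopLevel and sample are hypothetical names, and a stray ")" is clamped so the depth never goes below zero.

```javascript
// Hypothetical helper: split `input` on whitespace that lies outside all
// round brackets, for any nesting depth.
function splitTopLevel(input) {
  const tokens = [];
  let current = "";
  let depth = 0;
  for (const ch of input) {
    if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      // Clamp at zero so an unmatched ")" cannot make depth negative.
      depth = Math.max(0, depth - 1);
    }
    if (depth === 0 && /\s/.test(ch)) {
      // Whitespace at the top level ends the current token;
      // consecutive spaces do not produce empty tokens.
      if (current !== "") tokens.push(current);
      current = "";
    } else {
      // Everything else (including spaces inside brackets) is kept.
      current += ch;
    }
  }
  if (current !== "") tokens.push(current);
  return tokens;
}

const sample = "tag1 tag2 func(foo) func2(foo, bar) func1(func2(foo), bar, func2(bar)) func1(func2(foo, func1(foo)), bar)";
console.log(splitTopLevel(sample));
```

Unlike the regex approach, there is no depth limit to pick in advance; the trade-off is that it does not check that the brackets are balanced.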
