How to handle nested parentheses with regex?

Question

I came up with a regex string that parses the given text into 3 categories:

in parentheses
in brackets
neither.

Like this:

\[.+?\]|\(.+?\)|[\w+ ?]+

My intention is to use the outermost operator only. So, given a(b[c]d)e, the split is going to be:

a || (b[c]d) || e

It works fine given parentheses inside brackets, or brackets inside parentheses, but breaks down when there are brackets inside brackets and parentheses inside parentheses. For example, a[b[c]d]e is split as

a || [b[c] || d || ] || e.
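This Python snippet reproduces the behaviour with the standard `re` module. The asker says in the comments that they use Python. Inside a character class, `[\w+ ?]` is a set of literal characters (word characters, `+`, space, `?`), not a quantified `\w+`. The non-greedy `.+?` stops at the first `]`, so the inner bracket ends the match. The leftover outer `]` matches none of the alternatives, so `findall` silently skips it rather than returning it as a separate piece.

```python
import re

# The regex from the question: a bracketed run, a parenthesized run,
# or a run of word characters, '+', spaces and '?'.
FLAT = re.compile(r'\[.+?\]|\(.+?\)|[\w+ ?]+')

print(FLAT.findall('a(b[c]d)e'))  # ['a', '(b[c]d)', 'e']
print(FLAT.findall('a[b[c]d]e'))  # ['a', '[b[c]', 'd', 'e']
```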

Is there any way to handle this using regex alone, without resorting to code that counts the number of open/closed parentheses? Thanks!

Which language are you using for this? In theory, regular expressions cannot parse nested structures. If you are using .NET or Perl/PCRE, you might be lucky, though, because they have some advanced features that can. – Martin Ender, commented Jun 29, 2013 at 20:44
The language of nested parentheses is not regular. Hence 'regular' expressions (in the mathematical sense of the term) are not up to the job. Period. – Bwmat, commented Jun 29, 2013 at 20:45
Rohit - I am trying to create a parser for a context-free grammar (JSGF), for which I need to split the text into the three categories. Then my script goes through them recursively, one nested level at a time. M, I am using Python, which I guess puts me in the unfortunate category. – Minas Abovyan, commented Jun 29, 2013 at 20:48
@MinasAbovyan Just split your string into the matches of something like [[\]()]|[^[\]()]+ (the brackets in question and anything else). Then walk the matches, incrementing the relevant depth counters when encountering each bracket type. – Martin Ender, commented Jun 29, 2013 at 20:50
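This is a minimal Python sketch of the approach from the last comment, assuming the input is balanced. `split_outermost` is a hypothetical name. The tokenizer is equivalent to the suggested pattern, but the brackets are escaped as `[\[\]()]`. That avoids the "Possible nested set" FutureWarning that Python 3.7+ emits for a character set starting with `[`. The function keeps one depth counter per bracket type and starts or ends a chunk whenever the combined depth leaves or returns to zero.

```python
import re

# Equivalent of the comment's tokenizer: a single bracket character,
# or a run of characters that are not brackets.
TOKEN = re.compile(r'[\[\]()]|[^\[\]()]+')

def split_outermost(text):
    """Split text into outermost (...) groups, [...] groups, and the
    plain text between them. Sketch: raises ValueError if unbalanced."""
    parts = []
    current = ''
    parens = brackets = 0  # one depth counter per bracket type
    for tok in TOKEN.findall(text):
        before = parens + brackets
        if tok == '(':
            parens += 1
        elif tok == ')':
            parens -= 1
        elif tok == '[':
            brackets += 1
        elif tok == ']':
            brackets -= 1
        if parens < 0 or brackets < 0:
            raise ValueError('closing bracket without an opening one')
        after = parens + brackets
        if before == 0 and after > 0:
            # An outermost group opens: flush any plain text seen so far.
            if current:
                parts.append(current)
            current = tok
        elif before > 0 and after == 0:
            # The outermost group closes: emit it as one chunk.
            parts.append(current + tok)
            current = ''
        else:
            current += tok
    if parens or brackets:
        raise ValueError('unclosed bracket')
    if current:
        parts.append(current)
    return parts

print(split_outermost('a(b[c]d)e'))  # ['a', '(b[c]d)', 'e']
print(split_outermost('a[b[c]d]e'))  # ['a', '[b[c]d]', 'e']
```

Counters alone cannot tell `([)]` from `([])`. If mismatched pairs must be rejected, a stack of open brackets is needed instead.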

arshajii · Accepted Answer · 2013-06-29 21:51:20Z

12

Standard¹ regular expressions are not sophisticated enough to match nested structures like that. The best way to approach this is probably to traverse the string and keep track of opening / closing bracket pairs.

¹ I said standard, but not all regular expression engines are standard. You might be able to do this with Perl, for instance, by using recursive regular expressions. For example:

use strict;
use warnings;

my $str = "[hello [world]] abc [123] [xyz jkl]";

my @matches = $str =~ /[^\[\]\s]+ | \[ (?: (?R) | [^\[\]]+ )+ \] /gx;

foreach (@matches) {
    print "$_\n";
}

This prints:

[hello [world]]
abc
[123]
[xyz jkl]

EDIT: I see you're using Python; check out pyparsing.
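The traversal recommended at the top of this answer could look like the following Python sketch. The function name `split_with_stack` and the category labels are illustrative, not from the question. Unlike plain counters, a stack of open brackets also rejects mismatched pairs such as `(b]`. It labels each outermost chunk with one of the question's three categories.

```python
# Map each closing bracket to the opening bracket it must match.
PAIRS = {')': '(', ']': '['}

def split_with_stack(text):
    """Return (category, chunk) tuples for the outermost groups of text.
    Sketch: raises ValueError on unbalanced or mismatched brackets."""
    parts = []
    stack = []  # open bracket characters, innermost last
    start = 0   # index where the current chunk begins
    for i, ch in enumerate(text):
        if ch in '([':
            if not stack:
                # A new outermost group starts: flush preceding plain text.
                if i > start:
                    parts.append(('neither', text[start:i]))
                start = i
            stack.append(ch)
        elif ch in ')]':
            if not stack or stack[-1] != PAIRS[ch]:
                raise ValueError(f'unmatched {ch!r} at index {i}')
            stack.pop()
            if not stack:
                # The outermost group just closed: emit it whole.
                kind = 'parentheses' if ch == ')' else 'brackets'
                parts.append((kind, text[start:i + 1]))
                start = i + 1
    if stack:
        raise ValueError('unclosed ' + repr(stack[-1]))
    if start < len(text):
        parts.append(('neither', text[start:]))
    return parts

print(split_with_stack('a(b[c]d)e'))
# [('neither', 'a'), ('parentheses', '(b[c]d)'), ('neither', 'e')]
```

Each outermost group can be passed back to the same function after stripping its outer bracket. That processes the next level, matching the one-level-at-a-time recursion the asker describes in the comments.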

edited Jun 29, 2013 at 21:51

answered Jun 29, 2013 at 20:44

arshajii

user3489112 · Accepted Answer · 2014-04-02 12:43:27Z

2

Well, once you abandon the idea that parsing nested expressions should work at unlimited depth, one can use regular expressions just fine by specifying a maximum depth in advance. Here is how:

import re

def nested_matcher(n):
    # Poor man's matched-paren scanning; gives up after n+1 levels.
    # Matches any string with balanced parens or brackets inside; add
    # the outer parens yourself if needed. Non-greedy. Does not
    # distinguish parens and brackets, as that would cause the
    # expression to grow exponentially rather than linearly in size.
    return "[^][()]*?(?:[([]" * n + "[^][()]*?" + "[])][^][()]*?)*?" * n

p = re.compile('[^][()]+|[([]' + nested_matcher(10) + '[])]')
print(p.findall('a(b[c]d)e'))
print(p.findall('a[b[c]d]e'))
print(p.findall('[hello [world]] abc [123] [xyz jkl]'))

This will output

['a', '(b[c]d)', 'e']
['a', '[b[c]d]', 'e']
['[hello [world]]', ' abc ', '[123]', ' ', '[xyz jkl]']
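This sketch shows the depth limit in practice by reusing `nested_matcher` from this answer with a small `n`. The wrapper `outer_groups` is a hypothetical helper. With `n = 1` the pattern accepts only two levels in total. On a three-level input, the outermost `(` and `)` therefore cannot be matched, and `findall` returns just the inner two levels. With `n = 2`, the whole string matches.

```python
import re

def nested_matcher(n):
    # Same construction as above: n levels inside the outer pair.
    return "[^][()]*?(?:[([]" * n + "[^][()]*?" + "[])][^][()]*?)*?" * n

def outer_groups(text, n):
    # Hypothetical helper: the answer's outer pattern with depth limit n.
    p = re.compile('[^][()]+|[([]' + nested_matcher(n) + '[])]')
    return p.findall(text)

print(outer_groups('(((x)))', 1))  # ['((x))'] (too deep: outer pair skipped)
print(outer_groups('(((x)))', 2))  # ['(((x)))']
```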

answered Apr 2, 2014 at 12:43

user3489112

1 Comment

This search actually deleted all of my input EXCEPT for the parentheses themselves. – Jan Kowalski, over a year ago