I'm interested how can be implemented recursive regexp matching in Python (I've not found any examples :( ). For example how would one write expression which matches "bracket balanced" string like "foo(bar(bar(foo)))(foo1)bar1"
-
1@Tempus, indeed, or Tony will come. stackoverflow.com/a/1732454/193892Prof. Falken– Prof. Falken2012-08-24 13:42:42 +00:00Commented Aug 24, 2012 at 13:42
-
1@AmigableClarkKant it's ok, I know Tony. He's a friend.Geo– Geo2012-08-24 18:38:04 +00:00Commented Aug 24, 2012 at 18:38
Add a comment
|
3 Answers
You could use pyparsing
#!/usr/bin/env python
from pyparsing import nestedExpr
import sys
astring=sys.argv[1]
if not astring.startswith('('):
astring='('+astring+')'
expr = nestedExpr('(', ')')
result=expr.parseString(astring).asList()[0]
print(result)
Running it yields:
% test.py "foo(bar(bar(foo)))(foo1)bar1"
['foo', ['bar', ['bar', ['foo']]], ['foo1'], 'bar1']
Comments
With PyPi regex, that you can easily install using pip install regex, you may use
import regex
pattern = r'[^()]*+(\((?>[^()]|(?1))*+\)[^()]*+)++'
text = 'foo(bar(bar(foo)))(foo1)bar1'
print(bool(regex.fullmatch(pattern, text)))
# => True
See the Python demo, see the regex pattern demo (note the \A and \z anchors are added in the demo because regex.fullmatch requires a full string match).
Pattern details
\A- implicit inregex.fullmatch- start of string[^()]*+- 0 or more chars other than(and)(possessive match, no backtracking into the pattern allowed)(\((?>[^()]|(?1))*+\)[^()]*+)++- 1+ occurrences of Group 1 pattern:\(- a(char(?>[^()]|(?1))*+- 1+ repetitions (possessive match) of[^()]- any char but(and)|- or(?1)- a regex subroutine that recurses Group 1 pattern
\)- a)char[^()]*+- 0 or more chars other than(and)(possessive match)
\z- implicit inregex.fullmatch- end of string.
See the pattern and more information on regex subroutines at regular-expressions.info.