15

I'd like to use pyparsing to parse an expression of the form: expr = '(gimme [some {nested [lists]}])', and get back a Python list of the form: [[['gimme', ['some', ['nested', ['lists']]]]]]. Right now my grammar looks like this:

from pyparsing import nestedExpr

nestedParens = nestedExpr('(', ')')
nestedBrackets = nestedExpr('[', ']')
nestedCurlies = nestedExpr('{', '}')
enclosed = nestedParens | nestedBrackets | nestedCurlies

Presently, enclosed.searchString(expr) returns a list of the form: [[['gimme', ['some', '{nested', '[lists]}']]]]. This is not what I want because it's not recognizing the square or curly brackets, but I don't know why.

2 Answers

28

Here's a pyparsing solution that uses a self-modifying grammar to dynamically match the correct closing brace character.

from pyparsing import *

data = '(gimme [some {nested, nested [lists]}])'

opening = oneOf("( { [")
nonBracePrintables = ''.join(c for c in printables if c not in '(){}[]')
closingFor = dict(zip("({[",")}]"))
closing = Forward()
# initialize closing with an expression
closing << NoMatch()
closingStack = []
def pushClosing(t):
    closingStack.append(closing.expr)
    closing << Literal( closingFor[t[0]] )
def popClosing():
    closing << closingStack.pop()
opening.setParseAction(pushClosing)
closing.setParseAction(popClosing)

matchedNesting = nestedExpr( opening, closing, Word(alphas) | Word(nonBracePrintables) )

print(matchedNesting.parseString(data).asList())

prints:

[['gimme', ['some', ['nested', ',', 'nested', ['lists']]]]]
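
The idea behind the parse actions above, remembering which closer each opener expects, can be sketched without pyparsing. This is a plain-Python illustration of that idea only, not the grammar from this answer. The names `parse_nested` and `CLOSING_FOR` are hypothetical, and the sketch splits only on whitespace and bracket characters, so a comma stays attached to the word before it.

```python
import re

# Map each opener to the closer it expects (same pairing as closingFor above).
CLOSING_FOR = dict(zip("({[", ")}]"))

def parse_nested(text):
    """Return nested lists for text with matched (), [] and {} groups."""
    root = []
    lists = [root]   # stack of lists currently being filled, innermost last
    expected = []    # stack of closers still owed, innermost last
    for token in re.findall(r"[(){}\[\]]|[^(){}\[\]\s]+", text):
        if token in CLOSING_FOR:
            # Opener: start a new sublist and remember its closer.
            new_list = []
            lists[-1].append(new_list)
            lists.append(new_list)
            expected.append(CLOSING_FOR[token])
        elif token in CLOSING_FOR.values():
            # Closer: it must match the most recently pushed expectation.
            if not expected or token != expected.pop():
                raise ValueError("unexpected closer %r" % token)
            lists.pop()
        else:
            lists[-1].append(token)
    if expected:
        raise ValueError("missing closer %r" % expected[-1])
    return root

print(parse_nested('(gimme [some {nested [lists]}])'))
```

Unlike a grammar that treats all brackets alike, this rejects input such as '(a]' because the closer on top of the stack is ')'.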

Updated: I posted the above solution because I had actually written it over a year ago as an experiment. I just took a closer look at your original post, and it made me think of the recursive type definition created by the operatorPrecedence method, and so I redid this solution, using your original approach - much simpler to follow! (might have a left-recursion issue with the right input data though, not thoroughly tested):

from pyparsing import *

enclosed = Forward()
nestedParens = nestedExpr('(', ')', content=enclosed) 
nestedBrackets = nestedExpr('[', ']', content=enclosed) 
nestedCurlies = nestedExpr('{', '}', content=enclosed) 
enclosed << (Word(alphas) | ',' | nestedParens | nestedBrackets | nestedCurlies)


data = '(gimme [some {nested, nested [lists]}])' 

print(enclosed.parseString(data).asList())

Gives:

[['gimme', ['some', ['nested', ',', 'nested', ['lists']]]]]
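
The recursive definition above reads as: an enclosed item is a word, a comma, or a bracketed group whose content is again enclosed items. As a rough analogy only (plain Python rather than pyparsing, with hypothetical names `tokenize`, `parse_items` and `PAIRS`), the same recursion can be written by hand:

```python
import re

PAIRS = {"(": ")", "[": "]", "{": "}"}

def tokenize(text):
    # Brackets and commas are single tokens; other text splits on whitespace.
    return re.findall(r"[(){}\[\],]|[^(){}\[\],\s]+", text)

def parse_items(tokens, pos=0, closer=None):
    """Parse items until `closer` (or end of input); return (items, next_pos)."""
    items = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == closer:
            return items, pos + 1
        if tok in PAIRS:
            # Recurse: the group's content uses the same rule, like `enclosed`.
            group, pos = parse_items(tokens, pos + 1, PAIRS[tok])
            items.append(group)
        elif tok in PAIRS.values():
            raise ValueError("mismatched closer %r" % tok)
        else:
            items.append(tok)
            pos += 1
    if closer is not None:
        raise ValueError("missing closer %r" % closer)
    return items, pos

result, _ = parse_items(tokenize('(gimme [some {nested, nested [lists]}])'))
print(result)
```

The recursive call plays the role that the Forward placeholder plays in the grammar: the rule refers to itself before its definition is complete.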

EDITED: Here is a diagram of the updated parser, using the railroad diagramming support coming in pyparsing 3.0. [image: railroad diagram]


2 Comments

Paul, thank you so much for the informative answer. And thank you even more for creating and open sourcing my new favorite Python library! pyparsing is helping me dramatically reduce the size and complexity, and improve the maintainability, of a project I've been working on.
If anyone is confused by the << operator used in the updated example, see the documentation of the pyparsing Forward class: pythonhosted.org/pyparsing/pyparsing.Forward-class.html
-3

This should do the trick for you. I tested it on your example:

import re
import ast

def parse(s):
    # Normalize every bracket type to square brackets.
    s = re.sub(r"[{(\[]", '[', s)
    s = re.sub(r"[})\]]", ']', s)
    answer = ''
    # Quote each word; put a comma after every word and closing bracket.
    for token in re.findall(r"[\[\]]|[^\[\]\s]+", s):
        if token == '[':
            answer += token
        elif token == ']':
            answer += token + ','
        else:
            answer += repr(token) + ','
    # literal_eval only evaluates an expression, so there is no "s=" assignment.
    s = ast.literal_eval(answer.rstrip(','))
    return s

Comment if you need more

5 Comments

Apologies for not being clear enough, but the output I was referring to is a nested python list, which is a common result of parsing nested expressions with pyparsing. Your solution just returns a string that looks like a printed python list. Thanks for your help though!
@Derek: I'm not returning a string. I'm returning a list. The variable named answer is a string, yes; but that's why there is that line that says exec"s=%s" %answer. This creates a new variable called s, which is a list. This is why my code returns s and not answer. You should check the type of the returned value, and you'll see that it's a list, not a string
you are returning a list, but I think you've misunderstood what parsing is in this context. When you parse a string, you typically have access to the matched tokens/groups at parse time, allowing you to perform some action on them. Your program just dynamically generates python code and execs it to transform a string into a nested list. It doesn't parse anything, nor does it use pyparsing as mentioned in the original question. Not to mention it will exec arbitrary python code, so it would fail on inputs with quotes, for example.
All other criticisms aside, you shouldn't be using exec like that. At most, you should use ast.literal_eval.
Dangerous use of exec -- data could run code to delete files on disk, upload sensitive information, etc.
