Simple nested expression matching with pyparsing

Question

I want to match an expression that looks like this:

(<some value with spaces and m$1124any crazy signs> (<more values>) <even more>)

I simply want to split those values along the round brackets (). So far, I have managed to cut down the pyparsing overhead of the s-expression example, which is far too extensive and not understandable (IMHO).

I got as far as using nestedExpr, which reduces it to one line:

import pyparsing as pp

example = "(<some value with spaces and m$1124any crazy signs> (<more values>) <even more>)"
parser = pp.nestedExpr(opener='(', closer=')')
print(parser.parseString(example, parseAll=True).asList())

The result is also split at whitespace, which I do not want:

skewed_output = [['<some', 'value', 'with', 'spaces', 'and', 'm$1124any',
                  'crazy', 'signs>', ['<more', 'values>'], '<even', 'more>']]
expected_output = [['<some value with spaces and m$1124any crazy signs>',
                    ['<more values>'], '<even more>']]
best_output = [['some value with spaces and m$1124any crazy signs',
                ['more values'], 'even more']]

Optionally, I'd gladly take any pointers to an understandable introduction on how to build a more detailed parser. I'd like to extract the values between the < > brackets and match them (see best_output), but I can always string.strip() them afterwards.
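
The strip-afterwards fallback could look roughly like the sketch below. It assumes the parser already returns whole values as in expected_output; strip_brackets is a made-up helper name, not part of pyparsing.

```python
def strip_brackets(node):
    """Recursively strip '<' and '>' characters from both ends of every string."""
    if isinstance(node, list):
        return [strip_brackets(child) for child in node]
    return node.strip("<>")


expected_output = [['<some value with spaces and m$1124any crazy signs>',
                    ['<more values>'], '<even more>']]
print(strip_brackets(expected_output))
# [['some value with spaces and m$1124any crazy signs', ['more values'], 'even more']]
```

Note that str.strip("<>") removes every leading and trailing < or >, so a value that legitimately begins or ends with one of those characters would lose it.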

Thanks in advance!

yeputons · Accepted Answer · 2017-02-07 02:15:13Z

7

Pyparsing's nestedExpr accepts content and ignoreExpr arguments, which specify what counts as a "single item" of an s-expression. You can pass a QuotedString here. Unfortunately, I did not understand the difference between the two parameters from the docs well enough, but some experiments showed me that the following code should satisfy your requirements:

import pyparsing as pp

single_value = pp.QuotedString(quoteChar="<", endQuoteChar=">")
parser = pp.nestedExpr(opener="(", closer=")",
                       content=single_value,
                       ignoreExpr=None)

example = "(<some value with spaces and m$1124any crazy signs> (<more values>) <even more>)"
print(parser.parseString(example, parseAll=True))

Output:

[['some value with spaces and m$1124any crazy signs', ['more values'], 'even more']]

It expects the list to start with (, end with ), and contain lists or quoted strings, optionally separated by whitespace. Each quoted string must start with <, end with >, and not contain > inside.
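
To make that grammar concrete, here is a hand-written recursive-descent sketch of the same rules in plain Python. It is not how pyparsing works internally, and parse_nested is a hypothetical name. It only illustrates what is accepted: a list in parentheses whose items are nested lists or <...> values, with optional whitespace in between.

```python
def parse_nested(text):
    """Parse "(<a> (<b>) <c>)" into nested lists of strings."""
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_list():
        nonlocal pos
        if pos >= len(text) or text[pos] != "(":
            raise ValueError(f"expected '(' at position {pos}")
        pos += 1  # consume "("
        items = []
        while True:
            skip_ws()
            if pos >= len(text):
                raise ValueError("input ended before ')'")
            ch = text[pos]
            if ch == ")":
                pos += 1  # consume ")"
                return items
            if ch == "(":
                items.append(parse_list())
            elif ch == "<":
                end = text.find(">", pos + 1)
                if end == -1:
                    raise ValueError(f"unterminated '<' at position {pos}")
                items.append(text[pos + 1:end])  # value without the brackets
                pos = end + 1
            else:
                raise ValueError(f"unexpected {ch!r} at position {pos}")

    skip_ws()
    result = parse_list()
    skip_ws()
    if pos != len(text):
        raise ValueError(f"trailing input at position {pos}")
    return result


example = "(<some value with spaces and m$1124any crazy signs> (<more values>) <even more>)"
print(parse_nested(example))
# ['some value with spaces and m$1124any crazy signs', ['more values'], 'even more']
```

Unlike the pyparsing result, this sketch returns the list itself and does not wrap it in an extra outer list.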

You can play around with the content and ignoreExpr parameters some more to find out that content=None, ignoreExpr=single_value makes the parser accept both quoted and unquoted strings (and split unquoted strings at spaces):

import pyparsing as pp

single_value = pp.QuotedString(quoteChar="<", endQuoteChar=">")
parser = pp.nestedExpr(opener="(", closer=")", ignoreExpr=single_value, content=None)

example = "(<some value with spaces and m$1124any crazy signs> (<more values>) <even m<<ore> foo (foo) <(foo)>)"
print(parser.parseString(example, parseAll=True))

Output:

[['some value with spaces and m$1124any crazy signs', ['more values'], 'even m<<ore', 'foo', ['foo'], '(foo)']]
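
For readers who want to see roughly what this lenient mode does, here is a dependency-free approximation built from a regex tokenizer and a stack. It is a sketch under assumptions and not pyparsing's implementation: parse_lenient and TOKEN are made-up names, unquoted words here may not contain < or >, and unlike nestedExpr it does not insist that the input start with (.

```python
import re

# One token is a <quoted value>, a single parenthesis, or an unquoted word.
TOKEN = re.compile(r"<(?P<quoted>[^>]*)>|(?P<paren>[()])|(?P<word>[^\s()<>]+)")


def parse_lenient(text):
    """Approximate the content=None, ignoreExpr=single_value behaviour."""
    stack = [[]]  # stack[0] collects the top-level results
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace only separates tokens
            continue
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize input at position {pos}")
        kind = match.lastgroup
        value = match.group(kind)
        if kind == "paren" and value == "(":
            new_list = []
            stack[-1].append(new_list)  # attach to the enclosing list
            stack.append(new_list)      # and descend into it
        elif kind == "paren":
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' at position {pos}")
            stack.pop()
        else:
            stack[-1].append(value)  # quoted value (brackets dropped) or word
        pos = match.end()
    if len(stack) != 1:
        raise ValueError("unbalanced '(': input ended early")
    return stack[0]


example = "(<some value with spaces and m$1124any crazy signs> (<more values>) <even m<<ore> foo (foo) <(foo)>)"
print(parse_lenient(example))
```

On the example string above, this should produce the same nested list as the pyparsing output shown.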

Some questions remain open:

Why does pyparsing ignore whitespace between consecutive list items?
What is the difference between content and ignoreExpr, and when should one use each of them?

3 Comments

dennlinger Over a year ago

Thanks, that looks like what I had in mind. I'll give it a try later and accept the answer afterwards, since I can't check it right now.

PaulMcG Over a year ago

In general, pyparsing treats whitespace as ignorable delimiters. Did you find the online help for pyparsing here? pythonhosted.org/pyparsing/pyparsing-module.html#nestedExpr . The docs have been greatly enhanced in the past year, about 1000 lines of inline examples added.

dennlinger Over a year ago

Thanks for the link, I hadn't been there. yeputons' solution worked for me, though. Thanks a lot!
