I want to match an expression that looks like this:

(<some value with spaces and m$1124any crazy signs> (<more values>) <even more>)

I simply want to split those values along the round brackets (). So far, I have managed to cut down the pyparsing s-expression example, which is far too extensive and hard to understand (IMHO).

I got as far as using nestedExpr, which reduces it to one line:

import pyparsing as pp

example = "(<some value with spaces and m$1124any crazy signs> (<more values>) <even more>)"
parser = pp.nestedExpr(opener='(', closer=')')
print(parser.parseString(example, parseAll=True).asList())

However, the result is also split at whitespace, which I do not want:

skewed_output = [['<some',
  'value',
  'with',
  'spaces',
  'and',
  'm$1124any',
  'crazy',
  'signs>',
  ['<more', 'values>'],
  '<even',
  'more>']]
expected_output = [['<some value with spaces and m$1124any crazy signs>',
  ['<more values>'], '<even more>']]
best_output = [['some value with spaces and m$1124any crazy signs',
  ['more values'], 'even more']]

Optionally, I'd gladly take any pointers to an understandable introduction on how to build a more detailed parser. I'd like to extract the values between the < > brackets and match them (see best_output), but I can always strip() them afterwards.

Thanks in advance!

1 Answer

Pyparsing's nestedExpr accepts content and ignoreExpr arguments, which specify what counts as a "single item" of an s-expression. You can pass a QuotedString here. Unfortunately, I did not understand the difference between the two parameters from the docs well enough, but some experiments showed that the following code should satisfy your requirements:

import pyparsing as pp

single_value = pp.QuotedString(quoteChar="<", endQuoteChar=">")
parser = pp.nestedExpr(opener="(", closer=")",
                       content=single_value,
                       ignoreExpr=None)

example = "(<some value with spaces and m$1124any crazy signs> (<more values>) <even more>)"
print(parser.parseString(example, parseAll=True))

Output:

[['some value with spaces and m$1124any crazy signs', ['more values'], 'even more']]

It expects the list to start with (, end with ), and contain lists or quoted strings, optionally separated by whitespace. Each quoted string must start with <, end with >, and must not contain > inside.
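To make that description concrete, here is a rough sketch of a hand-written grammar that should accept the same input. The names value, LPAR, RPAR and sexp are my own, and this only approximates what nestedExpr builds for you; it is not its actual implementation.

```python
import pyparsing as pp

# A single item: text between < and >, returned without the brackets.
value = pp.QuotedString("<", endQuoteChar=">")

# Parentheses delimit a list but are dropped from the results.
LPAR, RPAR = pp.Suppress("("), pp.Suppress(")")

# Forward lets the grammar refer to itself, which is what allows nesting.
sexp = pp.Forward()
sexp <<= pp.Group(LPAR + pp.ZeroOrMore(value | sexp) + RPAR)

example = "(<some value with spaces and m$1124any crazy signs> (<more values>) <even more>)"
result = sexp.parseString(example, parseAll=True).asList()
print(result)
```

Writing the grammar out like this is mostly useful if you later need something nestedExpr does not offer, such as different item types at different nesting levels.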

You can play around with the content and ignoreExpr parameters some more. With content=None, ignoreExpr=single_value, the parser accepts both quoted and unquoted strings (and splits unquoted strings at spaces):

import pyparsing as pp

single_value = pp.QuotedString(quoteChar="<", endQuoteChar=">")
parser = pp.nestedExpr(opener="(", closer=")", ignoreExpr=single_value, content=None)

example = "(<some value with spaces and m$1124any crazy signs> (<more values>) <even m<<ore> foo (foo) <(foo)>)"
print(parser.parseString(example, parseAll=True))

Output:

[['some value with spaces and m$1124any crazy signs', ['more values'], 'even m<<ore', 'foo', ['foo'], '(foo)']]

Some questions remain open:

  1. Why does pyparsing ignore whitespace between consecutive list items?
  2. What is the difference between content and ignoreExpr, and when should each of them be used?
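On the first question, a comment below notes that pyparsing generally treats whitespace as an ignorable delimiter. A small sketch of that default, and of switching it off for one element with leaveWhitespace(), might look like this (the names word, skipping and strict are only for illustration):

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

# Default behaviour: leading whitespace is skipped before each element.
skipping = word + word
tokens = skipping.parseString("foo    bar").asList()
print(tokens)

# leaveWhitespace() disables the skipping for that one element,
# so the space after "foo" now makes the second word fail to match.
strict = word + pp.Word(pp.alphas).leaveWhitespace()
try:
    strict.parseString("foo bar")
    strict_matched = True
except pp.ParseException:
    strict_matched = False
print(strict_matched)
```

This suggests the items inside a nested list are separated simply because each element skips leading whitespace before it tries to match.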

3 Comments

Thanks, that looks like what I had in mind. I'll give it a try later and accept the answer afterwards, since I can't check it right now.
In general, pyparsing treats whitespace as ignorable delimiters. Did you find the online help for pyparsing here? pythonhosted.org/pyparsing/pyparsing-module.html#nestedExpr . The docs have been greatly enhanced in the past year, about 1000 lines of inline examples added.
Thanks for the link, I hadn't been there. yeputons' solution worked for me, though. Thanks a lot!
