How to parse this string with pyparsing

Question

I can parse my_str with the following regex:

([\w\s]*)\s(\w+)

but I want to use pyparsing.

How can I do that?

my_str = "aa234"
expected_result = ["aa234", ""]

my_str = "aa234 bbb2b ccc ddd eee"
expected_result = ["aa234 bbb2b ccc ddd", "eee"]


my_str = "aa234 bbb2b ccc ddd eee fff ggg hhh"
expected_result = ["aa234 bbb2b ccc ddd eee fff ggg", "hhh"]
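For reference, here is a minimal sketch of the regex route using Python's standard re module. The helper name split_last_word is hypothetical. The fallback for single-word input is also an assumption: as written, the regex requires at least one whitespace character, so it cannot match "aa234" on its own. The fallback returns the question's ["aa234", ""] shape instead.

```python
import re

# The question's regex: a greedy [\w\s]* prefix, one whitespace char, then a final word.
# The greedy prefix first grabs everything, then the regex engine backtracks until
# \s(\w+) can match the last word.
PATTERN = re.compile(r"([\w\s]*)\s(\w+)")


def split_last_word(text):
    """Hypothetical helper: return [leading text, last word]."""
    match = PATTERN.fullmatch(text)
    if match is None:
        # Assumption: a single word (no whitespace) yields [text, ""],
        # as in the first expected_result above.
        return [text, ""]
    return [match.group(1), match.group(2)]


for text in ["aa234", "aa234 bbb2b ccc ddd eee"]:
    print(split_last_word(text))
# prints:
# ['aa234', '']
# ['aa234 bbb2b ccc ddd', 'eee']
```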

You don't need pyparsing for regular expressions. That would be shooting sparrows with cannons. The re module will do the job. I also fail to see how your regex would yield the expected result for your first example.
– Hyperboreus, Commented Apr 9, 2014 at 20:51
Is this a pyparsing learning exercise? Sadly, you have chosen to start with one of pyparsing's weaker use cases, one that relies on backtracking in a regular expression. Pyparsing does not do backtracking. It will do lookahead, but only if you tell it to. For a "getting started" problem, pick one that works left-to-right through the input string and does not involve backtracking to figure out that "eee" looks just like all the other words, but is different because it is at the end.
– PaulMcG, Commented Apr 9, 2014 at 21:14
Yes, this is a pyparsing learning exercise. I have written lots of programs (including a template engine) where using regex saved me months. This time I need to write a domain-specific language, and from the examples I have looked at, pyparsing is a powerful tool. Yes, this little task could be done with simple functions, but I also want my code to look clear in the end.
– ceremcem, Commented Apr 9, 2014 at 21:55
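PaulMcG's point about backtracking can be seen in a few lines. This sketch is illustrative only. The names naive, naive_fails, leading and with_lookahead are hypothetical, and word = Word(alphanums) is a simplification of the answer's stringWord. A greedy OneOrMore swallows the last word, and pyparsing does not give it back, so the parse fails. An explicit FollowedBy lookahead instead stops the repetition one word early.

```python
from pyparsing import FollowedBy, Group, OneOrMore, ParseException, Word, alphanums

word = Word(alphanums)

# Naive grammar: OneOrMore consumes every word, including the last one.
# Pyparsing does not backtrack, so the trailing `word` has nothing left to match.
naive = OneOrMore(word) + word


def naive_fails(text):
    """Return True if the naive grammar raises ParseException on text."""
    try:
        naive.parseString(text)
    except ParseException:
        return True
    return False


# Explicit lookahead: accept a word into the leading group only if another word follows it.
leading = Group(OneOrMore(word + FollowedBy(word)))
with_lookahead = leading + word

print(naive_fails("aa234 bbb2b ccc ddd eee"))  # prints True
result = with_lookahead.parseString("aa234 bbb2b ccc ddd eee")
print(result.asList())  # prints [['aa234', 'bbb2b', 'ccc', 'ddd'], 'eee']
```

This variant needs at least two words, because OneOrMore requires one word that is followed by another. A single-word input such as "aa234" raises ParseException. The accepted answer handles that case with a leading stringWord plus ZeroOrMore and an Optional final word.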

PaulMcG · Accepted Answer · 2014-04-10 02:24:21Z


Here is your sample parser:

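from pyparsing import (LineEnd, Optional, Word, ZeroOrMore, alphanums,
                       alphas, originalTextFor)

# a word starts with a letter, followed by letters or digits
stringWord = Word(alphas, alphanums)

# only want words not at the end of the string for the leading part;
# ~LineEnd() is a negative lookahead: it fails at end of line/string and consumes no input
leadingWord = stringWord + ~LineEnd()

# the first word is always leading text, so one-word input still matches;
# originalTextFor returns the matched slice of the input (spaces included) as one string
leadingPart = originalTextFor(stringWord + ZeroOrMore(leadingWord))

# define parser, with named results, similar to named groups in a regex
parser = leadingPart("first") + Optional(stringWord, default='')("second")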

Here's how it works in practice:

tests = ["aa234",
         "aa234 bbb2b ccc ddd eee "]
for test in tests:
    results = parser.parseString(test)
    print(results.dump())
    print(results.first)
    print(results.second)

Prints:

['aa234', '']
- first: aa234
- second: 
aa234

['aa234 bbb2b ccc ddd', 'eee']
- first: aa234 bbb2b ccc ddd
- second: eee
aa234 bbb2b ccc ddd
eee
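To check the parser against all three inputs from the question, it can be wrapped in a small function that returns the same two-element list as expected_result. This is a sketch. The helper name split_leading is hypothetical, and passing parseAll=True is an added assumption: with it, any unparsed trailing text raises ParseException instead of being silently ignored.

```python
from pyparsing import (LineEnd, Optional, Word, ZeroOrMore, alphanums,
                       alphas, originalTextFor)

# Same grammar as in the answer above
stringWord = Word(alphas, alphanums)
leadingWord = stringWord + ~LineEnd()
leadingPart = originalTextFor(stringWord + ZeroOrMore(leadingWord))
parser = leadingPart("first") + Optional(stringWord, default="")("second")


def split_leading(text):
    """Hypothetical helper: return [leading text, last word] like expected_result."""
    results = parser.parseString(text, parseAll=True)
    return [results.first, results.second]


for s in ["aa234",
          "aa234 bbb2b ccc ddd eee",
          "aa234 bbb2b ccc ddd eee fff ggg hhh"]:
    print(split_leading(s))
```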
