I can parse my_str with the following regex:

([\w\s]*)\s(\w+)

but I want to use pyparsing.

How can I do that?

my_str = "aa234"
expected_result = ["aa234", ""]

my_str = "aa234 bbb2b ccc ddd eee"
expected_result = ["aa234 bbb2b ccc ddd", "eee"]


my_str = "aa234 bbb2b ccc ddd eee fff ggg hhh"
expected_result = ["aa234 bbb2b ccc ddd eee fff ggg", "hhh"]
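
For reference, here is a minimal sketch of how the regex above behaves on these inputs with Python's re module. The helper name split_with_regex is hypothetical, and applying the pattern with re.match is an assumption. As written, the pattern requires at least one whitespace character, so the single-word input produces no match and would need separate handling to give the first expected_result.

```python
import re

# The pattern from the question: a greedy leading part, one whitespace
# character, then the final word.
PATTERN = re.compile(r"([\w\s]*)\s(\w+)")


def split_with_regex(text):
    """Return [leading_part, last_word], or None when the pattern does not match.

    Hypothetical helper, for illustration only.
    """
    match = PATTERN.match(text)
    if match is None:
        # Happens for single-word input such as "aa234": there is no
        # whitespace for \s to consume.
        return None
    return list(match.groups())


for sample in ["aa234", "aa234 bbb2b ccc ddd eee"]:
    print(repr(sample), "->", split_with_regex(sample))
```

The greedy first group initially consumes the whole string and then backtracks to the last whitespace character, which is why the final word ends up in the second group.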
  • You don't need pyparsing for regular expressions. That would be shooting sparrows with cannons. The re module will do the job. I also fail to see how your first example should yield the expected result. Commented Apr 9, 2014 at 20:51
  • Actually, this looks a lot like str.rsplit(None, 1). Commented Apr 9, 2014 at 20:59
  • Is this a pyparsing learning exercise? Sadly, you have chosen to start with one of pyparsing's weaker use cases, one that would make use of backtracking in regular expressions. Pyparsing does not do backtracking. It will do lookahead, but only if you tell it. For a "getting started" problem, pick one that works left-to-right through the input string, and does not involve backtracking to figure out that "'eee' looks just like all the other words I've seen, but because it is at the end, then it is different." Commented Apr 9, 2014 at 21:14
  • Yes, this is a pyparsing learning exercise. I have written lots of programs (including a template engine) where using regex saved me months of work. This time I need to write a domain-specific language. From the examples I have looked at, pyparsing is a powerful tool. Yes, this little task could be done with simple functions, but I also want my code to look clean in the end. Commented Apr 9, 2014 at 21:55
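
The str.rsplit(None, 1) suggestion from the comments above can be sketched as follows. The function name split_last_word is hypothetical, and returning an empty second element for single-word input is an assumption taken from the first expected_result in the question.

```python
def split_last_word(text):
    """Split text into [leading_part, last_word] at the final run of whitespace.

    Hypothetical helper, for illustration only.
    """
    # rsplit(None, 1) splits once from the right on any whitespace and
    # ignores trailing whitespace.
    parts = text.rsplit(None, 1)
    if len(parts) < 2:
        # Zero or one word: assume the whole input is the leading part and
        # the last word is empty, as in the question's first example.
        return [text.strip(), ""]
    return parts


for sample in ["aa234", "aa234 bbb2b ccc ddd eee"]:
    print(repr(sample), "->", split_last_word(sample))
```

Unlike the regex, this handles the single-word case without a separate branch in the caller, at the cost of not validating that the pieces are made of word characters.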

1 Answer

Here is your sample parser:

from pyparsing import *

stringWord = Word(alphas, alphanums)

# only want words not at the end of the string for the leading part
leadingWord = stringWord + ~LineEnd()

leadingPart = originalTextFor(stringWord + ZeroOrMore(leadingWord))

# define parser, with named results, similar to named groups in a regex
parser = leadingPart("first") + Optional(stringWord, default='')("second")

Here's how it works in practice:

tests = ["aa234",
         "aa234 bbb2b ccc ddd eee ",]
for test in tests:
    results = parser.parseString(test)
    # dump() shows the token list followed by the named results
    print(results.dump())
    print(results.first)
    print(results.second)

Prints:

['aa234', '']
- first: aa234
- second: 
aa234

['aa234 bbb2b ccc ddd', 'eee']
- first: aa234 bbb2b ccc ddd
- second: eee
aa234 bbb2b ccc ddd
eee