Python Parsley: parsing simple source code

Question

I have legacy code that we still use. Looking through the documentation and tutorials for Python's Parsley, I haven't found good examples of parsing source code. Here is a sample of my code:

{ comment

still part of the comment above

oh ya more commenting

}

ACTION_1 [file_name.ex1]
ACTION_2 [file_name.ex2] ; some comment
{ some comment} ACTION_3 [file_name.ex3]
ACTION_4 {wow another comment } [file_name.ex4]
;ACTION_5 [file_name.ex5] <-- commented out line
ACTION_6 [file_name.ex6]

So I began creating the grammar,

x = parsley.grammar = r"""
text = (anything:x ?(x not in '{}') -> x)+:d -> tex.text("".join(d))
comment = ';' (anything:x ?(x not in '\n'))+ '\n' -> ''
file_name = (anything:x ?(x in '{}') -> x)+:d -> text.text("".join(d))
"""

I am trying to parse this with Parsley into a dict such as {'ACTION_x': 'file_name.exx', ...}. How can I write a grammar that parses this file?

Corbin · Accepted Answer · 2014-04-26 15:39:37Z

There are three steps that I go through when creating grammars when I want the result to be some sort of AST. The first step is identifying the main nonterminal productions, and building them. Don't worry about reductions at first, just get the basic productions out and make sure that they can match your source file. If your language already has an existing grammar specification, use it; it is almost certainly more accurate than your opinions of how the language is structured.

That bears repeating, actually: If your language already has an existing grammar specification, use it. I've successfully adapted both PEG and CFG (BNF) descriptions of languages into Parsley grammars before.

multilineComment = '{' (~'}' anything)* '}'

This should match your multiline comment syntax. Note how I've used PEG-style (negative) lookahead assertions instead of semantic assertions; this is generally going to be more compact and more usefully express how you want things to parse. Read it out loud: "A multiline comment is an opening brace, followed by zero or more anythings that are not closing braces, followed by a closing brace."

The single-line comments are trickier, because your language appears to be whitespace-sensitive. This means that each rule that consumes newlines has to agree when and where the consumption of newlines will happen.

lineComment = ';' (~'\n' anything)* '\n'

Fun story: I actually wrote an until rule to help with these "do this until that" sorts of rules, but it turns out that it makes things messier! Live and learn.

The second step is to write tests. From your remarks on IRC, I'm guessing that you're aware of how to write tests for Parsley code, so I won't cover it here; in short, write a bunch of small segments of code, and run them through Parsley, asserting that they succeed or fail. You'll come back and change the succeeding tests to assert that the snippet was parsed into a valid tree in the third step.

The third step is to add your reduction annotations (bindings, reductions) to your rules. This will turn your grammar from a mere recognizer of your target language into a parser.

Remember your filename rule from before?

fileName = '[' (~']' anything)+ ']'

Let's go ahead and add a binding and reduction to it; we want to capture the contents of the brackets and return it as a string.

fileName = '[' (~']' anything)+:cs ']' -> "".join(cs)

Parsley also has a way to slice from the input iterable, which is pretty nifty if you're parsing strings and want to capture strings.

fileName = '[' <(~']' anything)+>:s ']' -> s

I've used single-letter variables here because I'm stuck in Haskell Land, but you're free to use any names you like for bindings.

Hope this helps! ~ C.

The language I'm "reverse engineering" doesn't have an available PEG that I can use. I can reverse engineer it, though, I think. For example, most of its code is in the form "Command" "data record" "location where applied" "length of application".
