
I'm looking to write a Python import filter or preprocessor for source files that are essentially Python with extra language elements. The goal is to read the source file, parse it to an abstract syntax tree, apply some transforms in order to implement the new parts of the language, and write valid Python source which can then be consumed by CPython. I want to write this thing in Python and am looking for the best parser for the task.
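Here is a minimal sketch of the "import filter" half of this plan, built on the standard `importlib` machinery. A meta-path finder locates files with a hypothetical `.upy` suffix. A loader then runs a placeholder `transform(source) -> str`, standing in for the parse/transform/serialize step, before compiling the result. The suffix, the class names, and `transform` are illustrative assumptions, not part of any existing tool. Packages, bytecode caching, and error reporting are left out.

```python
import importlib.abc
import importlib.util
import os
import sys


class PreprocessingLoader(importlib.abc.Loader):
    """Reads a source file, passes its text through `transform`, and runs the result."""

    def __init__(self, path, transform):
        self.path = path
        self.transform = transform  # placeholder: extended source text -> Python source text

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        with open(self.path, encoding="utf-8") as f:
            source = f.read()
        code = compile(self.transform(source), self.path, "exec")
        exec(code, module.__dict__)


class PreprocessingFinder(importlib.abc.MetaPathFinder):
    """Looks for `<module name><suffix>` on the search path (plain modules only, no packages)."""

    def __init__(self, transform, suffix=".upy"):  # ".upy" is a hypothetical extension
        self.transform = transform
        self.suffix = suffix

    def find_spec(self, fullname, path, target=None):
        name = fullname.rpartition(".")[2]
        # `path` is None for top-level imports and the parent package's __path__ otherwise.
        for entry in path or sys.path:
            if not isinstance(entry, str):
                continue
            candidate = os.path.join(entry or ".", name + self.suffix)
            if os.path.isfile(candidate):
                loader = PreprocessingLoader(candidate, self.transform)
                return importlib.util.spec_from_file_location(fullname, candidate, loader=loader)
        return None  # let other finders (or ImportError) handle it
```

Registration would look like `sys.meta_path.append(PreprocessingFinder(my_transform))`. Appending the finder rather than inserting it at index 0 lets the normal finders handle ordinary `.py` modules first.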

The parser built into Python is not appropriate because it requires the source files to be actual Python, which these will not be. There are tons of parsers (or parser generators) that work with Python, but it's hard to tell which is best for my needs without a whole bunch of research.
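The following small sketch makes the limitation concrete. `ast.parse` accepts ordinary Python but raises `SyntaxError` as soon as it meets an extension. The `unless` statement used here is a made-up example of "extra language elements", not something taken from the question.

```python
import ast


def parses_as_python(source: str) -> bool:
    """Return True if CPython's own parser (via the ast module) accepts `source`."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(parses_as_python("if not ready:\n    wait()\n"))  # True
print(parses_as_python("unless ready:\n    wait()\n"))  # False: 'unless' is not Python syntax
```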

In summary, my requirements are:

  1. Parser is written in Python or has Python bindings.
  2. Comes with a Python grammar that I can tweak, or can easily consume a tweakable Python grammar available elsewhere (such as http://docs.python.org/reference/grammar.html).
  3. Can re-serialize the AST after transforming it.
  4. Should not be too horrific to work with API-wise.
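For comparison, the standard library meets requirement 3 only once the source is already valid Python. `ast.NodeTransformer` rewrites the tree, and `ast.unparse` (Python 3.9+) turns it back into source. The sketch below uses constant folding purely as an illustrative transform. Note that `ast.unparse` discards comments and the original formatting, which may matter for a preprocessor whose output people read.

```python
import ast


class FoldConstantAdds(ast.NodeTransformer):
    """Illustrative transform: replace `<number> + <number>` with its computed value."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first, so 1 + 2 + 3 collapses fully
        numeric = (int, float)    # exact types, so bools are left alone
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and type(node.left.value) in numeric
                and type(node.right.value) in numeric):
            folded = ast.Constant(node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node


source = "total = 1 + 2 + 3  # this comment is lost\nprint(total)\n"
tree = FoldConstantAdds().visit(ast.parse(source))
ast.fix_missing_locations(tree)
print(ast.unparse(tree))
# prints:
# total = 6
# print(total)
```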

Any suggestions?

  • Just to be clear: The language you want to parse does not even parse as pure Python. Correct? Commented Feb 23, 2012 at 20:09
  • Have you considered looking into PyYAML? Commented Feb 23, 2012 at 20:17
  • @SvenMarnach: That is correct. Commented Feb 23, 2012 at 20:17
  • Ned Batchelder has a nice overview of Python parsing tools on his blog. Commented Feb 23, 2012 at 21:09
  • Another thing I just found is rope. It's not a parser itself, but it does some of the things I was thinking about doing, and I should probably look at it to see how it does what it does. Commented Feb 24, 2012 at 5:53

3 Answers


The first thing that comes to mind is lib2to3. It is a complete pure-Python implementation of a Python parser. It reads a Python grammar file and parses Python source files according to that grammar. It offers solid infrastructure for manipulating the resulting syntax tree (a concrete syntax tree that keeps whitespace and comments) and writing back nicely formatted Python code; after all, its purpose is to transform between two Python-like languages with slightly different grammars.

Unfortunately, it lacks documentation and doesn't guarantee a stable interface. Nevertheless, there are projects that build on top of lib2to3, and the source code is quite readable. If API stability is an issue, you can just fork it.
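lib2to3 has since been deprecated, and it was removed in Python 3.13, so on current interpreters you would need a fork or an older Python. What makes it suitable here is that it rewrites only the parts you change and copies everything else verbatim. That property can be sketched with the standard `tokenize` module, which splits extended syntax into tokens without checking it against the grammar. This is a token-level substitute, not lib2to3's API. It rewrites a hypothetical `unless COND:` statement into `if not (COND):` by splicing the original text at token positions, so comments and spacing survive.

```python
import io
import tokenize


def rewrite_unless(source: str) -> str:
    """Rewrite the hypothetical statement 'unless COND:' as 'if not (COND):'.

    Edits are spliced into the original text at token positions, so comments,
    blank lines, and spacing elsewhere are copied through unchanged.
    """
    # Offset of the start of each line, split the same way tokenize reads them.
    starts = [0]
    for line in io.StringIO(source).readlines():
        starts.append(starts[-1] + len(line))

    def offset(pos):
        row, col = pos  # tokenize rows are 1-based, columns 0-based
        return starts[row - 1] + col

    edits = []            # (start, end, replacement text)
    at_stmt_start = True  # is the next significant token the first of a statement?
    need_open = False     # insert "(" before the next token (start of the condition)
    colon_depth = None    # bracket depth of the ':' that ends an 'unless' header
    depth = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if need_open:
            edits.append((offset(tok.start), offset(tok.start), "("))
            need_open = False
        if tok.type == tokenize.OP and tok.string in ("(", "[", "{"):
            depth += 1
        elif tok.type == tokenize.OP and tok.string in (")", "]", "}"):
            depth -= 1
        elif at_stmt_start and tok.type == tokenize.NAME and tok.string == "unless":
            edits.append((offset(tok.start), offset(tok.end), "if not"))
            need_open = True
            colon_depth = depth
        elif (colon_depth is not None and tok.type == tokenize.OP
              and tok.string == ":" and depth == colon_depth):
            edits.append((offset(tok.start), offset(tok.start), ")"))
            colon_depth = None
        if tok.type in (tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT):
            at_stmt_start = True
        elif tok.type not in (tokenize.NL, tokenize.COMMENT):
            at_stmt_start = False

    # Apply edits right to left so earlier offsets stay valid.
    for start, end, text in sorted(edits, reverse=True):
        source = source[:start] + text + source[end:]
    return source
```

For example, `rewrite_unless("unless a or b:\n    pass\n")` returns `"if not (a or b):\n    pass\n"`. The parentheses keep `not` applying to the whole condition. This naive version would also rewrite a variable named `unless` at the start of a statement; handling cases like that is where a real grammar-driven parser earns its keep.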


2 Comments

Good point! "After all it's purpose is to transform between two Python-like languages with slightly different grammars"
Got two great answers here, but this is clearly the approach I should try first.

I would recommend that you check out my library: https://github.com/erezsh/lark

It can parse ALL context-free grammars, automatically builds an AST (with line & column numbers), and accepts the grammar in EBNF format, which is considered the standard.

It can easily parse a language like Python, and it can do so faster than any other parsing library written in Python.

In fact, there's already an example Python grammar and parser.

3 Comments

The links to the example Python grammar and parser are broken. The library is great, though!
That looks really great, thanks for sharing this. Just a noob question (just starting to study this), can LLVMLite read the resulting AST directly?

I like SimpleParse a lot, but I never tried to feed it the Python grammar (BTW, is it a deterministic grammar?). If it chokes, PLY will do the job.

See this compilation about Python parsing tools.

1 Comment

"BTW, is it a deterministic grammar?" Yes (and a remarkably simple one).
