I need to parse Python code line by line. This code:

ast.parse("""if True:
    print 'Yes' """)

returns an AST object, but this line does not:

ast.parse("if True:")

Is there a way to parse that somehow (other than text parsing with regular expressions)?

I need this to modify Python code line by line, as the user types it in interactively.

2 Answers

3

You can only parse a complete, valid Python statement or expression. if True: is incomplete, so attempting to parse it on its own raises a syntax error.
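As a small illustration of the difference, the sketch below wraps ast.parse in a helper (try_parse is a name made up for this example) and feeds it a complete statement and the bare header. It uses pass as the body so that it runs unchanged on Python 2 and Python 3.

```python
import ast

def try_parse(source):
    """Return the AST for source, or None if ast.parse rejects it."""
    try:
        return ast.parse(source)
    except SyntaxError:
        return None

# A complete compound statement parses into a Module holding one If node.
complete = try_parse("if True:\n    pass\n")

# The header alone is rejected, so the helper returns None.
incomplete = try_parse("if True:")
```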

The solution is to first determine if you have a complete statement or expression; if you do not, buffer the line and keep reading new lines until you encounter a syntax error or a complete expression. Then use ast on your buffered input.

The compile_command function in the code module can distinguish source that is merely incomplete from source that is incorrect. If the source appears incomplete, it returns None; otherwise it returns a code object (if the source is valid) or raises a SyntaxError.
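The three outcomes can be seen side by side in a rough sketch like the one below. The classify helper and its string labels are invented for this illustration. Besides SyntaxError, it also catches OverflowError and ValueError, which compile_command is documented to raise for invalid literals.

```python
import code

def classify(source):
    """Label source as 'complete', 'incomplete', or 'invalid'."""
    try:
        compiled = code.compile_command(source)
    except (SyntaxError, OverflowError, ValueError):
        return "invalid"
    # None means the source could still become valid with more input.
    return "incomplete" if compiled is None else "complete"

header_only = classify("if True:")             # waits for an indented body
assignment = classify("x = 1")                 # a full simple statement
full_block = classify("if True:\n    pass\n")  # body plus trailing newline
broken = classify("x = = 1")                   # can never become valid
```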

We can use this function to determine whether to buffer or parse a line. Untested code below:

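import ast
import code

linebuffer = []
while True:
    line = raw_input()
    linebuffer.append(line)
    # raw_input() strips the newline, so rejoin the lines with '\n'
    source = '\n'.join(linebuffer)
    try:
        compiled = code.compile_command(source)
    except (SyntaxError, OverflowError, ValueError):
        linebuffer = []  # invalid input: discard the buffer
    else:
        # None means "incomplete": keep buffering. As at the interactive
        # prompt, a compound statement completes on a blank line.
        if compiled is not None:
            tree = ast.parse(source)
            linebuffer = []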
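The same buffering idea can also be written as a generator that takes its lines from any iterable instead of the keyboard, which makes it easier to try out. This is only a sketch: the statements function and the sample input are assumptions made for illustration, not part of the original answer.

```python
import ast
import code

def statements(lines):
    """Yield one AST per complete statement found in an iterable of lines."""
    buffered = []
    for line in lines:
        buffered.append(line)
        source = "\n".join(buffered)
        try:
            compiled = code.compile_command(source)
        except (SyntaxError, OverflowError, ValueError):
            buffered = []  # invalid input: drop the buffer and start over
            continue
        if compiled is not None:  # complete: parse it and reset the buffer
            yield ast.parse(source)
            buffered = []

# The empty string plays the role of the blank line that ends the if block.
trees = list(statements(["x = 1", "if x:", "    y = 2", ""]))
```

Here the first line yields a tree immediately, while the if block yields one only after the blank line arrives, so trees ends up holding two modules.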
3 Comments

Thanks, Francis. Your solution is good, just not suitable for my software. I need to parse line by line in real time. In terms of programming languages "if True:" is not correct, but in terms of human understanding it is.
What you ask for is impossible. You can't get an abstract syntax tree of a syntax error, and if True: by itself is a syntax error. Perhaps you should step back and restate your broader problem, because you can't get an AST line by line, only statement by statement or expression by expression, and it's hard to imagine what possible AST manipulations you would want to do line by line!
Well, what he could get is a parser that would like to read more (as evidenced by ReadLine call from its lexer), but hasn't yet issued a complaint. That would in effect be "parsing line by line". It would even read a first line, not complain, read second line, and check that. What he can't get easily this way is additional semantic checking; there may appear to be a function call in a line to a function with 5 parameters, but who checks the named function exists, let alone accepts the 5 parameters, let alone will accept the types of the arguments?
0

I think the only way here is regexp.

after user interactive input of python code.

It seems that your Python code is in fact just a text object, not Python code...

Why is regexp unsuitable for you here?

1 Comment

Sorry for responding so late; I had no email notifications from SO (BTW, they should be on by default). Regexes are OK, but that would be reinventing the wheel: I would have to reparse every Python expression with regexes.
