I want to build a Java parser using ANTLR in Python.

I downloaded the grammars from the ANTLR repository:

Lexer: https://github.com/antlr/grammars-v4/blob/master/java/java/JavaLexer.g4

Parser: https://github.com/antlr/grammars-v4/blob/master/java/java/JavaParser.g4

Then I used my script.bat to generate the Python code I need:

java -jar antlr-4.8-complete.jar -Dlanguage=Python3 Java8Lexer.g4
java -jar antlr-4.8-complete.jar -Dlanguage=Python3 Java8Parser.g4

I downloaded antlr-4.8-complete.jar from here: https://www.antlr.org/download/antlr-4.8-complete.jar

This generated the following files:

  • Java8Lexer.interp
  • Java8Lexer.py
  • Java8Lexer.tokens
  • Java8Parser.interp
  • Java8Parser.py
  • Java8Parser.tokens
  • Java8ParserListener.py
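
For anyone who prefers to drive the generation step from Python rather than a batch file, here is a minimal sketch. It only rebuilds the two command lines shown above; the function names antlr_command and generate are made up for illustration, and it assumes java is on the PATH and that the jar and grammar files sit in the current directory.

```python
import subprocess


def antlr_command(jar, grammar, language="Python3"):
    """Build the argument list for one ANTLR tool invocation."""
    return ["java", "-jar", str(jar), f"-Dlanguage={language}", str(grammar)]


def generate(jar, grammars, language="Python3"):
    """Run the ANTLR tool once per grammar; raises if the tool exits non-zero."""
    for grammar in grammars:
        subprocess.run(antlr_command(jar, grammar, language), check=True)


# Example call (assumes the files exist in the current directory):
# generate("antlr-4.8-complete.jar", ["Java8Lexer.g4", "Java8Parser.g4"])
```

The lexer grammar is listed first, in the same order as the batch script, so its .tokens file exists by the time the parser grammar is processed.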

Then I wrote this code to parse a Java file:

import antlr4
from antlr4 import *
from java.antlr_unit2 import Java8Parser, Java8Lexer

def main():
    code = open('test.txt', 'r').read()
    lexer = Java8Lexer.Java8Lexer(antlr4.InputStream(code))
    stream = antlr4.CommonTokenStream(lexer)
    parser = Java8Parser.Java8Parser(stream)
    tree = parser.expression()
    print (tree)

if __name__ == '__main__':
    main()

My test Java code, test.txt, looks something like this:

package org.jabref.gui.fieldeditors;
import java.util.ArrayList;
/**
 * This class contains some code
 */
public class TextInputControlBehavior {

    private static final boolean SHOW_HANDLES = Properties.IS_TOUCH_SUPPORTED && !OS.OS_X;

}

Since this is too short, here is an example of code I want to parse: https://pastebin.com/KNxfasKQ

When I run this code, I get this:

line 1:0 extraneous input 'package' expecting {'boolean', 'byte', 'char', 'double', 'float', 'int', 'long', 'new', 'short', 'super', 'this', 'void', IntegerLiteral, FloatingPointLiteral, BooleanLiteral, CharacterLiteral, StringLiteral, 'null', '(', '!', '~', '++', '--', '+', '-', Identifier, '@'}
[]

Am I doing something wrong? I didn't write the grammar; I just took it from the ANTLR repo.

EDIT: Pavel Smirnov's answer helped me, and now I don't get the warning. But now the program seems really slow and I get an empty tree as output.

SOLVED: I was printing tree directly, but I had to call print(tree.toStringTree(recog=parser)) instead.

So the final code is:

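import antlr4
from java.antlr_unit2 import Java8Parser, Java8Lexer

def main():
    # Read the whole Java source; the with-block closes the file afterwards.
    with open('test.txt', 'r') as f:
        code = f.read()
    lexer = Java8Lexer.Java8Lexer(antlr4.InputStream(code))
    stream = antlr4.CommonTokenStream(lexer)
    parser = Java8Parser.Java8Parser(stream)
    # Start from the top-level rule, since the file is a full compilation unit.
    tree = parser.compilationUnit()
    # Passing the parser lets the tree print rule names instead of "[]".
    print(tree.toStringTree(recog=parser))

if __name__ == '__main__':
    main()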
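
The single-line output of toStringTree can be hard to read for a real source file. As an alternative, here is a small sketch of a helper that walks the tree and returns one indented line per node. The name dump_tree is hypothetical; it relies only on the getRuleIndex, getChildCount, getChild and getText methods that the Python runtime's tree nodes provide, and it assumes that only rule contexts have getRuleIndex.

```python
def dump_tree(node, rule_names, depth=0):
    """Return a list of indented lines, one per node of an ANTLR parse tree."""
    indent = "  " * depth
    if hasattr(node, "getRuleIndex"):
        # Rule context: label it with the grammar rule name, then recurse.
        lines = [indent + rule_names[node.getRuleIndex()]]
        for i in range(node.getChildCount()):
            lines.extend(dump_tree(node.getChild(i), rule_names, depth + 1))
        return lines
    # Terminal node: show the text of the matched token.
    return [indent + repr(node.getText())]


# Usage with the parser from the final code above:
# print("\n".join(dump_tree(tree, parser.ruleNames)))
```

Because it recurses once per tree level, very deeply nested input could hit Python's recursion limit; an explicit stack would avoid that.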

1 Answer

Your text file contains a compilationUnit, not an expression, which is what you are trying to parse with

tree = parser.expression()

Look carefully through the parser rules; the rule you need is

compilationUnit
    : packageDeclaration? importDeclaration* typeDeclaration* EOF
    ;

which has to be called as

tree = parser.compilationUnit()
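
A mismatch between the start rule and the input is easier to notice if syntax errors are collected rather than only printed to the console. The following is a rough sketch of that idea, not part of the original answer: CollectingErrorListener is a made-up name, and the try/except is there only so the snippet still loads when the antlr4 runtime is not installed.

```python
try:
    from antlr4.error.ErrorListener import ErrorListener
except ImportError:
    # Fallback so this sketch can be loaded without the antlr4 runtime.
    ErrorListener = object


class CollectingErrorListener(ErrorListener):
    """Store syntax errors in a list instead of printing them."""

    def __init__(self):
        super().__init__()
        self.errors = []

    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        # Same "line L:C message" layout as ANTLR's console output.
        self.errors.append(f"line {line}:{column} {msg}")


# Usage with the parser from the question:
# listener = CollectingErrorListener()
# parser.removeErrorListeners()   # drop the default console listener
# parser.addErrorListener(listener)
# tree = parser.compilationUnit()
# if listener.errors:
#     raise SyntaxError("\n".join(listener.errors))
```

With parser.expression() as the start rule, the list would contain the "extraneous input 'package'" message from the question, so the script can stop there instead of printing an empty tree.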

2 Comments

Thanks for your answer. I don't get that warning now! But I don't get any output; the program seems to loop. I'm trying to understand why. I updated the question with a better example for test.txt.
I got an empty tree as output.
