I am trying to parse Java source to get method names, their invocations, variable names, etc. I was looking for a pre-built or extensible module in Python and stumbled upon plyj (https://github.com/musiKk/plyj). I want to locate a method, then get the method's code and do some string processing on it based on some conditions. But I cannot figure out how to use it; the example is too vague. Can anyone point me to a good usage example?

Also, please let me know whether antlr3 (https://github.com/antlr/antlr3) is more usable (with an example), as I am new to these modules and do not know which one to go with. Performance is not a concern; I just want to compare them on functionality and ease of use.

Thanks!

  • If you want accurate information about types, you'll need a full Java name and type resolver, which antlr3 will not give you. If plyj is really just a parser (as I suspect), it won't give you that information either. This type information is hard to derive; consider how much of the Java reference manual is devoted to explaining what the symbols all mean. You might be able to get unqualified class and method names from a raw parse. Is that enough? (To find a method, you may already need to do full name and type resolution; otherwise what does A:B:C mean?) Commented Jan 23, 2014 at 4:58
  • @Ira: I do not understand you. Please elaborate. Commented Jan 23, 2014 at 5:06
  • OK. You want to look up the method names in class A:B:C. How exactly are you going to find out where C is, without knowing where B is defined, and processing the contents of package B to find C's declaration? It gets a lot worse with generics. Commented Jan 23, 2014 at 5:43
  • No, I am not going to do anything that complex! What I want is a script that takes a .java file as input and tells me the methods in it, gets me a method's code, and gets me the class variable names. In other words, I could look for methods using regexes, but that would be too complex, so I want to use one of these parsers to do it. Commented Jan 23, 2014 at 6:09
  • @Krish: I added two sample programs that print some symbols of a provided source file. Commented Apr 23, 2014 at 21:46

1 Answer

If you'll settle for a heuristic solution, then pick whichever one has a reliable Java parser that builds an AST (my understanding is that ANTLR is pretty good for Java), parse the source, and build custom code to crawl the tree data structure down to find the class declaration, then crawl one layer deeper to get to the methods/members. [I don't know if plyj has a tested Java grammar, or builds ASTs.]
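
The crawl described above can be sketched independently of any particular parser. The `Node` class, its `kind`/`name`/`children` fields, and the hand-built tree below are all hypothetical stand-ins; a real ANTLR or plyj tree has its own node classes and attribute names, so treat this as an illustration of the traversal only.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical AST node; real parsers define their own node types."""
    kind: str                      # e.g. "class", "method", "field"
    name: str = ""
    children: List["Node"] = field(default_factory=list)


def find_classes(node: Node) -> Iterator[Node]:
    """Walk the tree depth-first and yield every class declaration node."""
    if node.kind == "class":
        yield node
    for child in node.children:
        yield from find_classes(child)


def members(class_node: Node, kind: str) -> List[str]:
    """Names of the direct children of a class node that have the given kind."""
    return [c.name for c in class_node.children if c.kind == kind]


# Hand-built tree standing in for the result of parsing a small Foo.java.
tree = Node("compilation_unit", children=[
    Node("class", "Foo", [
        Node("field", "count"),
        Node("method", "getCount"),
        Node("method", "reset"),
    ]),
])

for cls in find_classes(tree):
    print(cls.name, members(cls, "method"), members(cls, "field"))
```

Because `members` only looks at direct children, a nested class is reported as a separate class by `find_classes` rather than being mixed into its parent's member list.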

For the ANTLR solution, at least, it should be pretty easy to print out the names of those. It will not be so easy to print the bodies; to my knowledge, ANTLR has no easy way of printing out the subtree at a given point as text, and even if you could, you might find that the comments have vanished, having been eliminated during lexing. You might be able to extract line numbers from the tree nodes, and then go back to the original file and print out line number ranges to get method bodies. (Most parser generators, even if they build ASTs, do not support printing an arbitrary subtree, so I assume plyj is no different.)
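
A minimal sketch of the "go back to the original file" idea, in plain Python with no parser library. It assumes you already have the 1-based line on which a method declaration starts (how you obtain it depends on the parser and is not shown), and it finds the end of the method by counting braces while skipping string literals, character literals, and comments. The function name `extract_block` and the sample source are made up for illustration; Java text blocks (`"""`) are not handled.

```python
def extract_block(source: str, start_line: int) -> str:
    """Return the text from start_line (1-based) through the line that
    closes the first brace block opened at or after start_line."""
    lines = source.splitlines()
    depth = 0
    opened = False
    in_block_comment = False
    for idx in range(start_line - 1, len(lines)):
        line = lines[idx]
        in_string = None  # the quote character of the literal we are inside
        i = 0
        while i < len(line):
            ch = line[i]
            two = line[i:i + 2]
            if in_block_comment:
                if two == "*/":
                    in_block_comment = False
                    i += 2
                    continue
            elif in_string:
                if ch == "\\":          # skip the escaped character
                    i += 2
                    continue
                if ch == in_string:
                    in_string = None
            elif two == "//":           # rest of the line is a comment
                break
            elif two == "/*":
                in_block_comment = True
                i += 2
                continue
            elif ch in "\"'":
                in_string = ch
            elif ch == "{":
                depth += 1
                opened = True
            elif ch == "}":
                depth -= 1
                if opened and depth == 0:
                    return "\n".join(lines[start_line - 1: idx + 1])
            i += 1
    raise ValueError("no balanced block found")


java_source = """\
public class Foo {
    private int count;

    public String greet() {
        // a brace in a comment: }
        return "}";
    }

    public void reset() { count = 0; }
}
"""

# Line 4 is where greet() starts in the sample above.
print(extract_block(java_source, 4))
```

Since the text is sliced from the original file rather than regenerated from the tree, comments and formatting inside the method survive.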

This heuristic approach won't handle multiple classes per file, or nested classes, very well.

There are tools that can do this reliably and accurately but are more effort to put in place.
