As far as I know, the only way to parse Java source code into an AST (Abstract Syntax Tree) is to use the Java Compiler Tree API: com.sun.source.tree.

I have two questions:

  1. What JDKs support com.sun.source.tree?
  2. Is there a portable replacement that works for all JDKs?
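For reference, here is a minimal sketch of parsing with com.sun.source.tree on a modern JDK (9 or later, where the package lives in the jdk.compiler module). The class name TreeApiDemo and the in-memory source helper are hypothetical. It only calls parse(), so it builds syntax trees without type attribution, and it needs a full JDK rather than a JRE.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class TreeApiDemo {
    // Wraps a string as a source file so nothing has to be written to disk (hypothetical helper).
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Parses the source (syntax only, no type attribution) and collects method names.
    static List<String> methodNames(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; running on a JRE?");
        }
        // javac's task object implements JavacTask, which exposes parse().
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(source(className, code)));
        Iterable<? extends CompilationUnitTree> units;
        try {
            units = task.parse();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> names = new ArrayList<>();
        for (CompilationUnitTree unit : units) {
            // TreeScanner walks every node; we only override the method visitor.
            unit.accept(new TreeScanner<Void, Void>() {
                @Override
                public Void visitMethod(MethodTree node, Void unused) {
                    names.add(node.getName().toString());
                    return super.visitMethod(node, unused);
                }
            }, null);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(methodNames("Hello", "class Hello { void a() {} int b() { return 1; } }"));
    }
}
```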
  • If I'm not mistaken, Eclipse uses a different version of the Java model with its own parser, and there might be a way to reuse that for general parsing. Commented Dec 28, 2009 at 4:41
    What do you mean by "support" in your first question? Are you asking which versions of Java from which vendors contain the com.sun.source.tree package? I would imagine only Sun's does. If you want to parse source code with another JDK (say, IBM's), then a standalone parser library is probably necessary. Commented Dec 28, 2009 at 4:49
  • @Brett, I know that com.sun.source.tree was only introduced in JDK6. I'm wondering if all non-Sun JDKs support this API. Commented Dec 28, 2009 at 4:51
  • com.sun.* packages are not portable. They may exist in other JDKs, but do not count on it. Commented Dec 28, 2009 at 4:56
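Since the API may be missing, one defensive option is to probe for it at runtime and fall back to a standalone parser when it is absent. This is a minimal sketch under that assumption; the class and method names are hypothetical, and it only checks whether the classes load and whether a system compiler exists, not how a particular vendor's JDK behaves.

```java
import javax.tools.ToolProvider;

public class TreeApiCheck {
    // True if the com.sun.source.tree classes can be loaded in this runtime.
    static boolean hasTreeApi() {
        try {
            Class.forName("com.sun.source.tree.CompilationUnitTree");
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // True if a system Java compiler is available (getSystemJavaCompiler returns null on a plain JRE).
    static boolean hasSystemCompiler() {
        return ToolProvider.getSystemJavaCompiler() != null;
    }

    public static void main(String[] args) {
        System.out.println("com.sun.source.tree present: " + hasTreeApi());
        System.out.println("system compiler present: " + hasSystemCompiler());
    }
}
```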

6 Answers

Regarding your second question, there are dozens of Java parsers available in addition to Sun's. Here is a small sample:

  • Eclipse's org.eclipse.jdt.core.dom package.
  • Spoon outputs a very nice annotated parse tree with type information and variable binding (and uses Eclipse's parser internally).
  • ANTLR is a parser generator, and Java grammars are available for it.
  • javaparser (which I have not used).

My best advice is to try each of them to see which works best for your needs.


2 Comments

What is the difference between Eclipse's jdt DOM and Spoon?
The actual AST classes are roughly analogous, but Spoon's parse tree includes semantic information like variable binding without requiring a massive IDE infrastructure to be running. One can parse and analyze Java files by simply adding one jar file to the classpath.

You could take tools.jar and use it directly. javac is open source, so you can also just grab its code (assuming you can live with the license). ANTLR has grammars for Java as well.

3 Comments

Redistributing tools.jar: good point! OpenJDK's classpath exception makes for a great license.
google-java-format uses com.google.errorprone:javac-shaded to get the AST; javac-shaded embeds the OpenJDK parser. An example can be found in JavaInputAstVisitor.java in google-java-format.
Entry point is here

I've used Eclipse's AST parser and found it pretty good (though since it was part of an Eclipse plug-in, using it made sense). See Exploring Eclipse's ASTParser.
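A minimal standalone sketch of Eclipse's ASTParser, assuming the org.eclipse.jdt:org.eclipse.jdt.core jar and its dependencies are on the classpath and no Eclipse workspace is running. Without a project, bindings are not resolved, so this yields a syntax tree only. The class name and the AST.JLS8 / Java 1.8 language level are assumptions for illustration; recent JDT releases deprecate JLS8 in favour of newer levels.

```java
import org.eclipse.jdt.core.JavaCore;
import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.CompilationUnit;
import org.eclipse.jdt.core.dom.MethodDeclaration;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class JdtDemo {
    // Parses a source string into a JDT CompilationUnit and collects method names.
    static List<String> methodNames(String code) {
        ASTParser parser = ASTParser.newParser(AST.JLS8); // assumed language level
        parser.setKind(ASTParser.K_COMPILATION_UNIT);
        parser.setSource(code.toCharArray());
        // Set the source level explicitly; the standalone defaults may be older than you expect.
        Map<String, String> options = JavaCore.getOptions();
        JavaCore.setComplianceOptions(JavaCore.VERSION_1_8, options);
        parser.setCompilerOptions(options);

        CompilationUnit unit = (CompilationUnit) parser.createAST(null);
        List<String> names = new ArrayList<>();
        unit.accept(new ASTVisitor() {
            @Override
            public boolean visit(MethodDeclaration node) {
                names.add(node.getName().getIdentifier());
                return true; // keep visiting child nodes
            }
        });
        return names;
    }

    public static void main(String[] args) {
        System.out.println(methodNames("class Hello { void a() {} int b() { return 1; } }"));
    }
}
```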

1 Comment

Here is the last Web Archive snapshot of the link above: web.archive.org/web/20090801122725/http://www.ibm.com/…

A working, simple-to-use Java parser is... JavaParser. The project has been active for some years already. It was initially hosted on Google Code and is now available on GitHub: https://github.com/javaparser/javaparser

It is quite simple to use and the number of dependencies is small. It is also available on Maven.

It has been in use for a few years, so it works quite well. It can also parse comments, modify the AST, and regenerate the source code.
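A minimal sketch of that parse, modify, regenerate cycle, assuming JavaParser 3.x (the com.github.javaparser:javaparser-core Maven artifact), where StaticJavaParser is the usual entry point. The class name and the "renamed_" prefix are made up for illustration.

```java
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.MethodDeclaration;
import com.github.javaparser.ast.comments.Comment;

public class JavaParserDemo {
    // Parses the source, prefixes every method name, and regenerates code from the modified AST.
    static String renameMethods(String code) {
        CompilationUnit cu = StaticJavaParser.parse(code);
        // Comments are part of the AST; this just prints them.
        for (Comment comment : cu.getAllContainedComments()) {
            System.out.println("comment: " + comment.getContent().trim());
        }
        for (MethodDeclaration method : cu.findAll(MethodDeclaration.class)) {
            method.setName("renamed_" + method.getNameAsString());
        }
        return cu.toString(); // pretty-printed source of the modified tree
    }

    public static void main(String[] args) {
        String code = "class Hello {\n  // greets\n  void a() {}\n}";
        System.out.println(renameMethods(code));
    }
}
```

If you need to keep the original formatting rather than pretty-printing, JavaParser also ships a LexicalPreservingPrinter.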

1 Comment

Thanks for updating the link to the project (it was moved a few months ago, and in the meantime I became a contributor to this project :D)

I've just come across Jexast, an extraction of JDT's ASTParser that works independently of Eclipse (it depends on org.eclipse.jdt.internal.compiler.**).

I haven't tried it yet, but it does seem interesting.

It is not the only way.

See our Java Front End, which is a full featured Java parser built on top of the DMS Software Reengineering Toolkit. It parses Java, and builds ASTs as internal data structures.

The point of DMS is that it provides a huge variety of additional useful machinery (attribute grammars, symbol tables, flow analysis, AST manipulation including access and update, as well as source-to-source transformations) to analyze and transform that AST into results and/or modified source code. If you get "just" a Java parser (e.g., JavaCC + Java grammar) you will, IMHO, not be able to do a lot with it. DMS makes it possible to do a lot, without having to invent all that extra machinery yourself.

If you really don't want to use the extra machinery DMS provides, it will dump the tree as XML.
