
I managed to successfully compile Clang for Windows with CMake and Visual Studio 10. I would like to get an XML file as an AST representation of the source code. There is one option that produces this result with Clang built with GCC under Linux (Ubuntu), but it doesn't work on the Windows box:

clang -cc1 -ast-print-xml source.c

However, this invokes the compilation stage (which I would like to avoid). Digging in the source code hasn't helped me so far, as I am quite new to Clang. I managed to generate the binary version of the AST by using:

clang -emit-ast source.c

Unfortunately, this format is unusable directly for parsing. Is there some existing method to directly generate the XML tree instead of a binary one in clang?

The goal is to use the XML representation in other tools in the .NET environment so I would need to make some wrapping around the native clang lib to access the binary AST. Maybe there is a third option if someone already wrote some binary clang AST parser for .NET?

Is it possible that I am missing something? For example, is the AST generated by the Clang front end not equivalent to the one generated in the compilation stage?

  • My company builds C++ front ends, and we can emit complete XML dumps of the ASTs. We have this as a check-box item, because people ask for it. Nobody really uses it, because the amount of output for a real C++ program (which includes all the header files) is simply enormous, which makes it slow and clumsy to deal with. The real question is, why do you want to do this? Clang likely already offers a vast amount of machinery to process the C++ AST directly (as does our corresponding tool); why would you want to try to replicate all of that work? Why not just use Clang for your purpose? Commented Mar 19, 2011 at 17:42
  • ... see a C++ tree dump at stackoverflow.com/a/17393852/120163 This isn't XML, but the tool can produce XML also with the exact same content. Commented Apr 12, 2016 at 10:02
  • @IraBaxter - I can answer this with my use case: Because libclang/C++ is so convoluted and slow, it's actually faster for me to process an entire AST text dump with sed/awk than to use the library. Commented Mar 10 at 22:52
  • Wow. Well, it doesn't have to be that way. We parse the code directly to trees in memory, and then walk the trees in memory. This doesn't seem very hard. I don't understand a tool that makes that slower or harder to use than a gigabyte text string dump. Commented Mar 11 at 23:26

3 Answers


For your information, the XML printer was removed in version 2.9 by Douglas Gregor (who is responsible for the Clang front end).

The issue was that the XML printer was incomplete. A number of AST nodes had never been implemented in the printer, nor had a number of properties of some nodes, which led to an inaccurate representation of the source code.

Another point raised by Douglas was that the output should be suitable not for debugging Clang itself (which is what -emit-ast is about) but for consumption by external tools. This requires the output to be stable from one version to the next. Notably, it should not be a one-to-one mapping of Clang's internals, but should rather translate the source code into a standardized language.

Unless there is significant work on the printer (which requires volunteers) it will not be integrated back...


5 Comments

The funny part is that -emit-ast pretty-prints types instead of representing their structure, and for this reason is absolutely useless. It was only possible with an xml printer to debug and automatically verify the types in declarations.
@SK-logic: since xml is no longer an option, we might see an improvement of the -emit-ast behavior.
Thanks for all this interesting information. I will have a look at the old xml printer and try to see if I can make something useful with it for my own usage. Having some universal/standardized way of representing source code would be really a good thing, but a common denominator implies throwing away features and keeping specific things for all kinds of languages makes it too complex... Some extensible approach would be nice... For now thanks a lot for this answer.
It seems the current version (3.2) has it available in debug mode; I was able to extract XML from it. Version 2.9, however, seems unable to do so for me, though.
@OeufcoquePenteano: How? Link?

I've been working on my own version of extracting XML from Clang's AST. My code uses the Python bindings of libclang in order to traverse the AST.

My code is found at https://github.com/BentleyJOakes/PCX

Edit: I should add that it is quite incomplete in terms of producing the right source code tokens for each AST node. This unfortunately needs to be coded for each AST node type. However, the code should give a basis for anyone who wants to pursue this further.
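
As an illustration of the general approach (this is not the code from the linked repository), here is a minimal sketch of walking the AST through the libclang Python bindings and emitting XML. It assumes the `clang` Python package and a matching libclang shared library are installed; the function names `cursor_to_xml` and `source_to_xml` are hypothetical, and only the cursor kind and spelling are recorded.

```python
import xml.etree.ElementTree as ET


def cursor_to_xml(cursor):
    """Recursively convert a libclang cursor into an XML element.

    Works on any object exposing `kind.name`, `spelling` and
    `get_children()`, which is the subset of the Cursor API used here.
    """
    # The cursor kind (e.g. FUNCTION_DECL) becomes the element tag.
    elem = ET.Element(cursor.kind.name)
    if cursor.spelling:
        elem.set("spelling", cursor.spelling)
    for child in cursor.get_children():
        elem.append(cursor_to_xml(child))
    return elem


def source_to_xml(path, args=()):
    """Parse `path` with libclang and return the AST as an XML string."""
    # Imported lazily so cursor_to_xml stays usable without libclang.
    from clang.cindex import Index

    tu = Index.create().parse(path, args=list(args))
    return ET.tostring(cursor_to_xml(tu.cursor), encoding="unicode")
```

Calling `source_to_xml("source.c")` would return the whole translation unit, headers included, as one XML string. Source tokens, types, and locations are left out of this sketch and would have to be added per node kind.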



Using a custom ASTDumper would do the job without, of course, compiling any source file (Clang stops after the front-end stage). However, you have to work with the C and C++ sources of LLVM to accomplish that.

1 Comment

Internally, Clang implements a JSONNodeDumper and a TextNodeDumper. I think it would be more convenient to produce XML from the JSONNodeDumper output by passing it through a converter library.
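
A minimal sketch of that idea, assuming a Clang recent enough to support `-Xclang -ast-dump=json` and using only the Python standard library rather than a dedicated converter library. The function names are hypothetical, and nested objects such as source locations are dropped for brevity.

```python
import json
import xml.etree.ElementTree as ET


def json_node_to_xml(node):
    """Convert one node of Clang's JSON AST dump into an XML element."""
    # "kind" holds the AST node class name, e.g. "FunctionDecl".
    # Empty placeholder nodes have no kind and are tagged "Unknown".
    elem = ET.Element(node.get("kind", "Unknown"))
    for key, value in node.items():
        if key in ("kind", "inner"):
            continue
        # Keep scalar properties as attributes; skip nested objects.
        if isinstance(value, (str, int, float, bool)):
            elem.set(key, str(value))
    # The type is a nested object; keep only its printed form.
    type_info = node.get("type")
    if isinstance(type_info, dict) and "qualType" in type_info:
        elem.set("qualType", type_info["qualType"])
    # Child nodes are listed under "inner".
    for child in node.get("inner", []):
        elem.append(json_node_to_xml(child))
    return elem


def json_dump_to_xml(json_text):
    """Convert a complete JSON AST dump (as text) into an XML string."""
    root = json_node_to_xml(json.loads(json_text))
    return ET.tostring(root, encoding="unicode")
```

The input would come from something like `clang -Xclang -ast-dump=json -fsyntax-only source.c`, with the captured output passed to `json_dump_to_xml`. The resulting XML mirrors Clang's internal node names, so it is not stable across Clang versions.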
