I'd like to parse C header files in JavaScript. Is there any such library available? Otherwise, any tips to help me get started?

Update: My ultimate goal is to automatically build interfaces for node-ffi. The parser doesn't necessarily have to be written in JavaScript, as long as it can emit a format that JavaScript can read. If it's very hard to develop by myself, I'll probably have to go with an off-the-shelf solution...?

  • Err, I really don't understand the question... parse a HEADER file? To what purpose? Commented Nov 1, 2012 at 5:48
  • I hate to say it this way, but... are you sure you want to do that? Parsing C's syntax is notoriously hard, even if you didn't have to deal with C pre-processor macro expansion and includes. Commented Nov 1, 2012 at 5:49
  • @JameySharp Writing a C preprocessor that expands the macros and includes files is extremely easy compared to parsing the rest of the syntax of C. Commented Nov 1, 2012 at 5:52
  • Parsing is a huge subject. Which C standard are you aiming for? What do you want to parse it to? Why do you even want to do that? Furthermore, do you have any background experience in parsing? Commented Nov 1, 2012 at 5:53
  • When it comes to pure parsing of C source or headers, like just creating an AST, I find it relatively trivial compared to most other languages. C is actually a very simple language in that way. However, if you don't know what is meant by terms like "AST" or "recursive descent", you definitely have a bit of a learning curve in front of you. If you explain the reason you want to do this, we might be able to help you better. Commented Nov 1, 2012 at 6:41

3 Answers

You should check out clang.

For a simple command-line invocation, you can try this:

clang -cc1 -ast-dump-xml myfile.h

Or you can build your own tool using clang's reasonably well-documented parser library, which builds an AST for you and lets you walk it as you see fit (perhaps to emit JSON).
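
If the XML dump flag is not available in your clang build, recent releases can emit the AST as JSON instead, for example with `clang -Xclang -ast-dump=json -fsyntax-only myfile.h`. Below is a minimal sketch of walking such a dump from JavaScript to collect function declarations. The `sampleAst` object is hand-written for illustration (it assumes a `void yaml_token_delete(yaml_token_t *token)` prototype) and is far smaller than a real dump; `collectFunctions` is a hypothetical helper name, not part of clang.

```javascript
// Recursively walk a clang JSON AST node and collect every FunctionDecl.
// Assumes the node shape used by `-ast-dump=json`: each node has a `kind`,
// optionally a `name`, a `type.qualType`, and child nodes under `inner`.
function collectFunctions(node, out = []) {
  const children = node.inner || [];
  if (node.kind === 'FunctionDecl') {
    const params = children
      .filter((child) => child.kind === 'ParmVarDecl')
      .map((child) => ({
        name: child.name, // undefined for unnamed parameters
        type: child.type ? child.type.qualType : undefined,
      }));
    out.push({
      name: node.name,
      type: node.type ? node.type.qualType : undefined,
      params,
    });
  }
  for (const child of children) {
    collectFunctions(child, out);
  }
  return out;
}

// Hand-written, heavily trimmed stand-in for real clang output.
const sampleAst = {
  kind: 'TranslationUnitDecl',
  inner: [
    {
      kind: 'FunctionDecl',
      name: 'yaml_token_delete',
      type: { qualType: 'void (yaml_token_t *)' },
      inner: [
        {
          kind: 'ParmVarDecl',
          name: 'token',
          type: { qualType: 'yaml_token_t *' },
        },
      ],
    },
  ],
};

console.log(JSON.stringify(collectFunctions(sampleAst), null, 2));
```

In practice you would replace `sampleAst` with `JSON.parse` of the dump written by clang, and then map each `qualType` string to whatever type names node-ffi expects.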

1 Comment

Thanks. There are actually JavaScript bindings for Node.js: github.com/tjfontaine/node-libclang

You might start by looking at peg.js, which generates JavaScript code for a parser from a grammar given as input. Details are available here: https://pegjs.org/

Then you would need to write or find a grammar for the header files you want to parse.
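
To give a feel for what such a grammar has to cover, here is a hand-written stand-in in plain JavaScript (deliberately not using the peg.js API) for a single rule along the lines of `Prototype <- Type Identifier "(" Params? ")" ";"`. It is only a sketch: `tokenize` and `parsePrototype` are hypothetical names, and it assumes already-preprocessed input with no macros, function pointers, arrays, or struct bodies.

```javascript
// Split source text into identifiers, numbers, and single punctuation characters.
function tokenize(src) {
  return src.match(/[A-Za-z_]\w*|\d+|[^\s\w]/g) || [];
}

// Parse one simple prototype such as "void f(int a, char *b);".
function parsePrototype(src) {
  const tokens = tokenize(src);
  let pos = 0;

  function expect(token) {
    if (tokens[pos] !== token) {
      throw new SyntaxError(`expected "${token}" but found "${tokens[pos]}"`);
    }
    pos++;
  }

  // A declarator here is a run of identifiers and '*'.
  // With more than one part, a trailing identifier is taken as the name.
  function parseDeclarator() {
    const parts = [];
    while (pos < tokens.length && /^(\*|[A-Za-z_]\w*)$/.test(tokens[pos])) {
      parts.push(tokens[pos++]);
    }
    if (parts.length === 0) {
      throw new SyntaxError(`expected a type but found "${tokens[pos]}"`);
    }
    const last = parts[parts.length - 1];
    const hasName = parts.length > 1 && last !== '*';
    return {
      name: hasName ? last : null,
      type: (hasName ? parts.slice(0, -1) : parts).join(' '),
    };
  }

  const head = parseDeclarator();
  expect('(');
  const params = [];
  if (tokens[pos] !== ')') {
    params.push(parseDeclarator());
    while (tokens[pos] === ',') {
      pos++;
      params.push(parseDeclarator());
    }
  }
  expect(')');
  expect(';');

  // "(void)" means the function takes no parameters.
  const voidOnly =
    params.length === 1 && params[0].name === null && params[0].type === 'void';
  return { name: head.name, returnType: head.type, params: voidOnly ? [] : params };
}

console.log(parsePrototype('void yaml_token_delete(yaml_token_t *token);'));
```

Real headers quickly outgrow this (typedefs, `struct` definitions, function pointers, attributes), which is where a generated parser and a full C grammar start to pay off.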

1 Comment

Great, looks like this could really help.

Well I'll answer my own question since I found something interesting:

http://www.swig.org/Doc2.0/SWIGDocumentation.html#SWIG_nn2

SWIG can output an XML representation of C header files that I could then load from JavaScript.

For example:

swig -module yaml -xmlout yaml.xml yaml.h

This generates the following file (the snippet below is for the yaml_token_delete function):

...

<cdecl id="16015" addr="0x10835d500" >
    <attributelist id="16016" addr="0x10835d500" >
        <attribute name="name" value="yaml_token_delete" id="16017" addr="0x1082b2d00" />
        <attribute name="sym_symtab" value="0x1081007e0" id="16018" addr="0x1081007e0" />
        <attribute name="view" value="globalfunctionHandler" id="16019" addr="0x1082b2d00" />
        <attribute name="kind" value="function" id="16020" addr="0x1082b2d00" />
        <attribute name="sym_name" value="yaml_token_delete" id="16021" addr="0x1082b2d00" />
        <attribute name="wrap_parms" value="0x10835d460" id="16022" addr="0x10835d460" />
        <attribute name="decl" value="f(p.yaml_token_t)." id="16023" addr="0x1082b2d00" />
        <attribute name="tmap_out" value="" id="16024" addr="0x1082b2d00" />
        <parmlist id="16025" addr="0x10835d460" >
            <parm id="16026">
                <attributelist id="16027" addr="0x10835d460" >
                    <attribute name="tmap_typecheck" value="void *vptr = 0;&#10;  int res = SWIG_ConvertPtr($input, &amp;vptr, SWIGTYPE_p_yaml_token_s, 0);&#10;  arg1 = SWIG_CheckState(res);" id="16028" addr="0x1082b2d00" />
                    <attribute name="tmap_typecheck_match_type" value="p.SWIGTYPE" id="16029" addr="0x1082b2d00" />
                    <attribute name="tmap_in_match_type" value="p.SWIGTYPE" id="16030" addr="0x1082b2d00" />
                    <attribute name="tmap_freearg_match_type" value="p.SWIGTYPE" id="16031" addr="0x1082b2d00" />
                    <attribute name="compactdefargs" value="1" id="16032" addr="0x1082b2d00" />
                    <attribute name="name" value="token" id="16033" addr="0x1082b2d00" />
                    <attribute name="emit_input" value="objv[1]" id="16034" addr="0x1082b2d00" />
                    <attribute name="tmap_typecheck_precedence" value="0" id="16035" addr="0x1082b2d00" />
                    <attribute name="tmap_in_numinputs" value="1" id="16036" addr="0x1082b2d00" />
                    <attribute name="tmap_in" value="res1 = SWIG_ConvertPtr(objv[1], &amp;argp1,SWIGTYPE_p_yaml_token_s, 0 |  0 );&#10;  if (!SWIG_IsOK(res1)) { &#10;    SWIG_exception_fail(SWIG_ArgError(res1), &quot;in method '&quot; &quot;$symname&quot; &quot;', argument &quot; &quot;1&quot;&quot; of type '&quot; &quot;yaml_token_t *&quot;&quot;'&quot;); &#10;  }&#10;  arg1 = (yaml_token_t *)(argp1);" id="16037" addr="0x1082b2d00" />

...
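
As a rough sketch of loading that XML from JavaScript, the following pulls the function name, the SWIG `decl` type string, and the parameter names out of each `<cdecl>`. It uses regular expressions only to stay dependency-free, which is an assumption of convenience; a real XML parser would be more robust. `extractFunctions`, `readAttributes`, and `decodeEntities` are hypothetical helper names, and `sample` is a trimmed copy of the snippet above.

```javascript
// Decode the few XML entities that appear in SWIG attribute values.
function decodeEntities(s) {
  return s
    .replace(/&#(\d+);/g, (_, n) => String.fromCharCode(Number(n)))
    .replace(/&quot;/g, '"')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&'); // last, so "&amp;lt;" is not decoded twice
}

// Collect every <attribute name="..." value="..."> in a fragment into an object.
function readAttributes(xml) {
  const attrs = {};
  const re = /<attribute\s+name="([^"]*)"\s+value="([^"]*)"/g;
  let m;
  while ((m = re.exec(xml)) !== null) {
    attrs[m[1]] = decodeEntities(m[2]);
  }
  return attrs;
}

function extractFunctions(xml) {
  const functions = [];
  // Each declaration starts with a <cdecl ...> element.
  for (const chunk of xml.split(/<cdecl\b/).slice(1)) {
    // Stop at the closing tag when present (the snippet above is truncated).
    const body = chunk.split('</cdecl>')[0];
    // The declaration's own attributes come before the first <parm>;
    // "<parm\b" does not match "<parmlist".
    const [head, ...parms] = body.split(/<parm\b/);
    const decl = readAttributes(head);
    if (decl.kind !== 'function') continue;
    functions.push({
      name: decl.name,
      decl: decl.decl,
      params: parms.map((p) => readAttributes(p).name),
    });
  }
  return functions;
}

// Trimmed copy of the SWIG output shown above.
const sample = `
<cdecl id="16015" addr="0x10835d500" >
    <attributelist id="16016" addr="0x10835d500" >
        <attribute name="name" value="yaml_token_delete" id="16017" addr="0x1082b2d00" />
        <attribute name="kind" value="function" id="16020" addr="0x1082b2d00" />
        <attribute name="decl" value="f(p.yaml_token_t)." id="16023" addr="0x1082b2d00" />
        <parmlist id="16025" addr="0x10835d460" >
            <parm id="16026">
                <attributelist id="16027" addr="0x10835d460" >
                    <attribute name="name" value="token" id="16033" addr="0x1082b2d00" />
`;

console.log(extractFunctions(sample));
```

The `decl` value uses SWIG's internal type encoding, where `f(...).` marks a function and `p.` a pointer, so `f(p.yaml_token_t).` describes a function taking a pointer to `yaml_token_t`. Turning that into node-ffi type names would be a separate mapping step.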
