
I'm currently trying to parse a C header file AND retain the name of some macro-defined values.

As Tim notes in the comments below, the preprocessor does its job and ultimately derives the raw value that will be used in the code. However, I was hoping to both generate the AST of the header file and extract, or retain easy access to, the names of the macro-defined values.

All of that to say, and ask: is there any way to use pycparser to extract raw macro-defined values, or is this out of scope and not how the tool was intended to be used?

My code is rather simple and, as mentioned earlier, outputs most of what I would expect to be there, minus the macro-defined values.

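from pycparser import parse_file

CC = 'gcc'  # preprocessor executable

# filename is the path to the header being parsed
ast = parse_file(filename, use_cpp=True,
        cpp_path=CC,
        cpp_args=[
            '-E',
            r'-I/path/to/fake_libc_include',
            r'-I/other/includes'
            ]
        )
ast.show()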

Say for example I make a main_file.c, and I include the header I want to parse.

#include <target_header.h>


int main() {
  int i = foobar; // #define foobar 0x3

  return 0;
}

I then do the same process of parsing the C file rather than header file using pycparser. I will get the following:

FuncDef: 
Decl: main, [], [], [], []
  FuncDecl: 
    TypeDecl: main, [], None
      IdentifierType: ['int']
Compound: 
  Decl: i, [], [], [], []
    TypeDecl: i, [], None
      IdentifierType: ['int']
    Constant: int, 0x0003
  Return: 
    Constant: int, 0

So the name of the macro is lost and only its value survives preprocessing, as expected. Ultimately I was hoping for a helper function in pycparser that does a "pre-preprocessor" lifting, so to speak, of the actual names of the macro-defined values, but I think I am running into a wall because pycparser was not built for that purpose.

Perhaps a separate approach or tool, such as the one listed here, is the best bet, but let me know if anyone has done something similar with pycparser so as to avoid using more than one library:

Use C preprocessor macros (just the constants) as Python variables

UPDATE: This question is based on a required capability that is outside the scope of pycparser, but I am keeping the question and answering it with my own approach in case anyone else runs into the same need.

  • Right. That's how C works. The compiler never sees the word "foobar". The preprocessor runs as a first pass and filters the code to do all of the #define substitutions. By the time it gets compiled, that statement is literally int i = 3;. Perhaps you should try doing your parse without use_cpp. Commented Mar 23, 2023 at 17:30
  • I did think of that as well. I guess I was running into a chicken and egg scenario because I was wanting the resultant AST of the header file WHILE also having easy access to the raw name of the macro defined values, should've made that more clear. But I'll give that a stab thanks! Commented Mar 23, 2023 at 17:40
  • pycparser does not preserve preprocessor macros - it is designed to run after the preprocessor. As you found, there's many hacks/workarounds you can employ, but this is outside of the scope of the tool Commented Mar 24, 2023 at 14:42

1 Answer

I was rather hard-headed about only using pycparser and ended up doing a hacky approach to get what I wanted done.

I'm posting this as an answer in case anyone else comes across this post with a similar need.

To summarize, I needed a way to both generate the AST that pycparser gives while still retaining the name and value of ALL macro-defined variables in the header file.

To accomplish this, I first take into consideration the header I will be analyzing, say, foobar.h.

I open the file and manually parse out all #define'd names, with obvious exceptions such as function-like macros and the header guard. I then programmatically generate a dummy_file.c that includes the header and assigns each macro to a plain int, so that I get the preprocessor's evaluated value of every macro, as follows:

#include <foobar.h>

int main() {
    int def_foo = FOO;
    int def_bar = BAR;
    // etc.

    return 0;
}
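
For anyone who wants to automate that step, here is a minimal sketch of how the macro names could be pulled out and the dummy file generated. It is not my exact script: the function names, the regex, and the def_ prefix handling are illustrative assumptions. Unlike the example above, it keeps the macro's original case after the prefix so the name can be recovered later. The regex only accepts object-like macros that have a body, which skips function-like macros and a bare header guard, but it would be fooled by a guard followed by a comment.

```python
import re

# Object-like macro with a body: "#define NAME value".
# Whitespace is required right after NAME, so "#define MAX(a, b) ..." is skipped,
# and a body is required, so a bare "#define FOOBAR_H" guard is skipped too.
DEFINE_RE = re.compile(
    r'^[ \t]*#[ \t]*define[ \t]+([A-Za-z_]\w*)[ \t]+\S.*$',
    re.MULTILINE,
)


def extract_macro_names(header_text):
    """Return the names of object-like macros in the order they appear."""
    return [match.group(1) for match in DEFINE_RE.finditer(header_text)]


def generate_dummy_source(header_name, macro_names, prefix="def_"):
    """Build the text of a dummy C file that assigns each macro to an int."""
    lines = [f"#include <{header_name}>", "", "int main() {"]
    for name in macro_names:
        lines.append(f"    int {prefix}{name} = {name};")
    lines += ["", "    return 0;", "}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical header contents, standing in for foobar.h.
    header = (
        "#ifndef FOOBAR_H\n"
        "#define FOOBAR_H\n"
        "#define FOO 0x3\n"
        "#define BAR 10\n"
        "#define MAX(a, b) ((a) > (b) ? (a) : (b))\n"
        "#endif\n"
    )
    print(generate_dummy_source("foobar.h", extract_macro_names(header)))
```

Writing the returned text to dummy_file.c gives a file of the same shape as the one shown above.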

In the end I run parse_file on the dummy_file.c and get both the original AST that was generated by running parse_file on foobar.h AND the names and values of all macro-defined variables.
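
To get the names and values back out of the AST, something along these lines could work. This is a sketch under assumptions, not my exact code: it assumes every generated declaration starts with a def_ prefix followed by the macro's original spelling, and the class and function names are made up for illustration. It uses pycparser's c_ast.NodeVisitor and only records initializers that are plain constants, so a macro that expands to an expression such as (1 << 3) would need extra handling. The value is kept as the source string rather than converted to a Python int, since C suffixes and octal literals do not map directly. For the demo it parses an already-preprocessed string so no compiler is needed.

```python
from pycparser import c_ast, c_parser


class MacroDeclVisitor(c_ast.NodeVisitor):
    """Collect 'int <prefix>NAME = <constant>;' declarations from an AST."""

    def __init__(self, prefix="def_"):
        self.prefix = prefix
        self.macros = {}

    def visit_Decl(self, node):
        # Only keep declarations we generated whose initializer is a literal.
        if (node.name
                and node.name.startswith(self.prefix)
                and isinstance(node.init, c_ast.Constant)):
            macro_name = node.name[len(self.prefix):]
            self.macros[macro_name] = node.init.value  # raw source text


def collect_macros(ast, prefix="def_"):
    """Return {macro name: constant text} for the generated declarations."""
    visitor = MacroDeclVisitor(prefix)
    visitor.visit(ast)
    return visitor.macros


if __name__ == "__main__":
    # Stand-in for the preprocessed dummy file (macros already substituted).
    preprocessed = """
    int main() {
        int def_FOO = 0x3;
        int def_BAR = 10;
        return 0;
    }
    """
    ast = c_parser.CParser().parse(preprocessed)
    print(collect_macros(ast))
```

With the real dummy_file.c, the ast argument would instead be the result of parse_file with use_cpp=True, as in the question.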

For those curious why I wouldn't just redefine the #defines as ints within the header: it is mostly because I treat the header as a "ground truth" from which I can generate the corresponding API and values in Python as well. I also expect this header file to change over time.
