
I have a C header file with a lot of enums, typedefs and function prototypes. I want to extract this data using Python's regex module (re). I really need help with the syntax, because I seem to forget it every time I learn it.

ENUMS
-----
enum
{
(tab character)(stuff to be extracted - multiple lines)
};

TYPES
-----
typedef struct (extract1) (extract2)


FUNCTIONS
---------
(return type)
(name)
(
(tab character)(arguments - multiple lines)
);

If anyone could point me in the right direction, I would be grateful.

  • what do you have so far in terms of your re? Commented Jun 4, 2012 at 3:23
  • regex = re.compile("enum\n{(.*)}", re.DOTALL). I thought I would get all the characters within the enums, in an array, but I get everything. Also, this is for Cython. Commented Jun 4, 2012 at 3:37
  • For enums check out stackoverflow.com/a/66037988/208880 -- some adjustments should also catch the types and functions. Commented Sep 24, 2021 at 5:29

1 Answer

I imagine something like this is what you're after?

>>> import re
>>> re.findall(r'enum\s*{\s*([^}]*)};', 'enum {A,B,C};')
['A,B,C']
>>> re.findall(r"typedef\s+struct\s+(\w+)\s+(\w+);", "typedef struct blah blah;")
[('blah', 'blah')]
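
To see why the `[^}]*` form matters, here is a small sketch run on a made-up header in the layout from the question (the `HEADER` text and its member names are hypothetical). A greedy `.*` with `re.DOTALL`, as in the attempt quoted in the comments, runs on to the last `}` in the input, whereas `[^}]*` stops at the first one:

```python
import re

# Hypothetical header text in the layout described in the question.
HEADER = "enum\n{\n\tRED,\n\tGREEN\n};\n\nenum\n{\n\tON,\n\tOFF\n};\n"

# Greedy: .* with DOTALL spans from the first "{" to the last "}".
greedy = re.findall(r"enum\n{(.*)}", HEADER, re.DOTALL)

# Negated class: each match stops at the first "}" after its own "{".
bodies = re.findall(r"enum\s*{\s*([^}]*)};", HEADER)

# Split each body into member names, dropping whitespace and empty pieces.
members = [[m.strip() for m in body.split(",") if m.strip()] for body in bodies]
print(len(greedy), members)
```

The greedy version returns a single match that swallows both enums, which is probably the "I get everything" behaviour; the second pattern yields one body per enum. This still assumes no `}` appears inside an enum body, for example in a comment.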

There are of course numerous variations on the syntax, and functions are much more complicated, so I'll leave those to you; frankly, these regexps are already fragile and inelegant enough. I would urge you to use an actual parser unless this is a one-off project where robustness is unimportant and you can be sure of the format of your inputs.
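
If the prototypes really do follow the exact layout in the question (return type, name and parentheses each on their own line, arguments indented with a tab), a rough sketch for functions might look like the following. The `HEADER` text, the `PROTO` pattern and `parse_prototypes` are illustrative names made up here, and the pattern assumes there are no comments, macros or function-pointer arguments:

```python
import re

# Hypothetical header text in the layout described in the question.
HEADER = (
    "int\nadd\n(\n\tint a,\n\tint b\n);\n\n"
    "const char *\nget_name\n(\n\tvoid\n);\n"
)

PROTO = re.compile(
    r"^([^\n;{}()]+)\n"      # return type, alone on its line
    r"(\w+)\n"               # function name, alone on its line
    r"\(\n"                  # opening parenthesis on its own line
    r"((?:\t[^\n]*\n)*)"     # zero or more tab-indented argument lines
    r"\);",                  # closing parenthesis and semicolon
    re.MULTILINE,
)

def parse_prototypes(text):
    """Return (return_type, name, [arguments]) for each prototype found."""
    result = []
    for match in PROTO.finditer(text):
        return_type, name, raw_args = match.groups()
        # Naive split on commas; breaks on function-pointer arguments.
        args = [a.strip() for a in raw_args.split(",") if a.strip()]
        result.append((return_type.strip(), name, args))
    return result

prototypes = parse_prototypes(HEADER)
print(prototypes)
```

Any deviation from that layout (a brace on the same line, spaces instead of tabs, a comment between lines) makes the pattern miss the prototype silently, which is the kind of fragility that makes a real parser the safer choice.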
