3

I am trying to get a Python regex to search through a .c file and get the function(s) inside it.

For example:

int blahblah(
  struct _reent *ptr __attribute__((unused)),
  const char    *old,
  const char    *new
)
{
...

I would want to get blahblah as the function.

This regex doesn't work for me; it keeps giving me None: r"([a-zA-Z0-9]*)\s*\([^()]*\)\s*{"

3 Answers

4

(?<=(int\s)|(void\s)|(string\s)|(double\s)|(float\s)|(char\s)).*?(?=\s?\()

http://regexr.com?3332t

This should work for what you want. Just keep adding types that you need to catch.

In Python, whose re module only accepts fixed-width look-behind, use this form instead: re.findall(r'(?<=(?<=int\s)|(?<=void\s)|(?<=string\s)|(?<=double\s)|(?<=float\s)|(?<=char\s)).*?(?=\s?\()', string)
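If the nested look-behinds are hard to read, a capture group can do the same job. This is only a sketch: the list of types and the names TYPES and FUNC_NAME are my own assumptions, and it will also pick up prototypes such as "int foo(void);", not just definitions.

```python
import re

# Assumed list of return types; extend it with whatever your code base uses.
TYPES = r"(?:int|void|double|float|char)"

# Hypothetical pattern: a type keyword, whitespace, then a captured identifier
# that is followed (after optional whitespace) by an opening parenthesis.
FUNC_NAME = re.compile(r"\b" + TYPES + r"\s+(\w+)\s*\(")

source = """int blahblah(
  struct _reent *ptr __attribute__((unused)),
  const char    *old,
  const char    *new
)
{
"""

# findall returns only the captured group, i.e. the candidate function names.
names = FUNC_NAME.findall(source)
print(names)
```

Because the name is captured rather than matched between a look-behind and a look-ahead, the fixed-width restriction never comes into play.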

2 Comments

I'm running re.findall( r'(?<=(int\s)|(void\s)|(string\s)|(double\s)|(float\s)|(char\s)).*?(?=\s?\()', string) but I get this error: raise error, v # invalid expression sre_constants.error: look-behind requires fixed-width pattern
@AA It seems that Python's regex flavor is slightly different from the norm. Try this: re.findall(r'(?<=(?<=int\s)|(?<=void\s)|(?<=string\s)|(?<=double\s)|(?<=float\s)|(?<=char\s)).*?(?=\s?\()', string)
3

The regular expression isn't catching it because of the parentheses in the arguments (specifically, the parentheses in __attribute__((unused))). You might be able to adapt the regular expression for this case, but in general, regular expressions cannot parse languages like C. You may want to use a full-fledged parser like pycparser.
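To see the effect of the nested parentheses, here is a small check that runs the pattern from the question against two inputs. The two sample strings are made up for illustration; only the pattern comes from the question.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"([a-zA-Z0-9]*)\s*\([^()]*\)\s*{")

# Illustrative inputs: one without nested parentheses, one with them.
flat = "int blahblah(const char *old, const char *new)\n{\n"
nested = "int blahblah(struct _reent *ptr __attribute__((unused)), const char *old)\n{\n"

flat_match = PATTERN.search(flat)
nested_match = PATTERN.search(nested)

# [^()]* cannot cross the inner "(" of __attribute__((unused)),
# so only the flat argument list matches.
print(flat_match.group(1) if flat_match else None)
print(nested_match)
```

The first search finds blahblah, while the second returns None, which is the behaviour described in the question.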

4 Comments

I would like to use only Python's built-in libraries. Thanks for the response.
@AA: If you have to find functions in any valid C code, then I'm not sure you have an option. A regular expression cannot parse a language like C — C is a context-free language, whereas regular expressions can only parse regular languages. (see Chomsky hierarchy on Wikipedia)
Can't we just match against: [int, void, etc.][any amount of space][function name][bracket][anything in here][unbracket][curly bracket][uncurly bracket]
@AA: True regular expressions cannot match the "anything in here" part of that. Without extensions that make it not actually regular expressions, you cannot match nested parentheses. It will stop on the first closing parenthesis.
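If you must stay within the standard library, one workaround is to let the regex find only "identifier(" and count the parentheses by hand. The sketch below is a heuristic, not a C parser: find_function_names, IDENT_PAREN and SKIP are hypothetical names, and parentheses inside strings, comments or macros will fool it.

```python
import re

# Hypothetical helper names; this is a depth counter, not a real C parser.
IDENT_PAREN = re.compile(r"([A-Za-z_]\w*)\s*\(")
SKIP = {"if", "for", "while", "switch"}  # keywords that also precede "(...) {"

def find_function_names(source):
    """Return identifiers whose balanced (...) group is followed by '{'."""
    names = []
    for m in IDENT_PAREN.finditer(source):
        if m.group(1) in SKIP:
            continue
        depth = 1
        i = m.end()  # first character after the opening "("
        while i < len(source) and depth > 0:
            if source[i] == "(":
                depth += 1
            elif source[i] == ")":
                depth -= 1
            i += 1
        if depth != 0:
            continue  # ran off the end without closing the parenthesis
        # A definition has "{" as the next non-whitespace character.
        if source[i:].lstrip().startswith("{"):
            names.append(m.group(1))
    return names

example = """int blahblah(
  struct _reent *ptr __attribute__((unused)),
  const char    *old,
  const char    *new
)
{
}
"""
with_body = "void f(void)\n{\n  if (g(1)) {\n    h();\n  }\n}\n"

print(find_function_names(example))
print(find_function_names(with_body))
```

The counting loop is exactly the part a true regular expression cannot express, which is why it lives in ordinary Python code here.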
1

Regexps are not a proper tool for extracting semantic information from source code files (though they're good for syntax highlighting, because syntax is often expressed through regular expressions). Regexps can't handle nested constructions, track what is going on, or distinguish types from symbols.

I'd recommend some specialized tool that is really aware of the language structure, like ctags or python-pygccxml.

ctags is a program that generates a list of entities in a C source file along with their locations (used to assist navigation through C code bases in text editors like vi and emacs). python-pygccxml is a Python binding to the C library libgccxml, which uses gcc internals to analyze the code and produces rich, structured output about program semantics.
