I am writing a Python script that searches an entire .c or .h file and finds all the function definitions so that I can make some slight edits. To find the function definitions, I am trying to use regular expressions.

What I currently have is:

r'\w+?\s+?\w+?\s*?\(.*?\)[\n\s]*?{'

The problem with this pattern is that it also matches some if statements. For example:

else
   if(//herpderp){}

It does this because that \s includes \n. I feel that I wouldn't have this issue if I had my expression only look for spaces instead of any whitespace, but I can't test my theory out as it seems there is no \(insert letter here) for just a simple space.

So that is the problem. If you have any advice on how to fix my regular expression, or if there is a better way of writing the script in general, please let me know.

2 Answers

2

A single space can be matched by using a single space, the same way you'd match any other character that isn't a metacharacter:

r'\w+? +?\w+? *?\(.*?\)\s*?{'

The ' +?' sequence matches one or more literal spaces, non-greedily. I replaced [\n\s] with \s because \s already includes \n.

You can expand to a character class with more types of whitespace:

[ \t]

which would match a space or a tab.
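As a rough check of the difference, the sketch below runs the question's pattern and the spaces-only pattern against two small inputs. The names ANY_WHITESPACE, SPACES_ONLY, ELSE_IF and FUNCTION are made up for this illustration, and the input strings are hypothetical snippets, not taken from any real file.

```python
import re

# Pattern from the question: \s also matches newlines.
ANY_WHITESPACE = re.compile(r'\w+?\s+?\w+?\s*?\(.*?\)[\n\s]*?{')
# Spaces-only variant: literal spaces between the two words.
SPACES_ONLY = re.compile(r'\w+? +?\w+? *?\(.*?\)\s*?{')

# Hypothetical inputs, used only for this demonstration.
ELSE_IF = "else\n   if(x){}"
FUNCTION = "int main() {\n    return 0;\n}\n"

# The original pattern treats "else" + newline + "if(x){" as a definition.
print(bool(ANY_WHITESPACE.search(ELSE_IF)))   # True
# With literal spaces, the newline after "else" no longer matches.
print(bool(SPACES_ONLY.search(ELSE_IF)))      # False
# An ordinary one-line definition is still found.
print(SPACES_ONLY.search(FUNCTION).group(0))  # int main() {
```

Keep in mind that the spaces-only pattern will also skip definitions whose return type and name are separated by a tab or a newline.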


1

It does this because that \s includes \n

I'm not sure that this is a good theory since writing something like this in C is allowed:

int


    main()

One possible approach is to use a whitelist or a blacklist to make sure that what you match is a function. Example:

r'\b(int|float|double|char)\s+(\w+)\s*\([^)]*\)\s*{'   # whitelist

or

r'(?!(?:else)\b)\b(\w+)\s+(\w+)\s*\([^)]*\)\s*{'    # blacklist

Note: no need to use lazy quantifiers.
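To see how the two patterns behave, here is a minimal sketch that applies both to a small piece of C text. SAMPLE is a hypothetical input written for this illustration, and the names WHITELIST and BLACKLIST are made up here. The whitelist only knows the four listed types, so it misses the void function, while the blacklist accepts any first word except else.

```python
import re

WHITELIST = re.compile(r'\b(int|float|double|char)\s+(\w+)\s*\([^)]*\)\s*{')
BLACKLIST = re.compile(r'(?!(?:else)\b)\b(\w+)\s+(\w+)\s*\([^)]*\)\s*{')

# Hypothetical C source: a definition split over several lines,
# an "else / if" pair, and a function with a non-whitelisted type.
SAMPLE = """int


    main()
{
    if (argc > 1) {
        run();
    }
    else
       if(argc == 0){}
    return 0;
}

void helper(int a) {
}
"""

# findall returns one (return_type, name) tuple per match.
print(WHITELIST.findall(SAMPLE))  # [('int', 'main')]
print(BLACKLIST.findall(SAMPLE))  # [('int', 'main'), ('void', 'helper')]
```

Neither pattern matches the else/if pair: the whitelist rejects else because it is not a listed type, and the blacklist rejects it through the negative lookahead.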

4 Comments

To be fair, I don't think C is a regular language so regex is poorly suited to parse it. It would probably be easier to refactor it in his editor than to write a Python program to do it :)
@AdamSmith: I'm not a C expert, but I'm not sure that nested parentheses are allowed inside the params. In any event, whether a language is regular or not is no longer a problem with modern regex engines (in particular with the new regex module).
I haven't used the regex module -- is it worth picking up and learning?
@AdamSmith: yes, it's easy to install (no problem), relatively easy to learn and to use (backward compatible with the re module), and has very interesting features you can't find anywhere in other scripting languages (like fuzzy search).
