OK, so I have a bunch of C and C++ code that I need to search for function definitions. I don't know the return type, and I don't know the number of parameters in the function definitions or function calls.

So far I have:

import re, sys
from os.path import abspath
from os import walk

function = 'msg'
regexp = r"(" + function + ".*[^;]){"

found = False
for root, folders, files in walk('C:\\codepath\\'):
    for filename in files:
        with open(abspath(root + '/' + filename)) as fh:
            data = fh.read()
            result = re.findall(regexp, data)
            if len(result) > 0:
                sys.stdout.write('\n Found function "' + function + '" in ' + filename + ':\n\t' + str(result))
                sys.stdout.flush()
    break

This, however, produces some unwanted results. The regexp must be fault tolerant and handle, for example, these combinations:

Finding the "msg" definition but not "msg()" calls, in all variations of, say:

void
shapex_msg (struct shaper *s)
{
  msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
       s->bytes_per_second);
}

or

void shapex_msg (struct shaper *s)
{
  msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
       s->bytes_per_second);
}

or

void shapex_msg (struct shaper *s) {
  msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
       s->bytes_per_second);
}

1 Answer

Maybe something like the following regex:

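import re

def make_regex_for_function(name):
    # Match: name, optional whitespace, a parameter list without ';' or ')', then '{'.
    return re.compile(r'\s*%s\s*\([^;)]*\)\s*\{' % re.escape(name))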

Testing your examples:

>>> text = '''
... void
... shapex_msg (struct shaper *s)
... {
...   msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
...        s->bytes_per_second);
... }
... 
... void shapex_msg (struct shaper *s)
... {
...   msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
...        s->bytes_per_second);
... }
... 
... void shapex_msg (struct shaper *s) {
...   msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
...        s->bytes_per_second);
... }'''
>>> shapex_msg = make_regex_for_function('shapex_msg')
>>> shapex_msg.findall(text)
['\nshapex_msg (struct shaper *s)\n{', ' shapex_msg (struct shaper *s)\n{', ' shapex_msg (struct shaper *s) {']

It also works with multiline definitions:

>>> shapex_msg.findall('''int
...    \tshapex_msg   \t(
... int a,
... int b
... )  
... 
... \t{''')
['\n   \tshapex_msg   \t(\nint a,\nint b\n)  \n\n\t{']

Function calls, on the other hand, are not matched:

>>> shapex_msg.findall('shapex_msg(1,2,3);')
[]
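One caveat when the name is short, such as the "msg" from the question: the pattern above has nothing in front of the name, so it can also match the tail of a longer identifier like shapex_msg. A minimal sketch of a stricter variant, assuming a \b word boundary is acceptable for your code base (make_strict_regex and the sample text are hypothetical, not part of the original answer):

```python
import re

def make_strict_regex(name):
    # \b stops "msg" from matching the tail of "shapex_msg",
    # because "_" and "m" are both word characters.
    return re.compile(r'\b%s\s*\([^;)]*\)\s*\{' % re.escape(name))

# Hypothetical sample: a definition of shapex_msg, a call to msg,
# and a definition of msg itself.
text = ('void shapex_msg (struct shaper *s) {\n'
        '  msg (M_INFO, "x");\n'
        '}\n'
        'void msg (int flags)\n'
        '{\n'
        '}')

# The pattern from the answer, built for the short name "msg".
loose = re.compile(r'\s*%s\s*\([^;)]*\)\s*\{' % re.escape('msg'))

loose_hits = loose.findall(text)
strict_hits = make_strict_regex('msg').findall(text)

print(loose_hits)   # also reports the tail of shapex_msg
print(strict_hits)  # only the definition of msg
```

The call to msg is rejected by both patterns because a ";" follows its closing parenthesis instead of a "{".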

Just as a note, your regex doesn't work because .* is greedy, so it ran past the closing parenthesis it should have stopped at.
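To make the greedy behaviour concrete, here is a small comparison of the question's pattern and the one above on a single line. The input line is a made-up example, chosen so that a call to msg is followed by an unrelated block on the same line:

```python
import re

# The pattern from the question, built for the name "msg".
greedy = re.compile(r"(msg.*[^;]){")
# The pattern from the answer, built for the same name.
bounded = re.compile(r"\s*%s\s*\([^;)]*\)\s*\{" % re.escape("msg"))

# Hypothetical line: a call to msg() followed by an unrelated block.
line = 'msg (M_INFO, "hi"); if (done) {'

# .* runs to the end of the line and backtracks only as far as the
# last "{", so the call is reported as if it were a definition.
greedy_hits = greedy.findall(line)

# [^;)]* cannot cross ")" or ";", so the match has to end at the call's
# own closing parenthesis, where a ";" follows instead of "{".
bounded_hits = bounded.findall(line)

print(greedy_hits)
print(bounded_hits)
```

The negated character class is what keeps the match inside one parameter list; a non-greedy .*? alone would not be enough here, since it could still expand past the ";" to reach a later "{".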

2 Comments

Your last edit gave me a working copy! Thank you! Indeed, the greedy matching was messing things up. I'd been trying so many combinations that I couldn't wrap my head around it, so thank you!
@Torxed Yes, sorry. When writing it down I forgot to put an * :s