3

A minimal code example is as follows:

#include <cstdlib>
#include <iostream>
#include <vector>
#include <regex.h>

using namespace std;

class regex_result {
public:
    /** Contains indices of starting positions of matches.*/
    std::vector<int> positions;
    /** Contains lengths of matches.*/
    std::vector<int> lengths;
};

regex_result match_regex(string regex_string, const char* string) {
    regex_result result;
    regex_t* regex = new regex_t;
    regcomp(regex, regex_string.c_str(), REG_EXTENDED);
    /* "pointer" is a pointer into the string which points to the end of the
       previous match. */
    const char* pointer = string;
    /* "n_matches" is the maximum number of matches allowed. */
    const int n_matches = 10;
    regmatch_t matches[n_matches];
    int nomatch = 0;
    while (!nomatch) {
        nomatch = regexec(regex, pointer, n_matches, matches, 0);
        if (nomatch)
            break;
        for (int i = 0; i < n_matches; i++) {
            int start,
                finish;
            if (matches[i].rm_so == -1) {
                break;
            }
            start = matches[i].rm_so + (pointer - string);
            finish = matches[i].rm_eo + (pointer - string);
            result.positions.push_back(start);
            result.lengths.push_back(finish - start);
        }
        pointer += matches[0].rm_eo;
    }
    delete regex;
    return result;
}

int main(int argc, char** argv) {
    string str = "this is a test";
    string pat = "this";
    regex_result res = match_regex(pat, str.c_str());
    cout << res.positions.size() << endl;
    return 0;
}

So I have written a function that parses a given string for regular expression matches. The result is held in a class that is essentially two vectors, one for the positions of the matches and one for the corresponding match lengths.

This works fine, but when I run valgrind over it, it reports some substantial memory leaks.

When using valgrind --leak-check=full on the code above I get:

==24843== Memcheck, a memory error detector
==24843== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==24843== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==24843== Command: ./test
==24843== 
1
==24843== 
==24843== HEAP SUMMARY:
==24843==     in use at exit: 11,688 bytes in 37 blocks
==24843==   total heap usage: 54 allocs, 17 frees, 12,868 bytes allocated
==24843== 
==24843== 256 bytes in 1 blocks are definitely lost in loss record 14 of 18
==24843==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24843==    by 0x543549A: regcomp (regcomp.c:487)
==24843==    by 0x400ED0: match_regex(std::string, char const*) (in <path>)
==24843==    by 0x4010CA: main (in <path>)
==24843== 
==24843== 11,432 (224 direct, 11,208 indirect) bytes in 1 blocks are definitely lost in loss record 18 of 18
==24843==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24843==    by 0x4C2CF1F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24843==    by 0x5434BAF: re_compile_internal (regcomp.c:760)
==24843==    by 0x54354FF: regcomp (regcomp.c:506)
==24843==    by 0x400ED0: match_regex(std::string, char const*) (in <path>)
==24843==    by 0x4010CA: main (in <path>)
==24843== 
==24843== LEAK SUMMARY:
==24843==    definitely lost: 480 bytes in 2 blocks
==24843==    indirectly lost: 11,208 bytes in 35 blocks
==24843==      possibly lost: 0 bytes in 0 blocks
==24843==    still reachable: 0 bytes in 0 blocks
==24843==         suppressed: 0 bytes in 0 blocks
==24843== 
==24843== For counts of detected and suppressed errors, rerun with: -v
==24843== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

Is my code wrong or is there really a bug in those files?

2 Answers

7

Your regex_t does not need to be dynamically allocated, and though that isn't directly related to your problem, it is a little odd. The real problem is that you never call regfree() on the compiled expression after regcomp() succeeds (which you should verify by checking its return value). You should set up your regular expression like this:

regex_t regex;
int res = regcomp(&regex, regex_string.c_str(), REG_EXTENDED);
if (res == 0)
{
    // use your expression via &regex
    ....

    // and eventually free it when done.
    regfree(&regex);
}
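
For reference, here is a rough sketch of how the whole function from the question might look with that change applied. It sticks to C++03 and the POSIX calls already in use. It is a simplification rather than a drop-in replacement: it records only the whole match (not the sub-groups that the original loop also pushes), and the names `cursor` and `text` plus the handling of empty matches are my own additions.

```cpp
#include <regex.h>
#include <string>
#include <vector>

struct regex_result {
    std::vector<int> positions;  // start offset of each match
    std::vector<int> lengths;    // length of each match
};

// Sketch only: records whole matches (group 0), not capture groups.
regex_result match_regex(const std::string& pattern, const char* text) {
    regex_result result;
    regex_t regex;  // automatic storage, so no new/delete is involved

    if (regcomp(&regex, pattern.c_str(), REG_EXTENDED) != 0)
        return result;  // compilation failed, nothing to free

    const char* cursor = text;  // start of the not-yet-searched remainder
    regmatch_t match;
    while (regexec(&regex, cursor, 1, &match, 0) == 0) {
        result.positions.push_back(
            static_cast<int>((cursor - text) + match.rm_so));
        result.lengths.push_back(
            static_cast<int>(match.rm_eo - match.rm_so));

        if (match.rm_eo == match.rm_so) {
            // Empty match (my addition): step one character forward so
            // the loop cannot spin forever, and stop at the terminator.
            if (cursor[match.rm_eo] == '\0')
                break;
            cursor += match.rm_eo + 1;
        } else {
            cursor += match.rm_eo;
        }
    }

    regfree(&regex);  // releases what regcomp() allocated internally
    return result;
}
```

Calling `match_regex("is", "this is a test")` with this sketch should yield two matches, at offsets 2 and 5. The single `regfree()` before the return is what addresses both "definitely lost" records in the valgrind report, since both originate inside `regcomp()`.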

If your implementation supports them, I strongly advise using the C++11 provided <regex> library, as it has nice RAII solutions to much of this.
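
To give an idea of what that looks like, here is a minimal sketch of the same search using `std::regex`, assuming a C++11 compiler with a working `<regex>`. The result struct mirrors the one in the question; taking the input as a `std::string` and using the `std::regex::extended` grammar (to stay close to `REG_EXTENDED`) are my choices, not something the question requires. As above, only whole matches are recorded.

```cpp
#include <regex>
#include <string>
#include <vector>

struct regex_result {
    std::vector<int> positions;  // start offset of each match
    std::vector<int> lengths;    // length of each match
};

// Sketch only: std::regex owns its compiled state and releases it in its
// destructor, so there is no counterpart to regfree() to remember.
regex_result match_regex(const std::string& pattern, const std::string& text) {
    regex_result result;
    // Throws std::regex_error if the pattern is invalid.
    const std::regex re(pattern, std::regex::extended);

    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        result.positions.push_back(static_cast<int>(it->position()));
        result.lengths.push_back(static_cast<int>(it->length()));
    }
    return result;
}
```

The iterator also takes care of advancing past each match, including empty ones, which removes the manual pointer arithmetic from the original loop.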

2 Comments

Ahh, thank you. I'm choosing your answer even though you were a little later because of the additional info you gave.
At the moment, C++11 is not an option, hence the way I am doing it.

3

You must call regfree() to free memory allocated by regcomp().
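
If C++11 is unavailable, one way to make that call hard to forget is a small RAII wrapper in plain C++03. The sketch below is only an illustration: the class name `compiled_regex` and its `ok()` and `get()` members are hypothetical, not part of any library.

```cpp
#include <regex.h>
#include <string>

// Hypothetical helper: compiles in the constructor, frees in the destructor.
class compiled_regex {
public:
    explicit compiled_regex(const std::string& pattern,
                            int flags = REG_EXTENDED)
        : ok_(regcomp(&regex_, pattern.c_str(), flags) == 0) {}

    ~compiled_regex() {
        if (ok_)
            regfree(&regex_);  // only free a successfully compiled regex
    }

    bool ok() const { return ok_; }
    const regex_t* get() const { return &regex_; }

private:
    // Declared but not defined: copying would lead to a double regfree().
    compiled_regex(const compiled_regex&);
    compiled_regex& operator=(const compiled_regex&);

    regex_t regex_;
    bool ok_;
};
```

With this in place, `compiled_regex re(pattern);` followed by `regexec(re.get(), ...)` releases the compiled expression on every exit path, including early returns.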
