
So I'm just starting to pick up C. My end goal is to write a function that searches a string with a regular expression and returns an array of matches.

The biggest problem I'm having is saving the strings into memory that can be returned, or referenced through a pointer passed in as a parameter.

I'd really like a way to tell how many matches there are, so I could do something like the C# equivalent: if(matches.Count() > 0) { /* we have a match! */ }. Then I'd get the resulting string of each match group, depending on the pattern I'll eventually pass in.

I know this isn't correct and probably has some other errors in practice, but here is the code I walked away from after trying to figure it out by reading up on pointers, structs, char arrays, etc.:

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    char *match;
} Matches;

int main()
{
    regex_t regex;
    int reti;
    char msgbuf[100];
    int max_matches = 10;
    regmatch_t m[max_matches];

    char str[] = "hello world";

    reti = regcomp(&regex, "(hello) (world)", REG_EXTENDED);
    if( reti )
    {
        fprintf(stderr, "Could not compile regex\n");
        exit(1);
    }

    reti = regexec(&regex, str, (size_t) max_matches, m, 0);
    if( !reti )
    {
        puts("Match");
    }
    else if( reti == REG_NOMATCH )
    {
        puts("No match");
    }
    else
    {
        regerror(reti, &regex, msgbuf, sizeof(msgbuf));
        fprintf(stderr, "Regex match failed: %s\n", msgbuf);
        exit(1);
    }

    char *p = str;
    int num_of_matches = 0;

    Matches *matches;

    int i = 0;
    for(i = 0; i < max_matches; i++)
    {
        if (m[i].rm_so == -1) break;

        int start = m[i].rm_so + (p - str);
        int finish = m[i].rm_eo + (p - str);

        if (i == 0)
            printf ("$& is ");
        else
            printf ("$%d is ", i);

        char match[finish - start + 1];
        memcpy(match, str + start, finish - start);
        match[sizeof(match)] = 0;

        matches[i].match = match; //Need to get access to this string in an array outside of the loop

        printf ("'%.*s' (bytes %d:%d)\n", (finish - start), str + start, start, finish);

        num_of_matches++;
    }
    p += m[0].rm_eo;

    for(i = 0; i < num_of_matches; i++)
    {
        printf("'%s'\n", matches[i].match);
    }

    /* Free compiled regular expression if you want to use the regex_t again */
    regfree(&regex);

    return 0;
}

Just when I thought I had it (when only matching "world"), I noticed that when I commented out the printf statements, the last printf statement printed empty or random characters.

1 Answer

Your problems are mainly memory issues to do with C strings.

First, you declare what is supposed to be an array for your matches:

Matches *matches;

This defines a pointer to your match structure, but this pointer is uninitialised and doesn't point anywhere sensible. Instead, you should define an array of matches:

Matches matches[max_matches];

This will give you 10 (local) matches that you can access.
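A minimal sketch of that idea, pairing the array with a count so you get something like C#'s matches.Count(). The MAX_MATCHES constant, the MatchList type and the record_match helper are hypothetical names introduced here, not part of the question's code. The list only stores pointers, so whatever they point to must stay alive as long as the list is used:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MATCHES 10  /* hypothetical fixed capacity, like max_matches in the question */

typedef struct
{
    char *match;
} Matches;

/* Hypothetical container: real storage for MAX_MATCHES entries plus a count.
   list.count plays the role of matches.Count() in C#. */
typedef struct
{
    Matches items[MAX_MATCHES];
    size_t count;
} MatchList;

/* Appends s to the list. Returns 0 on success, -1 if the list is full.
   Only the pointer is copied, so s must outlive the list. */
int record_match(MatchList *list, char *s)
{
    assert(list != NULL);
    if (list->count >= MAX_MATCHES)
        return -1;
    list->items[list->count].match = s;
    list->count++;
    return 0;
}
```

With MatchList list = { .count = 0 }; (the remaining members are zeroed), you can then test if (list.count > 0) much as in the C# version.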

Next, you define a local string to hold a match as a variable-length array (VLA):

char match[finish - start + 1];

This time, you have allocated enough space to hold the substring. But this char buffer is local and will be gone when you reach the closing brace of the for loop body. The next pass through the loop might use the same memory. It is illegal to access this memory after the loop.

One solution is to allocate the memory on the heap with malloc:

char *match = malloc(finish - start + 1);

Note that you have to release the resources again later explicitly with free.

You copy the substring and end it with a null character. However, when you do so, you don't get the location of the null character right:

match[sizeof(match)] = 0;

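sizeof is an operator that tells you how many bytes the type of the given expression occupies in memory. It is usually evaluated at compile time; the exception is a VLA, whose size is only known at run time. When you used a VLA, sizeof(match) was finish - start + 1, so match[sizeof(match)] wrote one past the end of that buffer. Now that match is a pointer to allocated memory, sizeof(match) is the size of the pointer, not the size of the buffer it points to.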
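A small sketch of the difference, assuming a C99 compiler with VLA support; vla_sizeof and pointer_sizeof are hypothetical names used only for this illustration. For the VLA, sizeof is computed at run time and equals the element count, so indexing with it lands one past the end. For the pointer, it is just the pointer's own size:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Mirrors the question's VLA: char match[len + 1].
   sizeof is evaluated at run time here and yields len + 1,
   so match[sizeof match] would be one element past the end. */
size_t vla_sizeof(size_t len)
{
    assert(len < 4096);          /* keep the stack-allocated VLA small */
    char match[len + 1];
    return sizeof match;
}

/* Mirrors the malloc version: char *match = malloc(len + 1).
   sizeof now yields the size of the pointer (commonly 4 or 8 bytes),
   not the size of the buffer it points to. */
size_t pointer_sizeof(size_t len)
{
    char *match = malloc(len + 1);
    size_t size = sizeof match;
    free(match);
    return size;
}
```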

Often sizeof is confused with strlen, but here you can't use strlen, because match is not yet null-terminated, as strlen requires. But you know the length of the string, of course:

match[finish - start] = 0;
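Putting the malloc advice and the corrected terminator together, here is a minimal sketch of a helper that copies one match out of the subject string. copy_match is a hypothetical name; it takes the offsets straight from a regmatch_t, and the caller owns (and must free) the result:

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated, null-terminated copy of the part of str that m covers,
   or NULL if the group did not participate (rm_so == -1) or malloc failed.
   The caller must free() the result. */
char *copy_match(const char *str, regmatch_t m)
{
    if (m.rm_so == -1)
        return NULL;
    assert(m.rm_eo >= m.rm_so);

    size_t len = (size_t)(m.rm_eo - m.rm_so);
    char *match = malloc(len + 1);      /* one extra byte for the '\0' */
    if (match == NULL)
        return NULL;

    memcpy(match, str + m.rm_so, len);
    match[len] = '\0';                  /* same as match[finish - start] = 0 */
    return match;
}
```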

You also don't need the pointer p; just define:

int start = m[i].rm_so;
int finish = m[i].rm_eo;
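For completeness, a sketch of the function the question was aiming for, with the fixes above applied. find_matches is a hypothetical name and signature: it takes an already compiled regex_t, fills a caller-provided Matches array with malloc'd copies (the whole match in slot 0, capture groups after it) and returns how many slots it filled, so if (n > 0) stands in for matches.Count() > 0. Unlike the question's loop, it uses re_nsub (the number of groups in the compiled pattern) instead of stopping at the first rm_so == -1, and stores NULL for groups that did not take part in the match:

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    char *match;
} Matches;

/* Runs regex once against str. On a match, out[0] gets a malloc'd copy of the
   whole match and out[1..] copies of the capture groups (NULL for a group that
   did not participate). Returns the number of slots filled, 0 for no match,
   or -1 on error. The caller must free() every non-NULL out[i].match. */
int find_matches(const regex_t *regex, const char *str, Matches out[], size_t max_out)
{
    assert(regex != NULL && str != NULL && out != NULL);
    if (max_out == 0)
        return -1;

    regmatch_t m[max_out];              /* VLA: only needed inside this call */
    int reti = regexec(regex, str, max_out, m, 0);
    if (reti == REG_NOMATCH)
        return 0;
    if (reti != 0)
        return -1;

    size_t n = regex->re_nsub + 1;      /* whole match + capture groups */
    if (n > max_out)
        n = max_out;

    for (size_t i = 0; i < n; i++) {
        out[i].match = NULL;
        if (m[i].rm_so == -1)
            continue;                   /* group did not participate */

        size_t len = (size_t)(m[i].rm_eo - m[i].rm_so);
        out[i].match = malloc(len + 1);
        if (out[i].match == NULL) {
            for (size_t j = 0; j < i; j++)  /* undo earlier allocations */
                free(out[j].match);
            return -1;
        }
        memcpy(out[i].match, str + m[i].rm_so, len);
        out[i].match[len] = '\0';
    }
    return (int)n;
}
```

Each non-NULL out[i].match must be released with free once you are done with it, and the regex_t with regfree.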

So:

  • Make sure that you actually allocate memory when you want to store things.
  • Take care that local memory isn't invalidated before you access it. (The most egregious example of this is to return the address of a local array from a function. Your case is less offensive, but also less visible.)
  • Long-lived memory can be allocated with malloc. Such memory isn't garbage-collected; it must be freed explicitly with free.
  • sizeof is an operator that is evaluated at compile time (except for VLAs). It is a crutch needed by raw memory functions like malloc. (I've omitted sizeof here, because sizeof(char) is guaranteed to be 1.)
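A matching cleanup sketch, following the rule of one free for each pointer received from malloc. free_matches is a hypothetical helper; it assumes the Matches array itself is a local array, as suggested above, so only the strings inside it are freed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct
{
    char *match;
} Matches;

/* Frees each string obtained from malloc (free(NULL) is a no-op, so unused
   slots are fine). The array itself is not freed: it was never malloc'd.
   Each pointer is reset to NULL so a second call does no harm. */
void free_matches(Matches matches[], size_t count)
{
    assert(matches != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        free(matches[i].match);
        matches[i].match = NULL;
    }
}
```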

Isn't working with strings in C fun?


4 Comments

You deserve more than one point for that answer. Thanks so much for explaining it to me. So if I freed matches, would that release all the memory I allocated for each match in the loop? Or do I have to free those explicitly?
Looks like I'll have to loop through and free each one explicitly. LOL, working with strings in C is "fun" for sure.
Yes, you have to free the malloced data in a loop with free(matches[i].match). You can't even free(matches), because it isn't something you malloced. The rule is: one free for each pointer you have received from malloc.
Strings are admittedly a hard part of C, especially if you already know other languages that have better string support.
