0

I finally managed to make my regex function work, but I want to know if I can possibly reduce the number of pointer declarations in the main function to one. For example, I want to convert:

int main(){
    regex* r=calloc(1,300000);
    regex* rr=r;
    if (regexmatch("T/2/b","([A-Z]+)/([0-9]+)/([a-z]+)",10,&rr)==0){
    printres(r);
    }
    free(r);
    return 0;
}

to something like:

int main(){
    regex* r=calloc(1,300000);
    if (regexmatch("T/2/b","([A-Z]+)/([0-9]+)/([a-z]+)",10,&r)==0){
    printres(r);
    }
    free(r);
    return 0;
}

but as it stands, this won't work because the regexmatch function seems to change the address of the variable which causes the program to crash at free(r);

I even tried adding reg=rp; just before the last return statement in the function in hopes that I reset the struct variable address to what it was when the function was first called, but that didn't work.

What can I do to fix this? or is my only option to use two pointers in my main function?

This is my code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <regex.h>

typedef struct{
    char str[1000];
} regex;

long regexmatch(const char* str,const char* regexs,const size_t nummatch,regex** rp){
    regex** reg=rp;
    regex_t r;regmatch_t match[nummatch];
    if (regcomp(&r,regexs,REG_EXTENDED) != 0){return -1;}
    if (regexec(&r,str,nummatch,match,0)!=0){regfree(&r);return -1;}
    regfree(&r);size_t i=0;
    for (i=0;i<nummatch;i++){
        if (match[i].rm_so > -1){
            unsigned long sz=match[i].rm_eo-match[i].rm_so;
            if (sz > 0 && sz < 1000){
            memcpy((**reg).str,(char*)(str+match[i].rm_so),sz);
            (*reg)++;
            }
        }
    }
    (**reg).str[0]='\0';
    return 0;
}

void printres(regex* r){
    printf("Matches\n");
    while (r->str[0] != '\0'){
    printf("%s\n",r->str);
    r++;
    }
}

int main(){
    regex* r=calloc(1,300000);
    regex* rr=r;
    if (regexmatch("T/2/b","([A-Z]+)/([0-9]+)/([a-z]+)",10,&rr)==0){
    printres(r);
    }
    free(r);
    return 0;
}
3
  • try to do in regexmatch regex* rp2=*rp;regex** reg=&rp2; Commented Oct 12, 2015 at 1:49
  • that seems to work at the start of the function but I don't get why. Commented Oct 12, 2015 at 2:00
  • I see no workaround without redesigning the way your program works. Commented Oct 12, 2015 at 2:12

2 Answers 2

1

Why do you pass rp by reference? You clearly don't want the value in the calling program to change, so it would be simpler to just pass it directly.

In fact, what you really want that parameter to be is an array of regex objects ([Note 1]). So instead of using the prototype

long regexmatch(const char* str,
                const char* regexs,
                const size_t nummatch, /* This const is pointless */
                regex** rp);

it would make more sense to use:

long regexmatch(const char* str,
                const char* regexs,
                size_t nummatch,
                regex rp[nummatch]);

(Effectively, that's the same as using regex* rp as a parameter, but writing it as rp[nummatch] is more self-documenting. Because you use an empty string as a terminator (which means that you can't handle zero-length captures), you actually need nummatch to be at least one greater than the number of captures in the pattern, so it is not 100% self-documenting.

Having made that change to the prototype, you need to remove one level of indirection in the function:

long regexmatch(const char* str,
                const char* regexs,
                size_t nummatch,
                regex reg[nummatch]){
    /* Compiling the regex is the same as in your code. I removed
     * the assignment of reg from rp, since the parameter is now
     * called reg.
     */

    size_t i=0;
    for (i=0;i<nummatch;i++){
        if (match[i].rm_so > -1){
            unsigned long sz=match[i].rm_eo-match[i].rm_so;
            if (sz > 0 && sz < 1000){
                memcpy(reg->str, (char*)(str+match[i].rm_so), sz);
                /* The above memcpy doesn't nul-terminate the string,
                 * so I added an explicit nul-termination.
                 */
                reg->str[sz] = 0;
                /* I think this should be outside the if statement. Personally,
                 * I'd put it in the increment clause of the for loop.
                 * See Note 2.
                 */
                reg++;  
            }
        }
    }
    reg->str[0] = 0;
    return 0;
}

(See it live on ideone.)


Notes

  1. I found it confusing to call regex what is basically a fixed-length string representing a regex capture. Also, I don't see the point of wrapping the fixed-length character allocation in a struct, unless you plan to assign to values of type regex. But all of that is irrelevant to your basic question.

  2. When you're copying captures into your regex array, you simply ignore captures which are empty, unset, or too long. This means that you cannot access capture i by looking at the ith element in the array. If a previous capture happened to be empty, then the capture will be at position i - 1 (or earlier, if more than one capture were empty), and the empty capture won't be accessible at all. I'm not sure exactly what your goal here is, but that seems hard to use. However, since you use empty strings to indicate the end of the capture list, you cannot insert an empty capture into the list. So you really might want to rethink the API.

    A much cleaner approach would be to use an array of pointers to strings (like argv.)

Sign up to request clarification or add additional context in comments.

2 Comments

ok for this question, I'll take on this answer since I'm only dealing with strings as output and structs are overkill at the moment.
@Mike: I left the struct in place. It's not relevant.
1

In regexmatch add : regex* rp2=*rp;regex** reg=&rp2;

In your code (*reg)++; is modifying the value of rp. Its equivalent of rp++ in your code because regex** reg=rp; rp is the pointer adress to your calloc set by &r in regexmatch call. You dont want to change this pointer. So we use another pointer called rp2.

enter image description here

1 Comment

ok, I'm trying to figure out how to save the original pointer and restore it after the function finishes. I tried regex** oreg=rp;regex** reg=rp; and then restoring via rp=reg before the return statement but I still can't figure out why the declaration needs to be like the way you put it. I mean I understand simple pointers but I'm boggled. I think I need a visual explanation

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.