2

I wrote a program to split a char array to a 2D array as the comments below the definition of the function states. However, I am receiving segmentation fault on this piece of code. Can some one help to find why?

The my_strlen(str) functions works the same way as the original strlen(str) function, and it works perfectly. And the length of the char array is limited, so I do not really worry about the efficiency in memory allocation.

char **my_str2vect(char *str) {
    // Takes a string 
    // Allocates a new vector (array of string ended by a NULL), 
    // Splits apart the input string x at each space character 
    // Returns the newly allocated array of strings
    // Any number of ' ','\t', and '\n's can separate words.
    // I.e. "hello \t\t\n class,\nhow are you?" -> {"hello", "class,", "how", "are","you?", NULL}
    int str_len = my_strlen(str);
    char **output = malloc(sizeof(char *) * str_len); // Allocate a 2D array first.
    if (**output) {
        for (int a = 0; a < str_len; ++a) {
            output[a] = malloc(sizeof(char) * str_len);
        }
    } else {
        return NULL;
    }
    int i = 0;
    int j = 0;
    while (i < str_len) { // Put the characters into the 2D array.
        int k = 0;
        while ((str[i] == ' ' || str[i] == '\t' || str[i] == '\n') && i < str_len) {
            ++i;
        }
        while ((!(str[i] == ' ' || str[i] == '\t' || str[i] == '\n')) && i < str_len) {
            output[j][k] = str[i];
            ++i;
            ++k;
        }
        output[j][k] = '\0';
        ++j;
    }
    output[j] = NULL;
    return output;
}
11
  • What is my_strlen() ? Commented Dec 4, 2016 at 18:44
  • Did you run it in a debugger to determine where the seg fault is occurring? This is always the first step when C programs malfunction at runtime. Commented Dec 4, 2016 at 18:45
  • @Echo111 I do not see a sense in this loop for(int a = 0; a < my_strlen(str); ++a){ output[a] = malloc(sizeof(char) * my_strlen(str)); } Does the array output has the number of elements equal to the number of characters in str? Commented Dec 4, 2016 at 18:46
  • 1
    The second loop will certainly not stop at the end of the input string. But there are other problems with this code. Commented Dec 4, 2016 at 18:48
  • 1
    It's your if statement; **output is of type char. You want output instead, which points to the allocated memory (which you probably wanted to check). Commented Dec 4, 2016 at 21:27

3 Answers 3

1

I have tried to use gdb to determine the problem. enter image description here It is about **output control. You should check address of *output instead of where pointer to pointer to. You are allocating places in for loop until length of the string. It may cause defragmentation. Moreover, the 1D char array should be passed by const to be not changeable. Instead, you should use the snippet

// allocation (in the function)
// protoype: char** my_str2vect(char const* str)
int a;
char** output = malloc(str_len * sizeof(char *));
    output[0] = malloc(str_len * str_len * sizeof(char));
    for(a = 1; a < str_len; a++)
        output[a] = output[0] + a * str_len;

// freeing (in main())  
char ** x;
char const* str = "hello \t\t\n class,\nhow are you?";
x = my_str2vect(str);

free((void *)x[0]);
free((void *)x);

En passant, the source aids to get more knowledge about allocation.

Sign up to request clarification or add additional context in comments.

3 Comments

That's exactly the problem. That if is checking the contents of the first output location, which at this point is junk. You want if (output).
what about &*output? What does it mean?
The star operator means "contents of". So *output means "contents of the pointer output." In this case that itself is a pointer to a character: the first pointer in the array. That pointer is junk until the for loop sets it to the value returned by malloc. That's why **output produces a seg fault. You're taking the contents of what should be a pointer, but is junk.
1

As the debugger is telling you the if (**output) is broken. It's trying to dereference the pointer in the first output array location. That's junk at the point of the if. Hence, the seg fault. You want if (output). When I fix this and use strlen in place of your rewrite, it seems to work fine.

It's considerably simpler to make one copy of the input string and use this for all the strings in the returned vector. You can also use strtok to find the words, but that's not thread safe.

Here's a suggestion:

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>

char **split(char *s_org) {
  size_t i;
  // Skip initial whitespace, then copy everything else.
  for (i = 0; s_org[i] && isspace(s_org[i]); ++i) /* skip */;
  char *s = strdup(s_org + i);
  size_t n_rtn = 0, size = 0;
  char **rtn = malloc(sizeof *rtn);
  for (i = 0;;) {
    if (!s[i]) {
      rtn[n_rtn] = NULL;
      return realloc(rtn, (n_rtn + 1) * sizeof *rtn);
    }
    if (n_rtn == size) {
      size = 2 * size + 1;
      rtn = realloc(rtn, size * sizeof *rtn);
    }
    rtn[n_rtn++] = s + i;
    while (s[i] && !isspace(s[i])) ++i;
    if (s[i]) {
      s[i++] = '\0';
      while (isspace(s[i])) ++i;
    }
  }
}

int main(void) {
  char **rtn = split("  hello \t\t\n class,\nhow are you?");
  for (char **p = rtn; *p; ++p)
    printf("%s\n", *p);
  // Freeing the first element frees all strings (or does nothing if none)
  free(rtn[0]);
  free(rtn);
  return 0;
}

This omits checks for NULL returns from malloc and realloc. But they're easy to add.

You asked about the "other problems" with your code. I've fixed some here:

  • Use size_t to index arrays.
  • Grow the output array as needed. It's really not that hard...
  • Avoid unnecessary calls to malloc.
  • Avoid strlen when simple checks for the terminating NULL are easier.
  • Use the idiom FOO *p = malloc(sizeof *p); to allocate a FOO. It's less error prone than sizeof(FOO).

Comments

1

To correct your code change if (**output) to if (output).

I think your implementation is not memory efficient and could be more elegant.
You are allocating too much memory. I tried to explain in the code the upper bound for the size of output char pointers. If you want to have the exact size, you'll have to count words in the string. It's probably better to do it that way, but for the exercise I think we can go the easier way.

As to your code I can only say:

  • I did not see the end of string character '\0' anywhere, which is a bad sign
  • I did not see any string copy, which is also a bad sign
  • I did not see you make use of standard library, which often makes you reinvent the wheel

Please see below an improved implementation (I use the standard C89):

#include<stdio.h>
#include <string.h>
#include<stdlib.h>

char** my_str2vect(char* s) {
    // Takes a string 
    // Allocates a new vector (array of string ended by a NULL), 
    // Splits apart the input string x at each space character 
    // Returns the newly allocated array of strings
    // Any number of ' ','\t', and '\n's can separate words.
    // I.e. "hello \t\t\n class,\nhow are you?" -> {"hello", "class,", "how", "are","you?", NULL}

    int s_size = strlen(s);
    /*
     * size of output is 1 if string contains non delimiters only
     * size of output is 0 if string contains delimiters only
     * size of output is strlen / 2 if string contains ...
     * ...alternation of delimiter and non delimiter, and that is the max size
     * so we allocate that size (upper bound)
     */
    int max_output_size = (s_size / 2) + 1;
    char **output = (char **) malloc(sizeof (char *) * max_output_size);

    //initialize to NULL for convenience
    int k;
    for (k = 0; k < max_output_size; k++)
        output[k] = NULL;

    //work on a copy of s
    char *str = (char *) malloc(s_size + 1);
    strcpy(str, s);

    //pointer for token and delimiters
    char *ptr;
    char delimiter[] = "\n\t ";

    //initialize and create first token
    ptr = strtok(str, delimiter);

    //
    int i = 0;
    while (ptr != NULL) {
        //allocate memory and copy token
        output[i] = malloc(sizeof (char) * strlen(ptr) + 1);
        strcpy(output[i], ptr);
        //get next token
        ptr = strtok(NULL, delimiter);
        //increment
        i++;
    }

    return output;
}

int main(int argc, char *argv[]) {

    char **result = my_str2vect("hello \t\t\n class,\nhow are you?");

    int i;
    for (i = 0; result[i] != NULL; i++)
        printf("%s\n", result[i]);

    return 0;
}

3 Comments

don't forget to free.
char *str = (char *) malloc(s_size); you are not allocating enough space for the null terminator. s_size is not the size, but the length of the string.
Same for output[i] = malloc(sizeof (char) * strlen(ptr));. You should use strdup() to allocate and copy the string in one safe convenient step.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.