
I want split_str to be able to take, for example, "bob is great" and return ["bob", "is", "great"].

More precisely: foo = split_str("bob is great", " ") should store ["bob", "is", "great"] in foo (an array of 3 strings that were separated by a space, as specified). I would like this to generalize to any number of strings, not only 3, if possible.

char* split_str(char*, char[]);

char* split_str(char* str, char delim[]) {
    char copied_input[strlen(str)];
    strncpy (copied_input, str, strlen(str)+1);

    char* result[strlen(str)+1];  // add 1 for the "NULL" char

    int tmp = 0;  // preparing iterator
    result[tmp] = strtok (copied_input, delim);  // obtaining first word

    while (result[tmp] != NULL) {  // to populate the whole array with each words separately
        result[++tmp] = strtok (NULL, delim);
    }

    return result;
}

This represents more or less the kind of execution I'm trying to achieve:

int main (void)
{
    int MAX_AMNT = 50;  // maximum amount of args to parse
    char *bar[MAX_AMNT];
    bar = split_str("bob is great", " ");
    tmp = 0;
    while (bar[tmp] != NULL) {
        fprintf (stdout, "Repeating, from array index %d: %s\n", tmp, bar[tmp++]);
    }
}

I'm very new to C, so I might have phrased my question badly (pointers, arrays, and pointers to arrays are still a bit of a headache for me).

I know the return type in my function signature is wrong, and that it's probably wrong to return a local variable (result), but I'm lost as to how to proceed from here. I tried changing it to a void function and adding a third argument that would be populated (as result is), but I keep getting errors.

  • result is a local variable and it will vanish once control exits the function. Commented Jan 18, 2019 at 20:52
  • You are going to want to learn to use malloc and free. Commented Jan 18, 2019 at 20:52
  • @kiranBiradar that is part of my concern, as explained in the last paragraph of my question. Commented Jan 18, 2019 at 20:52
  • char **result = malloc((strlen(str)+1) * sizeof(char *)); Commented Jan 18, 2019 at 20:55
  • @ChristianGibbons I come from a Java/Python background and I'm having trouble understanding how I'm supposed to manipulate arrays of undetermined sizes. I also have trouble understanding the difference between char *bar[] = malloc(50); and char *bar[50];. Commented Jan 18, 2019 at 20:55
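
On the last comment's question about char *bar[] = malloc(50); versus char *bar[50];: the first form does not compile, because an array cannot be initialized from the pointer that malloc returns. The dynamic counterpart is a char ** that points at heap storage sized in pointers, not bytes. The following is only a sketch of that difference; the names make_slots and local_slots_size are made up for the illustration.

```c
#include <stdlib.h>

/* Hypothetical helper, for illustration only: returns a heap-allocated
   array of n pointers, all set to NULL. The caller must free() it. */
char **make_slots(size_t n)
{
    char **slots = malloc(n * sizeof *slots);   /* n pointers, not n bytes */
    if (slots == NULL)
        return NULL;
    for (size_t i = 0; i < n; ++i)
        slots[i] = NULL;
    return slots;      /* still valid after return: heap storage outlives the call */
}

/* By contrast, an automatic array exists only until its function returns,
   so returning its address (as split_str does with result) is invalid. */
size_t local_slots_size(void)
{
    char *bar[50];                 /* 50 pointers with automatic storage */
    bar[0] = NULL;
    return sizeof bar;             /* 50 * sizeof(char *), fixed at compile time */
}
```

The heap version can be sized at run time and handed back to the caller, which is what a function returning an unknown number of words needs.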

2 Answers

One solution:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

char ** split(const char * str, const char * delim)
{
  /* count words */
  char * s = strdup(str);

  if (strtok(s, delim) == 0) {
    /* no word: release the copy before returning */
    free(s);
    return NULL;
  }

  int nw = 1;

  while (strtok(NULL, delim) != 0)
    nw += 1;

  strcpy(s, str); /* restore initial string modified by strtok */

  /* split */
  char ** v = malloc((nw + 1) * sizeof(char *));
  int i;

  v[0] = strdup(strtok(s, delim));

  for (i = 1; i != nw; ++i)
    v[i] = strdup(strtok(NULL, delim));

  v[i] = NULL; /* end mark */

  free(s);

  return v;
}

int main()
{
  char ** v = split("bob is  great", " ");

  for (int i = 0; v[i] != NULL; ++i) {
    puts(v[i]);
    free(v[i]);
  }

  free(v);
  return 0;
}

As you can see, I add a null pointer at the end of the vector as an end mark, but this can easily be changed to return the number of words instead, etc.
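
Because the vector ends with a null pointer, the word count and the cleanup can both be derived from the vector alone. A minimal sketch of two helpers that do this follows; split_count and split_free are hypothetical names, not part of the answer's code.

```c
#include <stdlib.h>

/* Hypothetical helpers, not part of the answer above. */

/* Number of strings stored before the terminating NULL pointer. */
size_t split_count(char **v)
{
    size_t n = 0;
    if (v != NULL)
        while (v[n] != NULL)
            ++n;
    return n;
}

/* Frees every string, then the vector itself. Safe to call with NULL. */
void split_free(char **v)
{
    if (v == NULL)
        return;
    for (size_t i = 0; v[i] != NULL; ++i)
        free(v[i]);
    free(v);
}
```

With these, a caller could write size_t n = split_count(v); before looping, and split_free(v); once it is done with the result.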

Execution:

bob
is
great

A second solution, taking alk's remarks into account:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

char ** split(const char * str, const char * delim)
{
  /* count words */
  char * s = strdup(str);

  if (s == NULL) /* out of memory */
    return NULL;

  if (strtok(s, delim) == 0) { /* no word */
    free(s);
    return NULL;
  }

  size_t nw = 1;

  while (strtok(NULL, delim) != 0)
    nw += 1;

  strcpy(s, str); /* restore initial string modified by strtok */

  /* split */
  char ** v = malloc((nw + 1) * sizeof(char *));

  if (v == NULL) {
    /* out of memory */
    free(s);
    return NULL;
  }

  if ((v[0] = strdup(strtok(s, delim))) == 0) {
    /* out of memory */
    free(v);
    free(s);
    return NULL;
  }

  size_t i;

  for (i = 1; i != nw; ++i) {
    if ((v[i] = strdup(strtok(NULL, delim))) == NULL) {
      /* out of memory, free previous allocs and the working copy */
      while (i-- != 0)
        free(v[i]);
      free(v);
      free(s);
      return NULL;
    }
  }

  v[i] = NULL; /* end mark */

  free(s);

  return v;
}

int main()
{
  const char * s = "bob is still great";
  char ** v = split(s, " ");

  if (v == NULL)
    puts("no words or not enough memory");
  else {
    for (int i = 0; v[i] != NULL; ++i) {
      puts(v[i]);
      free(v[i]);
    }

    free(v);
  }
  return 0;
}

When out of memory, the return value is NULL (in a previous version it was the string to split). Of course there are other easy ways to signal that.
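
One such alternative, sketched here under my own assumptions rather than taken from the answer, is to return a status code and hand the vector back through an output parameter, so the caller can tell "no words" apart from "out of memory". The enum and the name split_checked are hypothetical. This sketch also swaps strtok and strdup for strspn/strcspn plus malloc/memcpy, which avoids modifying or copying the input.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes, for illustration only. */
enum split_status { SPLIT_OK = 0, SPLIT_NO_WORDS = 1, SPLIT_NO_MEMORY = 2 };

/* Splits str on any character in delim. On SPLIT_OK, *out receives a
   NULL-terminated vector the caller must free; otherwise *out is NULL. */
enum split_status split_checked(const char *str, const char *delim, char ***out)
{
    size_t nw = 0;
    const char *p;

    *out = NULL;

    /* first pass: count the words without modifying str */
    for (p = str + strspn(str, delim); *p != '\0'; p += strspn(p, delim)) {
        p += strcspn(p, delim);     /* skip over one word */
        ++nw;
    }
    if (nw == 0)
        return SPLIT_NO_WORDS;

    char **v = malloc((nw + 1) * sizeof *v);
    if (v == NULL)
        return SPLIT_NO_MEMORY;

    /* second pass: copy each word into its own allocation */
    size_t i = 0;
    for (p = str + strspn(str, delim); *p != '\0'; p += strspn(p, delim)) {
        size_t len = strcspn(p, delim);
        v[i] = malloc(len + 1);
        if (v[i] == NULL) {         /* undo everything allocated so far */
            while (i-- != 0)
                free(v[i]);
            free(v);
            return SPLIT_NO_MEMORY;
        }
        memcpy(v[i], p, len);
        v[i][len] = '\0';
        ++i;
        p += len;
    }
    v[i] = NULL;                    /* end mark */
    *out = v;
    return SPLIT_OK;
}
```

A caller would then switch on the returned status instead of guessing what a NULL result means.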


Execution under valgrind:

==5078== Memcheck, a memory error detector
==5078== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5078== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==5078== Command: ./a.out
==5078== 
bob
is
still
great
==5078== 
==5078== HEAP SUMMARY:
==5078==     in use at exit: 0 bytes in 0 blocks
==5078==   total heap usage: 7 allocs, 7 frees, 1,082 bytes allocated
==5078== 
==5078== All heap blocks were freed -- no leaks are possible
==5078== 
==5078== For counts of detected and suppressed errors, rerun with: -v
==5078== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 3)

23 Comments

Nice approach avoiding repetitive reallocation.
Nitpick: All those counters should be size_t not int. Also error checking is missing completely.
@alk I added another solution, but to be frank, checking that there is enough memory in that kind of application is mostly a waste of time: the day you run out of memory, you can be sure you will have other problems ^^
@alk Unless the C standard explicitly says that an address is never equal to -1 (whatever the signed/unsigned conversion), it is wrong to use -1 as an invalid address.
@payne I work on a copy so as not to modify the original: strtok modifies its first parameter, and I use it twice. malloc((strlen(str)+1) * sizeof(char *)); is a poor way and allocates too much memory. The lazy way would be malloc(((strlen(str)+1) / 2 + 1) * sizeof(char *)); assuming the worst case where each word has only one letter, but again this is a poor way. I don't like poor programming ;-) On a 32-bit machine a pointer uses 32 bits.
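
The point that strtok modifies its first parameter is why both solutions work on a copy. A small sketch that makes the modification visible; the function name is made up for this illustration.

```c
#include <string.h>

/* Returns 1 if strtok() altered the buffer it was given, 0 otherwise.
   Illustration only; the function name is made up for this sketch. */
int strtok_modifies_input(void)
{
    char buf[] = "bob is great";      /* writable array, not a string literal */
    char before[sizeof buf];

    memcpy(before, buf, sizeof buf);  /* snapshot, including the final '\0' */
    strtok(buf, " ");                 /* overwrites the first space with '\0' */

    return memcmp(before, buf, sizeof buf) != 0;
}
```

This is also why passing a string literal such as "bob is great" directly to strtok is undefined behavior: the literal may not be writable.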

Splitting a string into an unknown number of words and returning them from a function requires a function that returns a pointer-to-pointer-to-char. This allows a truly dynamic approach: allocate some initial number of pointers (say 2, 4, 8, etc.), make a single pass through the string using strtok while keeping track of the number of pointers used, and allocate storage for each token (word) as you go. When the number of pointers used equals the number allocated, you simply realloc storage for additional pointers and keep going.

A short example implementing the function splitstring() that does that could look similar to the following:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NPTR    8   /* initial number of pointers to allocate */
#define MAXD   32   /* maximum no chars for delimiter */
#define MAXC 1024   /* maximum no chars for user input */

char **splitstring (const char *str, const char *delim, size_t *nwords)
{
    size_t nptr = NPTR,             /* initial pointers */
        slen = strlen (str);        /* length of str */
    char **strings = malloc (nptr * sizeof *strings),   /* alloc pointers */
        *cpy = malloc (slen + 1),   /* alloc for copy of str */
        *p = cpy;                   /* pointer to cpy */

    *nwords = 0;                    /* zero nwords */

    if (!strings) {     /* validate allocation of strings */
        perror ("malloc-strings");
        free (cpy);
        return NULL;
    }

    if (!cpy) {         /* validate allocation of cpy */
        perror ("malloc-cpy");
        free (strings);
        return NULL;
    }
    memcpy (cpy, str, slen + 1);    /* copy str to cpy */

    /* split cpy into tokens */
    for (p = strtok (p, delim); p; p = strtok (NULL, delim)) {
        size_t len;             /* length of token */
        if (*nwords == nptr) {  /* all pointers used/realloc needed? */
            void *tmp = realloc (strings, 2 * nptr * sizeof *strings);
            if (!tmp) {         /* validate reallocation */
                perror ("realloc-strings");
                break;          /* keep words stored so far; cpy is freed below */
            }
            strings = tmp;      /* assign new block to strings */
            nptr *= 2;          /* update number of allocated pointers */
        }
        len = strlen (p);       /* get token length */
        strings[*nwords] = malloc (len + 1);    /* allocate storage */
        if (!strings[*nwords]) {                /* validate allocation */
            perror ("malloc-strings[*nwords]");
            break;
        }
        memcpy (strings[(*nwords)++], p, len + 1);  /* copy to strings */
    }
    free (cpy);     /* free storage of cpy of str */

    if (*nwords)    /* if words found */
        return strings;

    free (strings); /* no strings found, free pointers */
    return NULL;
}

int main (void) {

    char **strings = NULL, 
        string[MAXC],
        delim[MAXD];
    size_t nwords = 0;

    fputs ("enter string    : ", stdout);
    if (!fgets (string, MAXC, stdin)) {
        fputs ("(user canceled input)\n", stderr);
        return 1;
    }

    fputs ("enter delimiters: ", stdout);
    if (!fgets (delim, MAXD, stdin)) {
        fputs ("(user canceled input)\n", stderr);
        return 1;
    }

    if ((strings = splitstring (string, delim, &nwords))) {
        for (size_t i = 0; i < nwords; i++) {
            printf (" word[%2zu]: %s\n", i, strings[i]);
            free (strings[i]);
        }
        free (strings);
    }
    else
        fputs ("error: no words found or allocation failed\n", stderr);
}

(note: the word count nwords is passed as a pointer to the splitstring() function to allow the number of words to be updated within the function and made available back in the calling function, while returning a pointer-to-pointer-to-char from the function itself)

Example Use/Output

$ ./bin/stringsplitdelim
enter string    : my dog has fleas and my cat has none and snakes don't have fleas
enter delimiters:
 word[ 0]: my
 word[ 1]: dog
 word[ 2]: has
 word[ 3]: fleas
 word[ 4]: and
 word[ 5]: my
 word[ 6]: cat
 word[ 7]: has
 word[ 8]: none
 word[ 9]: and
 word[10]: snakes
 word[11]: don't
 word[12]: have
 word[13]: fleas

(note: a ' ' (space) was entered as the delimiter above resulting in delim containing " \n" (exactly what you want) by virtue of having used the line-oriented input function fgets for user input)
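
If the delimiter set does not come from fgets (a hard-coded " ", for example), the newline that fgets leaves in the input line will stay attached to the last word. One common way to avoid that is to trim it before splitting. A minimal sketch; chomp is a hypothetical helper name, not something used in the code above.

```c
#include <string.h>

/* Removes the trailing newline that fgets() keeps, if there is one.
   Hypothetical helper name; returns the resulting length. */
size_t chomp(char *line)
{
    size_t len = strcspn(line, "\n");  /* index of the first '\n', or of '\0' */
    line[len] = '\0';                  /* no-op when no newline is present */
    return len;
}
```

Called as chomp(string); right after a successful fgets, this leaves the buffer ready for a delimiter of " " alone.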

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
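
The first responsibility is the reason splitstring() assigns the result of realloc to a temporary before overwriting strings: if realloc fails, the original pointer is still the only handle on the block. A sketch of that pattern in isolation, assuming a hypothetical helper named grow_ptrs:

```c
#include <stdlib.h>

/* Doubles the capacity of a pointer array. Hypothetical helper for
   illustration. Returns 0 on success, -1 on failure; on failure *arr
   and *cap are left untouched, so the original block can still be freed. */
int grow_ptrs(char ***arr, size_t *cap)
{
    size_t newcap = *cap ? *cap * 2 : 1;
    void *tmp = realloc(*arr, newcap * sizeof **arr);

    if (tmp == NULL)
        return -1;      /* *arr still points at the old, valid block */

    *arr = tmp;         /* only now replace the saved starting address */
    *cap = newcap;
    return 0;
}
```

Writing arr = realloc(arr, ...) directly would overwrite the only copy of the starting address with NULL on failure, and the old block could never be freed.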

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.

$ valgrind ./bin/stringsplitdelim
==12635== Memcheck, a memory error detector
==12635== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==12635== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==12635== Command: ./bin/stringsplitdelim
==12635==
enter string    : my dog has fleas and my cat has none and snakes don't have fleas
enter delimiters:
 word[ 0]: my
 word[ 1]: dog
 word[ 2]: has
 word[ 3]: fleas
 word[ 4]: and
 word[ 5]: my
 word[ 6]: cat
 word[ 7]: has
 word[ 8]: none
 word[ 9]: and
 word[10]: snakes
 word[11]: don't
 word[12]: have
 word[13]: fleas
==12635==
==12635== HEAP SUMMARY:
==12635==     in use at exit: 0 bytes in 0 blocks
==12635==   total heap usage: 17 allocs, 17 frees, 323 bytes allocated
==12635==
==12635== All heap blocks were freed -- no leaks are possible
==12635==
==12635== For counts of detected and suppressed errors, rerun with: -v
==12635== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Look things over and let me know if you have further questions.

4 Comments

Thank you for your input. It seems more flexible than the other answer provided since, among other things, it allows the user to input a delimiter. I've also appreciated your Memory Use/Error Check section. However, I've accepted the other answer as its code seemed more readable and restricted to my domain of use.
Hey David. I printed out len and sizeof strings[*nwords] in the splitstring function. len matches the length of each split string, but sizeof always returns 8. I read the Open Group malloc reference, but it is still too hard for me to understand. Can you drop some hints?
@jian -- what is sizeof (a_pointer)?
Keep up the good work. Those little nuggets are the learning that slowly gets laid down that turns you into a quality programmer.
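
The hint in the reply above is that sizeof applied to a pointer gives the size of the pointer itself (8 bytes on a typical 64-bit system), not the size of the block it points to. A minimal sketch of the distinction, with a made-up function name:

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 when sizeof applied to a char* equals the pointer size,
   no matter how many bytes were allocated behind it; -1 if malloc fails. */
int sizeof_reports_pointer_size(void)
{
    char *word = malloc(100);            /* 100 bytes on the heap */
    if (word == NULL)
        return -1;
    strcpy(word, "fleas");

    size_t via_sizeof = sizeof word;     /* size of the pointer variable */
    size_t via_strlen = strlen(word);    /* 5: characters before the '\0' */

    free(word);
    return via_sizeof == sizeof(char *) && via_strlen == 5;
}
```

C keeps no record of the allocated size that sizeof could consult, so the length has to be tracked separately, as len is in splitstring().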
