3

I am trying to create a function that takes a string, splits it into words and return an array with the words in it. I am not allowed to use any pre-made functions other than malloc within the splitting function. Finally I have to set my function in this form char **ft_split_whitespaces(char *str) My current output looks like that:


    d this is me
    s is me
    s me
    r

Expected output:


    Hello
    World
    This
    Is
    Me

my full code is in the following codes:


    #include <stdio.h>
    #include <stdlib.h>
    
    int     count_words(char *str)
    {
        int i; 
        int word;
        
        i = 0;
        word = 1;
        while(str[i]!='\0')
        {
            if(str[i]==' ' || str[i]=='\n' || str[i]=='\t' 
            || str[i]=='\f' || str[i]=='\r' || str[i]=='\v')
                word++;
            i++;
        }
        return (word);
    }
    
    char    **ft_split_whitespaces(char *str)
    {
        int index;
        int size;
        int index2;
        char **arr;
        
        index = 0;
        index2 = 0;
        size = count_words(str);
        arr = (char **)malloc(size * sizeof(char));
        if (arr == NULL)
            return ((char **)NULL);
        while (str[index])
        {
            if(str[index] == ' ')
            {
                index++;
                value++;
                index2++;
            }
            else
                *(arr+index2) = (char*) malloc(index * sizeof(char));
                *(arr+index2) = &str[index];    
            index++;
        }
        **arr = '\0';
        return (arr);
    }
    
    int main()
    {
        char a[] = "Hello World This Is Me";
        char **arr;
        int i;
        int ctr = count_words(a);
        arr = ft_split_whitespaces(a);
        
        for(i=0;i < ctr;i++)
            printf("%s\n",arr[i]);
        return 0;
    }

4
  • This is a good question. It's discussed in these course notes. See also this question. Commented Jul 15, 2021 at 14:46
  • How is the code that calls ft_split_whitespaces supposed to know how many entries the returned array has? Is there supposed to be a NULL on the end? (If so, don't forget to allocate space for that pointer!) Commented Jul 15, 2021 at 15:20
  • Is ft_split_whitespaces() allowed to modify the original string (e.g. insert zero terminators at the end of words and return an array of pointers to the start of words, to avoid allocating memory for each word)? You could possibly also use recursion to avoid count_words(). Commented Jul 15, 2021 at 18:58
  • You shouldn't cast the return value of malloc if you are compiling with a C compiler: stackoverflow.com/questions/605845/…. Commented Jul 15, 2021 at 22:02

1 Answer 1

3

You have quite a few errors in your program:

  1. arr = (char **)malloc(size * sizeof(char)); is not right since arr is of type char**. You should use sizeof(char*) or better (sizeof(*arr)) since sizeof(char) is usually not equal to sizeof(char*) for modern systems.

  2. You don't have braces {} around your else statement in ft_split_whitespaces which you probably intended. So your conditional logic breaks.

  3. You are allocating a new char[] for every non--whitespace character in the while loop. You should only allocate one for every new word and then just fill in the characters in that array.

  4. *(arr+index2) = &str[index];This doesn't do what you think it does. It just points the string at *(arr+index2) to str offset by index. You either need to copy each character individually or do a memcpy() (which you probably can't use in the question). This explains why your answer just provides offsets into the main string and not the actual tokens.

  5. **arr = '\0'; You will lose whatever you store in the 0th index of arr. You need to individually append a \0 to each string in arr.

  6. *(arr+index2) = (char*) malloc(index * sizeof(char)); You will end up allocating progressively increasing size of char arrays at because you are using index for the count of characters, which keeps on increasing. You need to figure out the correct length of each token in the string and allocate appropriately.

Also why *(arr + index2)? Why not use the much easier to read arr[index2]?


Further clarifications:

Consider str = "abc de"

You'll start with

*(arr + 0) = (char*) malloc(0 * sizeof(char));
//ptr from malloc(0) shouldn't be dereferenced and is mostly pointless (no pun), probably NULL
*(arr + 0) = &str[0]; 

Here str[0] = 'a' and is a location somehwhere in memory, so on doing &str[0], you'll store that address in *(arr + 0)

Now in the next iteration, you'll have

*(arr + 0) = (char*) malloc(1 * sizeof(char)); 
*(arr + 0) = &str[1]; 

This time you replace the earlier malloc'd array at the same index2 again with a different address. In the next iterations *(arr + 0) = (char*) malloc(2 * sizeof(char));. You end up resetting the same *(arr + index2) position till you encounter a whitespace after which you do the same thing again for the next word. So don't allocate arrays for every index value but only if and when required. Also, this shows that you'll keep on increasing the size passed to malloc with the increasing value of index which is what #6 indicated.

Coming to &str[index].

You are setting (arr + index2) i.e. a char* (pointer to char) to another char*. In C, setting a pointer to another pointer doesn't copy the contents of the second pointer to the first, but only makes both of them point to the same memory location. So when you set something like *(arr + 1) = &str[4], it's just a pointer into the original string at index = 4. If you try to print this *(arr + 1) you'll just get a substring from index = 4 to the end of the string, not the word you're trying to obtain.

**arr = '\0' is just dereferencing the pointer at *arr and setting its value to \0. So imagine if you had *(arr + 0) = "hello\0", you'll set it to "\0ello\0". If you're ever iterating over this string, you'll never end up traversing beyond the first '\0' character. Hence you lose whatever *arr was earlier pointing to.

Also, *(arr + i) and arr[i] are exactly equivalent and make for much better readability. It better conveys that arr is an array and arr[i] is dereferencing the ith element.

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you alot for your comment, but I have to admit starting from step 3 onwards I really didnt understand how I should tackle those problems, Have to say I am pretty new hands on with pointers and malloc, so if possible to clarify them more it would be great! Thanks anyway
@Alsakka I've added further explanation. Fix these and you should have a working solution. I'm not going to write the exact answer since that isn't what SO is for but I hope you're able to see and work the solution for yourself. You're on the right track. Good luck!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.