I seem to be losing what my pointers refer to here. I don't know why, but I suspect it's the buffer filled by fgets that messes this up. I was told a good way to read words from a file is to read each line and then split it into words with strtok, but how can I do this if the pointers stored in words[i] keep disappearing?

Input file:

Natural Reader is
john make tame

Output I'm getting:

array[0] = john
array[1] = e
array[2] =
array[3] = john
array[4] = make
array[5] = tame

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
   FILE *file = fopen(argv[1], "r");
   int ch;
   int count = 0;
   while ((ch = fgetc(file)) != EOF){
      if (ch == '\n' || ch == ' ')
         count++;
   }
   fseek(file, 0, SEEK_END);
   size_t size = ftell(file);
   fseek(file, 0, SEEK_SET);
   char** words = calloc(count, size * sizeof(char*) +1 );
   int i = 0;
   int x = 0;
   char ligne [250];

   while (fgets(ligne, 80, file)) {
      char* word;
      word = strtok(ligne, " ,.-\n");
      while (word != NULL) {
         for (i = 0; i < 3; i++) {
            words[x] = word;
            word = strtok(NULL, " ,.-\n");
            x++;
         }
      }
   }
   for (i = 0; i < count; ++i)
      if (words[i] != 0){
         printf("array[%d] = %s\n", i, words[i]);
      }
   free(words);
   fclose(file);
   return 0;
}
  • There are a lot of inconsistencies in your code: For example, first you count only spaces and newlines, but then you tokenize on spaces, newlines, commas, dots and dashes. What is the inner for loop for? I think while (word != NULL) should be enough. Commented Mar 17, 2022 at 6:30
  • Anyway, your arrays look corrupted because you store the token pointers returned from strtok, but these pointers point into ligne. So the first word is &ligne[0]. When you read the second line, that string is overwritten with "john ...". If you want permanent strings, you should make a copy. (Use strdup, which is not standard, but widely available.) Commented Mar 17, 2022 at 6:34
  • (But since you don't distinguish between newlines and other kinds of space, you could just read the whole file as a single string and tokenize that. The tokens will be good as long as the long string exists.) Commented Mar 17, 2022 at 6:35

2 Answers


strtok does not allocate any memory; it returns a pointer to a delimited token inside the buffer you pass it (here, ligne).

Therefore you need to allocate memory for the result if you want to keep the word between loop iterations.

For example:

words[x] = strdup(word); /* copy the token out of ligne; never pass a NULL result from strtok to strdup */

You could also handle this by using a separate ligne buffer for each line read, making it an array of strings like so:

char ligne[20][80]; // no need to make the string 250 since fgets limits it to 80

Then your while loop changes to:

int lno = 0;
while (lno < 20 && fgets(ligne[lno], 80, file)) { // stop before running past ligne[19]
    char *word;
    word = strtok(ligne[lno], " ,.-\n");
    while (word != NULL) {
        words[x++] = word;
        word = strtok(NULL, " ,.-\n");
    }
    lno++;
}

Adjust the first subscript as needed for the maximum number of lines in the file, or dynamically allocate the line buffer during each iteration if you don't want such a low limit. You could also use getline instead of fgets if your implementation supports it (it is POSIX, not ISO C); it can handle the allocation for you, though you then need to free the buffers when you are done.

If you are processing real-world prose, you might want to include other delimiters in your list, like colon, semicolon, exclamation point, and question mark.
