3

My program has the following requirements: If a command line argument is given, interpret it as a file name and read the input from that file. Otherwise, read input from stdin instead. As I am going to need the input later, I want to save it into an array. And because any non-ASCII characters are to be ignored, I decided to process the input character by character. This is my code:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#define MAX_WORDS 999
#define MAX_WORD_LENGTH 50

typedef struct Data{
    char words[MAX_WORDS][MAX_WORD_LENGTH+1];
    int numwords;
} Data;

Data* data;

int main(int argc, char** argv){
    data= malloc(sizeof data);

    FILE* infile;
    if(argc<=1){//if no arguments, read words from stdin
        infile= stdin;
    }else if(argc==2){//read words from file
        infile= fopen(argv[1], "r");
        if(infile==NULL){
            printf("failed to open word file");
            return 1;
        }
    }

    char c;
    int wordindex= 0;
    int charindex= 0;
    while(1){//read input character by character, only accepting ASCII characters
        c= fgetc(infile);
        if(c=='\n' || c==EOF){
            data->words[wordindex][charindex]= '\0';
            wordindex++;
            charindex= 0;
            if(wordindex>=MAX_WORDS || c==EOF)
                break;
            continue;
        }
        if(!isascii(c))
            continue;

        data->words[wordindex][charindex]= toupper(c);
        charindex++;
        if(charindex>=MAX_WORD_LENGTH){
            wordindex++;
            charindex= 0;
            if(wordindex>=MAX_WORDS)
                break;
        }
    }
    if(argc==2) fclose(infile);

    data->numwords= wordindex-1;


    //check if everything worked as intended
    printf("==================\n%d word(s) read:\n", data->numwords);
    for (int i = 0; i < data->numwords; i++)
        printf("%d %s\n", (int)strlen(data->words[i]), data->words[i]);
}

Everything works fine if I enter the input through stdin, but if I attempt to read the input from a text file, the program segfaults. It seems to work if the text file contains only one line of text, but if there are two or more then it crashes. I'm a beginner in C and I don't see any difference between reading from stdin or a file, so I have no idea why this is happening. Can somebody enlighten me?

10
  • Unrelated to your problem, but why allocate the single structure dynamically if you're just allocating one? Or generally allocate a number of structured decided at compile-time? Why not use arrays of the structure? If you put the array as a global variable then it won't take up stack-space if that's what you're worried about. Commented Jun 4, 2016 at 18:13
  • 2
    Also note that fgetc doesn't return a char. This is actually very important. Commented Jun 4, 2016 at 18:15
  • @JoachimPileborg I didn't have a particular reason to use malloc there, it's just because I'm a noob :) Anyway, I don't understand why it matters if fgetc returns an int, I just checked and the comparison c==EOF works just fine as it is. Commented Jun 4, 2016 at 18:26
  • 2
    The problem with fgetc and EOF is that EOF is -1, and like all integer literals -1 is an int, and its value (on a two-complement 32-bit machine) is 0xffffffff. A char is usually a byte, eight bits, and then negative one has the value 0xff, or as a 32-bit value 0x000000ff. And since char might be unsigned (the signedness of char is not defined by the standard) you might compare 0x000000ff to 0xffffffff which will not yield the correct result. Commented Jun 4, 2016 at 18:42
  • 1
    @JoachimPileborg That's a great explanation, thank you. I'll go and fix it immediately. Commented Jun 4, 2016 at 18:45

1 Answer 1

2

This

Data* data;
data= malloc(sizeof data);

allocates bytes to suite the size of data, with data being "just" a pointer to Data, not Data itself. A pointer is 4/8 bytes depending whether on a 32/64 bit platform.

Allocating to few memory here leads to writing to invalid memory soon and with this invoking the infamous Undefined Behaviour. From this moment on anything can happen ranging from crash to nothing.

If you want the number of bytes to hold Data you want to allocate like this

data = malloc(sizeof (Data));

of even better like this

data = malloc(sizeof *data);

Also one should test the outcome of relevant system call, also malloc() could fail:

if (NULL == data)
{
  perror("malloc() failed");
  exit(EXIT_FAILURE);
}
Sign up to request clarification or add additional context in comments.

5 Comments

You're right, changing the code to sizeof *data worked. I'd like to understand why my code didn't crash when reading from stdin though, it would be great if you could explain that.
@rawing: I just added some words on undefined behaviour.
Oh, just a coincidence then, more or less? Guess I was reading too much into it.
@rawing: Just "bad" luck I'd say. Have a look at Valgrind to test for such issues.
In C running code isn't necessarily correct code ... @rawing

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.