C, format file for data of HTTP response

Question

I have no experience with fscanf() and very little with functions for FILE. I have code that correctly determines if a client requested an existing file (using stat() and it also ensures it is not a directory). I will omit this part because it is working fine.

My goal is to send a string back to the client with a HTTP header (a string) and the correctly read data, which I would imagine has to become a string at some point to be concatenated with the header for sending back. I know that + is not valid C, but for simplicity I would like to send this: headerString+dataString.

The code below does seem to work for text files but not images. I was hoping that reading each character individually would solve the problem but it does not. When I point a browser (Firefox) at my server looking for an image it tells me "The image (the name of the image) cannot be displayed because it contains errors."

This is the code that is supposed to read a file into httpData:

int i = 0;
FILE* file;
file = fopen(fullPath, "r");
if (file == NULL) errorMessageExit("Failed to open file");
while(!feof(file)) {
    fscanf(file, "%c", &httpData[i]);
    i++;
}
fclose(file);
printf("httpData = %s\n", httpData);

Edit: This is what I send:

char* httpResponse = malloc((strlen(httpHeader)+strlen(httpData)+1)*sizeof(char));
strcpy(httpResponse, httpHeader);
strcat(httpResponse, httpData);
printf("HTTP response = %s\n", httpResponse);

The data part produces ???? for the image but correct html for an html file.

To read a binary file you need: file = fopen(fullPath, "rb");
– dcaswell, Commented Sep 11, 2013 at 5:49
I just tried it (and updated above), the image produces ????
– asimes, Commented Sep 11, 2013 at 5:53
You need to worry about null bytes in the data; the %s format will stop at the first one. You should be using while (fscanf(file, "%c", &httpData[i]) == 1) i++; (using feof() in a loop is almost always a bug, and is specifically a bug when you don't check the return from fscanf() as in the code you show). You need to think about CRLF line endings; HTTP requires them for text files. You can't use strcpy() for the data; you'll need to use memmove() or memcpy().
– Jonathan Leffler, Commented Sep 11, 2013 at 6:50
@Jonathan Leffler, could you please post the two parts of your comment as an answer? I think I understand the first part but I don't know how to go about using the second.
– asimes, Commented Sep 11, 2013 at 6:55

Jonathan Leffler · Accepted Answer · 2013-09-11 16:09:54Z

Images contain binary data. Any of the 256 distinct 8-bit patterns may appear in the image including, in particular, the null byte, 0x00 or '\0'. On some systems (notably Windows), you need to distinguish between text files and binary files, using the letter b in the standard I/O fopen() call (works fine on Unix as well as Windows). Given that binary data can contain null bytes, you can't use strcpy() et al to copy chunks of data around since the str*() functions stop copying at the first null byte. Therefore, you have to use the mem*() functions which take a start position and a length, or an equivalent.
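A minimal sketch of that difference, assuming the header is an ordinary C string and the body is a byte buffer whose length is known. The name build_response and its parameters are hypothetical; they do not appear in the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: joins a text header and a binary body into one
 * newly allocated buffer. The body is copied with memcpy(), so embedded
 * null bytes survive. Returns NULL if allocation fails; on success the
 * total length is stored in *out_len and the caller frees the result. */
unsigned char *build_response(const char *header,
                              const unsigned char *body, size_t body_len,
                              size_t *out_len)
{
    size_t header_len = strlen(header);   /* safe: the header is text */
    /* +1 only so the request is never zero bytes long */
    unsigned char *resp = malloc(header_len + body_len + 1);
    if (resp == NULL)
        return NULL;
    memcpy(resp, header, header_len);
    memcpy(resp + header_len, body, body_len);
    *out_len = header_len + body_len;
    return resp;
}
```

The result has to be sent using the length stored in *out_len; calling strlen() on it would cut the response off at the first null byte in the image.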

Applied to your code, printing the binary httpData with %s won't work properly; the %s will stop at the first null byte. Since you have used stat() to verify the existence of the file, you also have a size for the file. Assuming you don't have to deal with dynamically changing files, that means you can allocate httpData to be the correct size. You can also pass the size to the reading code. This also means that the reading code can use fread() and the writing code can use fwrite(), saving on character-by-character I/O.
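The allocation step might look like the sketch below, assuming a POSIX system and a file that does not change between the stat() call and the read. The helper name alloc_for_file is hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: allocates a buffer sized from stat() for a
 * regular file. Returns NULL if stat() fails, if the path is not a
 * regular file, or if malloc() fails. On success the file size is
 * stored in *size_out and the caller frees the buffer. */
char *alloc_for_file(const char *path, size_t *size_out)
{
    struct stat sb;
    if (stat(path, &sb) != 0 || !S_ISREG(sb.st_mode))
        return NULL;
    size_t size = (size_t)sb.st_size;
    char *buf = malloc(size > 0 ? size : 1);  /* avoid a zero-size request */
    if (buf == NULL)
        return NULL;
    *size_out = size;
    return buf;
}
```

The size stored in *size_out is the value to pass on to the reading code and, later, to whatever sends the response.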

Thus, we might have a function:

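#include <stdio.h>

/* Assumed values; the answer only says the codes are predefined (negative?). */
enum { E_FILEOPEN = -1, E_SHORTREAD = -2 };

/* Reads exactly size bytes from filename into httpData (allocated by the caller). */
int readHTTPData(const char *filename, size_t size, char *httpData)
{
    FILE *fp = fopen(filename, "rb");   /* "b": binary mode, matters on Windows */
    size_t n;
    if (fp == 0)
        return E_FILEOPEN;
    n = fread(httpData, size, 1, fp);   /* one item of size bytes: all or nothing */
    fclose(fp);
    if (n != 1)
        return E_SHORTREAD;
    fputs("httpData = ", stdout);       /* debugging output only */
    fwrite(httpData, size, 1, stdout);
    putchar('\n');
    return 0;
}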

The function returns 0 on success, and some predefined (negative?) error numbers on failure. Since memory allocation is done before the routine is called, it is pretty simple:

1. Open the file; report an error if that fails.
2. Read the file in a single operation.
3. Close the file.
4. Report an error if the read did not get all the data that was expected.
5. Report on the data that was read (debugging only; printing raw binary data to standard output is not the best idea in the world, but it parallels what the code in the question does).
6. Report success.

In the original code, there is a loop:

int i = 0;
...
while(!feof(file)) {
    fscanf(file, "%c", &httpData[i]);
    i++;
}

This loop has a lot of problems:

You should not use feof() to test whether there is more data to read. It reports whether an EOF indication has been given, not whether it will be given.
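Consequently, when the last character has been read, feof() still reports 'false', so fscanf() tries to read the next (non-existent) character. That read fails and stores nothing, but i is incremented anyway, so the buffer gains one extra byte holding whatever was already in memory. (A getc()-based loop with the same flaw would store EOF instead, which typically shows up as ÿ, y-umlaut, 0xFF, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS.)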
The code makes no check on how many characters have been read, so it has no protection against buffer overflow.
Using fscanf() to read a single character is a lot of overhead compared to getc().
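The off-by-one can be made visible with a small sketch that counts loop iterations both ways. The function names are hypothetical; the first loop mirrors the question's pattern, except that it records the failed read instead of ignoring it.

```c
#include <stdio.h>

/* Counts iterations of the question's feof()-controlled loop.
 * *failed receives the number of fscanf() calls that read nothing. */
size_t count_with_feof(FILE *fp, size_t *failed)
{
    size_t n = 0;
    char ch;
    *failed = 0;
    while (!feof(fp)) {
        if (fscanf(fp, "%c", &ch) != 1)
            (*failed)++;        /* the question's loop ignores this case */
        n++;
    }
    return n;
}

/* Counts bytes by checking the result of every read. */
size_t count_with_getc(FILE *fp)
{
    size_t n = 0;
    while (getc(fp) != EOF)
        n++;
    return n;
}
```

For a three-byte file, count_with_feof() returns 4 with one failed read, while count_with_getc() returns 3.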

Here's a more nearly correct version of the code, assuming that size is the number of bytes allocated to httpData.

size_t i = 0;
int c;

/* Test the bound first so no byte is read and then dropped when the buffer is full. */
while (i < size && (c = getc(file)) != EOF)
    httpData[i++] = (char)c;

You could check that you get EOF when you expect it. Note that the fread() code does the size checking inside the fread() function. Also, the way I wrote the arguments, it is an all-or-nothing proposition: either all size bytes are read or everything is treated as missing. If you want byte counts and are willing to tolerate or handle short reads, you can swap the size and count arguments, as in fread(httpData, 1, size, fp), so that the return value is the number of bytes actually read. You could also check the return from fwrite() if you wanted to be sure it was all written, but people tend to be less careful about checking that output succeeded. (It is almost always crucial to check that you got the input you expected, though; don't skimp on input checking.)
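A sketch of the byte-count variant, assuming the caller is prepared to handle a short read. The name read_up_to is hypothetical.

```c
#include <stdio.h>

/* Hypothetical variant: reads up to cap bytes and returns how many
 * arrived. With an element size of 1, fread() returns a byte count,
 * so a short read is visible rather than reported as zero items. */
size_t read_up_to(FILE *fp, unsigned char *buf, size_t cap)
{
    size_t total = 0;
    while (total < cap) {
        size_t n = fread(buf + total, 1, cap - total, fp);
        if (n == 0)
            break;              /* end of file or read error */
        total += n;
    }
    return total;
}
```

If the return value is smaller than expected, ferror(fp) tells a read error apart from a file that was simply shorter than the size reported earlier.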

At some point, for plain text data, you need to think about CRLF vs NL line endings. Streams opened in text mode translate the platform's line endings automatically; streams opened in binary mode do not. If the data to be transferred is image/png or something similar, you probably don't need to worry about this. If you're on Unix and dealing with text/plain, you may have to worry about CRLF line endings. I'm not an expert on this, though: I've not done low-level HTTP stuff in this millennium, so the rules may have changed.
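Whatever is decided for the body, the header lines of an HTTP/1.1 response do end in CRLF, and an empty line separates the header from the body. A minimal sketch of formatting such a header follows; the helper name format_header and the choice of fields are assumptions for illustration, not something taken from the question.

```c
#include <stdio.h>

/* Hypothetical helper: writes a minimal HTTP/1.1 response header into
 * buf. Returns the header length, or -1 if buf is too small. */
int format_header(char *buf, size_t cap,
                  const char *content_type, size_t body_len)
{
    int n = snprintf(buf, cap,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",                /* blank line ends the header */
                     content_type, body_len);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

Content-Length carries the body size in bytes, which is the same number obtained from stat() earlier, so the client knows how much binary data follows.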
