c reading from file into an array and splitting

Question

i have a certain txt file(for instance - dic.txt) in which words appear in this order:

hello - ola - hiya \n
chips - fries - frenchfries \n

I need to read the contents of the file into an array of string arrays: for instance:

array[0]  : [hello,ola,hiya]
array[1]  : [chips,fries,frenchfries]

I was thinking of using strtok in order to split each line in the file into a string (after copying the entire file into a string and calculating the number of lines),but i could not figure how to split each line ("hello - ola - hiya \n") into the words,and storing each array into the array (an array of strings within an array).

I was considering using malloc in order to allocate memory for each line of words,and storing the pointer to the string's array into the array,but i will be glad to receive any suggestions.

I think you meant the \n (newline) character rather than the string /n — smac89
– smac89, Commented Nov 23, 2013 at 7:57
if you know the number of words in each line, you can use fscanf(file,"%s",buff) to read each word (it reads untill whitespace). — flyman
– flyman, Commented Nov 23, 2013 at 7:59
yes,i did mean \n,thanks.in case i do not know the number of words in each line,can i do anything else(maybe create a function that reads chars until white space? — user3005945
– user3005945, Commented Nov 23, 2013 at 8:14
you can use fscanf to read a "word" (chars untill whitespace) regardless the number of words in a line. but just by doing that you don't know when you get a word from another line. — flyman
– flyman, Commented Nov 23, 2013 at 8:22

M Oehm · Accepted Answer · 2013-11-23 09:22:26Z

2

The straightforward way to read lines from a file and then split them into tokens is to read lines with fgets and then use strtok to split each line into tokens:

int main(int argc, char *argv[])
{
    // Check for arguments and file pointer omitted
    FILE *f = fopen(argv[1], "r");

    for (;;) {
        char line[80];
        char *token;

        if (fgets(line, 80, f) == NULL) break;
        token = strtok(line, " -\n");
        while (token) {
            // Do something with token, for example:
            printf("'%s' ", token);
            token = strtok(NULL, " -\n");
        }
    }

    fclose(f);
    return 0;
}

This approach is fine as long as all the lines in your file are shorter than 80 characters. It works for variable numbers of tokens per line.

You have mentioned the issue of handling memory for the lines. The example above assumes that the memory handling is done by the data structure for each word. (It's not part of the example, which just prints the tokens.)

You can malloc memory for each line, which is more flexible than a rigid character limit per line, but you'll end up with a lot of allocations. The benefit is that your words don't need extra memory, they can just be pointers into the lines, but you'll have to take care of properly allocating memory for the lines - and freeing it afterwards.

If you read the whole text file to a contiguous chunk of memory, you're basically done with memory storage, as long as you keep that chunk "alive" as long as your words live:

char *slurp(const char *filename, int *psize)
{
    char *buffer;
    int size;
    FILE *f;

    f = fopen(filename, "r");
    if (f == NULL) return NULL;

    fseek(f, 0, SEEK_END);
    size = ftell(f);
    fseek(f, 0, SEEK_SET);

    buffer = malloc(size + 1);
    if (buffer) {
        if (fread(buffer, 1, size, f) < size) {
            free(buffer);
        } else {
            buffer[size] = '\0';
            if (psize) *psize = size;
        }
    }

    fclose(f);
    return buffer;
}

With that chunk of memory, you can first look for lines by looking for the next newline, and then use strtok as above:

int main(int argc, char *argv[])
{
    char *buffer;    // contiguous memory chunk
    char *next;      // pointer to next line or NULL for last line

    buffer = slurp(argv[1], NULL);    
    if (buffer == NULL) return 0;

    next = buffer;
    while (next) {
        char *token;
        char *p = next;

        // Find beginning of the next line, 
        // i.e. the char after the next newline
        next = strchr(p, '\n');
        if (next) {
            *next = '\0';      // Null-terminate line
            next = next + 1;   // Advance past newline
        }

        token = strtok(p, " -\n");
        while (token) {
            // Do something with token, for example:
            printf("'%s' ", token);
            token = strtok(NULL, " -\n");
        }        
    }

    free(buffer);             // ... and invalidate your words        
    return 0;
}

If you use fscan, you always copy the found tokens to a temporary buffer and when you store them away in your dictionary structure, you have to copy them again with strcpy. That's a lot of copying. Here, you read and allocate once and then work with pointers into the chunk. strtok null-terminates the tokens, so your chunk is a chain of C strings.

Reading the wholem file into memory is usually not a good solution, but in this case, where the file basically is the data, it makes sense.

(Note: All this discussion about memory does not affect the memory needed for your dictionary structure, the nodes in trees and lined lists or whatever. It is just about storing the strings proper.)

answered Nov 23, 2013 at 9:22

M Oehm

29.2k3 gold badges36 silver badges44 bronze badges

Sign up to request clarification or add additional context in comments.

13 Comments

user3005945 Over a year ago

i do not fully understand all the usages(slurp and additional things i am not familiar with).i would like to write the code entirely myself(for learning purposes).so,basically,can i simply count the number of lines in the file,create an array of string array pointers,and for each line,copy the line into a string while counting the number of words.afterwards,allocate memory for each line and set the pointer in the array to this line?if so,how do i create such a pointer array?(i will be glad to simply get the names and general position other than a fixated code)

M Oehm Over a year ago

If you are learning, I suggest that you go with the simple framework of fgets and strtok in the first example. These are standard methods and well documented. Forget about the slurping. If you want to store pointers to mallocated memory per line, you need a awy to store these pointers. Your idea to count the lines first, allocate line pointers and then allocate memory to these pointers in a second pass is one possibility. (There are other options, of course, but I seem to have confused you enough already. Sorry for that. Go with your original idea.)

user3005945 Over a year ago

Okay,thanks alot.in this case,how do i create an array of array pointers(an array in which each cell is a pointer to an array of strings)?

M Oehm Over a year ago

It works in two steps: Declare a pointer to pointer to char: char **lines, then malloc as usual: lines = malloc(n * sizeof(*lines)). As you process the lines, allocate again: lines[i] = malloc(len). You now have a "ragged" array of lines. (Ragged, because the lines have different lengths.) Then, lines[i][j] gives you the j-th char in the i-th line. Finally, freeing works the other way round: First, free all lines, then free the array of lines. Note that sizeof(*lines) gives you the size of a pointer to char and is equivalent to sizeof(char *).

user3005945 Over a year ago

i appologize for the ignorance,but what do you mean by pointer a pointer to char?i need an array of pointers,each pointer points to an array of char arrays(aka an array of strings).

|

flyman · Accepted Answer · 2013-11-23 08:41:26Z

1

using fgets:

int eol(int c, FILE *stream) //given a char and the file, check if eol included
{
    if (c == '\n')
        return 1;
    if (c == '\r') {
        if ((c = getc(stream)) != '\n')
            ungetc(c, stream);
        return 1;
    }
    return 0;
}

int charsNumInLine(FILE *stream)
{
    int position = ftell(stream);
    int c, num_of_chars=0;

    while ((c = getc(stream)) != EOF && !eol(c, stream))
        num_of_chars++;

    fseek(stream,position,SEEK_SET); //get file pointer to where it was before this function call
    return num_of_chars; 
}

void main()
{
    //...
    char *buffer;
    int size;
    while()
    {
        size=charsNumInLine(stream);
        buffer = (char*)malloc( size*sizeof(char) );
        fgets(buffer,sizeof(buffer),stream);
        if (feof(stream) || ferror(stream) )
            break;

        // use strtok to separate words...
    }
    //...
}

another way is to use fscanf(file,"%s",buff)to read words and then use the above function eol to see when we get to a newline.

answered Nov 23, 2013 at 8:41

flyman

2104 silver badges15 bronze badges

2 Comments

user3005945 Over a year ago

Thanks.and as for storing,can i create an array of string array pointers?

flyman Over a year ago

yes, you can create an array of char*. and fscanf(file,"%s",buff) would read a word from file and store it in buff.

Collectives™ on Stack Overflow

c reading from file into an array and splitting

2 Answers 2

13 Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

13 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related