Weird behaviour with variable length arrays in struct in C

Question

I came across a concept which some people call a "Struct Hack" where we can declare a pointer variable inside a struct, like this:

struct myStruct{
    int data;
    int *array;
};

and later on when we allocate memory for a struct myStruct using malloc in our main() function, we can simultaneously allocate memory for our int *array pointer in same step, like this:

struct myStruct *p = malloc(sizeof(struct myStruct) + 100 * sizeof(int));

p->array = p+1;

instead of

struct myStruct *p = malloc(sizeof(struct myStruct));

p->array = malloc(100 * sizeof(int));

assuming we want an array of size 100.

The first option is said to be better since we get one contiguous chunk of memory and can free the whole chunk with a single call to free(), versus two calls in the latter case.

Experimenting, I wrote this:

#include<stdio.h>
#include<stdlib.h>

struct myStruct{
    int i;
    int *array;
};

int main(){
    /* I ask for only 40 more bytes (10 * sizeof(int)) */

    struct myStruct *p = malloc(sizeof(struct myStruct) + 10 * sizeof(int)); 

    p->array = p+1; 

    /* I assign values way beyond the initial allocation*/
    for (int i = 0; i < 804; i++){
        p->array[i] = i;
    }

    /* printing*/
    for (int i = 0; i < 804; i++){
        printf("%d\n",p->array[i]);
    }

    return 0;
}

I am able to execute it without problems, without any segmentation faults. Looks weird to me.

I also came to know that C99 has a provision which says that instead of declaring an int *array inside a struct, we can do int array[] and I did this, using malloc() only for the struct, like

struct myStruct *p = malloc(sizeof(struct myStruct));

and initialising array[] like this

p->array[10] = 0; /* I hope this sets the array size to 10 
                    and also initialises array entries to 0 */

But then again this weirdness where I am able to access and assign array indices beyond the array size and also print the entries:

for(int i = 0; i < 296; i++){ // first loop
    p->array[i] = i;
}

for(int i = 0; i < 296; i++){ // second loop
    printf("%d\n",p->array[i]);
}

After printing p->array[i] till i = 296 it gives me a segmentation fault, but clearly it had no problems assigning beyond i = 9. (If I increment 'i' till 300 in the first for loop above, I immediately get a segmentation fault and the program doesn't print any values.)

Any clues about what's happening? Is it undefined behaviour or what?

EDIT: When I compiled the first snippet with the command

cc -Wall -g -std=c11 -O    struct3.c   -o struct3

I got this warning:

 warning: incompatible pointer types assigning to 'int *' from
  'struct str *' [-Wincompatible-pointer-types]
    p->array = p+1;

There is no variable length array in your code. What you are doing is called a flexible array member (FAM). How do you think malloc should know how many elements you want this array to hold?
– too honest for this site, Commented Nov 6, 2016 at 18:30
I will not tell you how to do it, because you can easily work it out if you think about it a bit instead of concentrating on asking. Just this: all the necessary information is already shown in your question.
– too honest for this site, Commented Nov 6, 2016 at 18:40
@WeatherVane: The array has a length of 1 entry. Dereferencing beyond its bounds is definitely UB.
– too honest for this site, Commented Nov 6, 2016 at 18:42
@tectonicfury: Never ever cast just to silence the compiler if you don't understand all the implications. If you think the UB will magically disappear because of the cast: well, you are wrong!
– too honest for this site, Commented Nov 6, 2016 at 19:07

anatolyg · Accepted Answer · 2016-11-06 19:03:54Z

3

Yes, what you see here is an example of undefined behavior.

Writing beyond the end of allocated array (aka buffer overflow) is a good example of undefined behavior: it will often appear to "work normally", while other times it will crash (e.g. "Segmentation fault").

A low-level explanation: there are control structures in memory that are situated some distance from your allocated objects. If your program does a big buffer overflow, there is more chance it will damage these control structures, while for more modest overflows it will damage some unused data (e.g. padding). In any case, however, buffer overflows invoke undefined behavior.

The "struct hack" in your first form also invokes undefined behavior (as indicated by the warning), but of a special kind - it's almost guaranteed that it would always work normally, in most compilers. However, it's still undefined behavior, so not recommended to use. In order to sanction its use, the C committee invented this "flexible array member" syntax (your second syntax), which is guaranteed to work.
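As a minimal sketch of the flexible array member form mentioned above, assuming a C99 or later compiler: the element count is passed to malloc explicitly, because nothing else tells the allocator how large the array should be. The count field and the myStruct_create helper are hypothetical names added for illustration; they are not part of the question's code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Flexible array member (C99 and later): the array is the last member
   and is declared without a size. */
struct myStruct {
    int data;
    size_t count;   /* hypothetical field: number of elements allocated */
    int array[];
};

/* Hypothetical helper: allocates the struct and n ints in one block,
   zero-initialises the elements, and returns NULL on failure. */
struct myStruct *myStruct_create(size_t n)
{
    /* Reject sizes whose byte count would overflow size_t. */
    if (n > (SIZE_MAX - sizeof(struct myStruct)) / sizeof(int))
        return NULL;

    struct myStruct *p = malloc(sizeof *p + n * sizeof p->array[0]);
    if (p == NULL)
        return NULL;

    p->data = 0;
    p->count = n;
    for (size_t i = 0; i < n; i++)
        p->array[i] = 0;
    return p;
}
```

A caller would write something like struct myStruct *p = myStruct_create(10); and then only touch p->array[0] through p->array[p->count - 1]. A single free(p) releases both the struct and the array.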

Just to make it clear - assignment to an element of an array never allocates space for that element (not in C, at least). In C, when assigning to an element, it should already be allocated, even if the array is "flexible". Your code should know how much to allocate when it allocates memory. If you don't know how much to allocate, use one of the following techniques:

Allocate an upper bound:

struct myStruct{
    int data;
    int array[100]; // you will never need more than 100 numbers
};

Use realloc
Use a linked list (or any other sophisticated data structure)
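The realloc option from the list above could look roughly like the following. This is a sketch under stated assumptions, not the answerer's code: the count and capacity fields, the doubling growth policy, and the myStruct_append helper are all hypothetical choices made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

struct myStruct {
    int data;
    size_t count;      /* elements in use (hypothetical bookkeeping field) */
    size_t capacity;   /* elements allocated (hypothetical bookkeeping field) */
    int array[];       /* flexible array member */
};

/* Hypothetical helper: appends value, growing the block with realloc when
   it is full. Pass NULL to start a new object. Returns the (possibly moved)
   object, or NULL on failure, in which case the original is left untouched. */
struct myStruct *myStruct_append(struct myStruct *p, int value)
{
    size_t count = p ? p->count : 0;
    size_t capacity = p ? p->capacity : 0;

    if (count == capacity) {
        /* Largest element count whose byte size still fits in size_t. */
        size_t max_elems = (SIZE_MAX - sizeof(struct myStruct)) / sizeof(int);
        if (capacity > max_elems / 2)
            return NULL;

        size_t new_cap = capacity ? capacity * 2 : 4;
        struct myStruct *q = realloc(p, sizeof *q + new_cap * sizeof q->array[0]);
        if (q == NULL)
            return NULL;

        if (p == NULL)
            q->data = 0;   /* fresh object: initialise the fixed part */
        q->count = count;
        q->capacity = new_cap;
        p = q;
    }

    p->array[p->count++] = value;
    return p;
}
```

Because realloc may move the block, the caller has to keep the returned pointer, for example by assigning the result to a temporary and only overwriting the old pointer when the result is not NULL.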

answered Nov 6, 2016 at 19:03

anatolyg


3 Comments

tf3 Over a year ago

Thanks. The warning which I got upon compiling the first snippet just vanished when I cast p->array = (int *)(p + 1); so apparently the problem is due to the UB rather than to the warning.

anatolyg Over a year ago

Warnings never cause problems; they indicate that there are problems. Using a cast is a good way to silence a warning when you "know what you are doing". If you have a C99 compiler, better use the syntax that doesn't require a cast - it's safer code, and also less ugly.

too honest for this site Over a year ago

To detail: A modern standard (currently 2011, aka C11) compliant compiler will also do. No need to use the old 1999 version (aka C99) of the standard.

sg7 · Accepted Answer · 2016-11-06 19:34:21Z

0

What you describe as a "Struct Hack" is indeed a hack. It is not worth it, IMO.

p->array = p+1;

will give you problems on many compilers which will demand explicit conversion:

p->array = (int *) (p+1);
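For illustration only, here is how the pointer variant with the explicit conversion might be wrapped so that the allocation size and the pointer setup stay together. The myStruct_alloc helper is a hypothetical name, and the sketch assumes int elements. The cast only satisfies the compiler's type check; keeping every index below the n that was requested remains the caller's responsibility.

```c
#include <stdint.h>
#include <stdlib.h>

struct myStruct {
    int data;
    int *array;   /* will point just past the struct, inside the same block */
};

/* Hypothetical helper: one malloc for the struct plus n ints.
   Returns NULL on failure. */
struct myStruct *myStruct_alloc(size_t n)
{
    /* Reject sizes whose byte count would overflow size_t. */
    if (n > (SIZE_MAX - sizeof(struct myStruct)) / sizeof(int))
        return NULL;

    struct myStruct *p = malloc(sizeof *p + n * sizeof(int));
    if (p == NULL)
        return NULL;

    p->data = 0;
    p->array = (int *)(p + 1);   /* explicit conversion from struct myStruct * */
    return p;
}
```

With struct myStruct *p = myStruct_alloc(10); the valid indices are 0 through 9, and free(p) releases the whole block.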

I am able to execute it without problems, without any segmentation faults. Looks weird to me.

It is undefined behaviour. You are accessing memory on the heap, and many compilers and operating systems will not prevent you from doing so. But it is extremely bad practice to rely on it.

answered Nov 6, 2016 at 19:34

sg7


1 Comment

tf3 Over a year ago

The issue was that I tried the other alternative that I was aware of, which did not involve a pointer but used a flexible array member. I was not very eager to use the hack, but because the second alternative I came across wasn't very helpful either, I was intrigued.
