2

Suppose we want to construct an array of structs, where the definition of the struct cannot be known at compile time.

Here is a SSCCE:

#include <stdlib.h>

int main(int argc, char *argv[]){
    if (argc < 3) return 1;
    int n = atoi(argv[1]);
    int k = atoi(argv[2]);
    if ((n < 1) || (k < 1)) return 2;

    // define struct dynamically
    typedef struct{
        int a[n];
        short b[k];
    }data_point_t;

    int m = 10;
    // construct array of `m` elements
    data_point_t *p = malloc(sizeof(data_point_t)*m);

    // do something with the array
    for(int i = 0; i < m; ++i) p[i].a[0] = p[i].b[0] = i; 
    free(p);
    return 0;
}

This works fine with gcc (C99), however it doesn't with clang, which yields:

error: fields must have a constant size: 
       'variable length array in structure' extension will never be supported

So I'm obviously relying on a gcc extension. My question is, how to deal with this kind of problem in standard conform C99? (Bonus question: how to do this in C++11?)

Note: Performance matters, when iterating p there should be aligned memory access. Dereferencing pointers in the loop, yielding random memory access, is not an option.

15
  • You do realize that an array lookup is basically the same as pointer dereferencing, and that GCC can most likely optimizes that sort of stuff? Have you profiled your code to see if pointer dereferencing is actually slowing it down? This claim seems very dubious. Commented Apr 8, 2014 at 13:41
  • @ColonelThirtyTwo The pointer is actually being dereferenced p[i]. Commented Apr 8, 2014 at 13:42
  • @Joachim Pileborg: Since it is a C program, its a .c file. Commented Apr 8, 2014 at 13:42
  • 2
    why wouldn't you just malloc a two-dimensional array of ints here...? Commented Apr 8, 2014 at 13:43
  • 1
    I love the clarity of Clang's error message here! Preventing people from requesting this feature, even. :) Commented Apr 8, 2014 at 13:46

3 Answers 3

2

I think your best bet is to drop the idea of wrapping the array in a structure, bite the bullet and allocate a 2D array yourself.

This will mean that you need to do explicit indexing, but that would have to happen under the hood anyway.

When it comes to alignment, if you're going to visit every n array elements in each of the m arrays, it probably doesn't matter, it's better to make them compact to maximize use of cache.

Something like:

int *array = malloc(m * n * sizeof *array);

Then to index, just do:

// do something with the array
for(int i = 0; i < m; ++i)
{
  for(int j = 0; j < n; ++j)
    array[i * n + j] = j;
}

If you're very worried about that multiplication, use a temporary pointer. After profiling it, of course.

Sometimes you see this done with a helper macro to do the indexing:

#define INDEX(a, n, i, j)  (a)[(i) * (n) + (j)]

then you can write the final line as:

INDEX(array, n, i, j) = j;

it's a bit clumsy since the n needs to go in there all the time, of course.

Sign up to request clarification or add additional context in comments.

1 Comment

Guess what, thats exactly what I have done so far. However, it can give extremely ugly and error prone code, when doing all the pointer arithmetic by hand. Especially, since the struct in the example is a bit simplified. I thought there is something more elegant out there. Porting this idea to c++ seems to be even more weird.
1

First of all, it only makes sense to wrap the array inside a struct in the case there are other struct members present. If there are no other struct members, simply allocate an array.

If there are other struct members, then use a flexible array member to achieve what you want. Flexible array members are well-defined in the C standard and will work on every C99 compiler.

// define struct dynamically
typedef struct{

    type_t the_reason_you_need_this_to_be_a_struct_and_not_an_array;
    int a[]; // flexible array member
}data_point_t;

// construct array of `m` elements
int m = 10;
size_t obj_size = sizeof(data_point_t) + n*sizeof(int);
data_point_t *p = malloc(m * obj_size);

9 Comments

Ok, this is kind of a solution, but one still needs to perform pointer arithmetic by hand, since p[1] will not point to ((char *)p) + obj_size but to ((char *)p) + sizeof(data_point_t). Quite dangerous, in my opinion. (But at least elements of the struct can accessed correctly +1)
Ok, what would you do if we have two arrays in the struct say, int a[n]; short b[m]. Then flexible array members are not an option anymore :-( ...manual pointer arithmetic seems to be the only choice in some cases.
@dastrobu As I wrote, the struct only makes sense if there's something else that needs to be inside the struct. Flexible array members are used in cases where you have a "data header" followed by trailing data. If that's not the case either, then use plain pointers and allocate data dynamically. You need to specify what you need to do, before you try to come up with a solution for it... If you have a solution but are not sure what problem you are trying to solve, then you are doing it wrong.
I updated the question to refer to the more general case, where there are two (or more) arrays.
@dastrobu So basically you are asking how to allocate an array dynamically in C. Simply declare the struct members as pointers, not as arrays with fixed size. And then malloc. There's nothing more to it than that.
|
0

In C++ you can of course use pointers much like you do now, but for a "proper" C++ solution the only viable solution is to use std::vector:

struct data_point_t
{
    explicit data_point_t(const size_t sz)
        : a(sz)  // Construct the vector `a` with `sz` entries,
                 // each element will be zero initialized (`int()`)
    {}

    std::vector<int> a;
};

int main(int argc, char *argv[]){
    // Read `n`...
    int n = 10;  // Just example

    // Read `m`...
    int m = 10;  // Just example

    // Construct vector of `m` elements
    std::vector<data_point_t> p(m, data_point_t(n));

    // Here the vector `p` contains `m` elements, where each instance
    // have been initialized with a vector `a` with `n` elements
    // All fully allocated and initialized

    // Do something with the array
    // ...
}

This is valid C++03 code, so unless you use something ancient (like Turbo C++) any compiler today should support it.

3 Comments

This will give random memory access when iterating p, since each data_point_t instance contains a pointer pointing to a random address. This kind of solution works, (similar solution exists in C using malloc). This is however, as explained, no acceptable solution.
@dastrobu I would say that your requirements seems to strict, if no "sane" solution is acceptable. And before rejecting a solution because it's not "effective" enough, benchmark! The optimizers of modern compilers, together with the pre-fetching and caching strategies of modern CPUs, might surprise you. If you have two solutions, one which is optimal but hard to maintain and understand (premature optimizations and all that), and one solution whith acceptable efficiency but which is better written and more maintainable, you should probably choose the acceptable but maintainable solution.
I agree, that there are many cases where your solution is fine from a performance viewpoint. However, when the code is iterating mostly this kind of arrays, it is performance critical. I am just asking how to program that elegantly without doing all the memory management by hand, which I currently do. Thank you anyway for your help, but I think it is exactly this kind of problem which is interesting to discuss on stackoverflow, since the elegant solution to the problem is not obvious (to me).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.