5

I recently came across some code that performs questionable indexing operations on 2D arrays. Consider the following sample:

int a[5][5];
a[0][20] = 3;
a[-2][15] = 4;
a[5][-3] = 5;

Are the indexing operations above subject to undefined behavior?

2
  • 3
    There's a good duplicate of this, but I can't find it; the SO search function is much worse than people's memories. Commented Aug 5, 2014 at 13:44
  • 1
    Possible duplicate here, not sure if we should close this one, though, as the other one is not asked in a good way, additionally, accepted answer here is better... Commented Feb 7, 2020 at 11:50

2 Answers

6

It's undefined behavior, and here's why.

Multidimensional array access can be broken down into a series of single-dimensional array accesses. In other words, the expression a[i][j] can be thought of as (a[i])[j]. Quoting C11 §6.5.2.1/2:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).

This means the above is identical to *(*(a + i) + j). Following C11 §6.5.6/8 regarding addition of an integer and pointer (emphasis mine):

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

In other words, if i is not a valid index for a, the behavior is immediately undefined, even if "intuitively" a[i][j] seems in-bounds.

So, in the first case, a[0] is valid, but the following [20] is not, because the type of a[0] is int[5]. Therefore, index 20 is out of bounds.

In the second case, a[-2] is already out-of-bounds, thus already UB.

In the last case, however, the expression a + 5 points to one past the last element of the array, which is valid as per §6.5.6/8:

... if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object ...

However, later in that same paragraph:

If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

So, while a + 5 is a valid pointer, dereferencing it is undefined behavior, and a[5] does exactly that, since it means *(a + 5). The behavior is therefore undefined before the final [-3] is even applied (and that index is out-of-bounds as well, therefore also UB).


8 Comments

"[…] because the type of a[0] is int[5] […]"—that's the part where I'm stuck. a[0] is subject to lvalue conversion here, so it decays to int *. Not sure about this…
Even though it decays to int *, it's still a pointer to an array (which I'm inclined to believe is considered having only 5 elements).
@mafso a[0] has type int[5]; the decayed pointer is an rvalue (and it points into an object which is an array of 5 ints)
@mafso actually, a[5]-3 means that a + 5 is dereferenced. As I mentioned, a[5][-3] is equivalent to *(*(a + 5) - 3); the expression *(a + 5) is UB.
bear in mind that a pointer is allowed to also store the bounds of what it is pointing to, so that a bounds-checking implementation is legal. The bounds are determined by what object the object being pointed to is a member of.
-1

Array indexing with negative indexes is undefined behaviour. Unfortunately, a[-3] is the same as *(&a - 3) in most architectures/compilers and is accepted without warning; the C language allows you to add negative integers to pointers, but not to use negative values as array indexes. Of course, this is not even checked at runtime.

Also, there are some issues to be aware of when defining arrays as opposed to pointers. You can leave only the first dimension unspecified, and no more, as in:

int a[][3][2]; /* array of unspecified size, definition is alias of int (*a)[3][2]; */

(indeed, the above is a pointer definition, not an array, just print sizeof a)

or

int a[4][3][2]; /* array of 24 integers, size is 24*sizeof(int) */

When you do this, the way the offset is evaluated is different for arrays than for pointers, so be careful. In the case of arrays, with int a[I][J][K];

&a[i][j][k] 

is placed at

&a + i*(sizeof(int)*J*K) + j*(sizeof(int)*K) + k*(sizeof(int))

but when you declare

int ***a; 

then a[i][j][k] is the same as:

*(*(*(&a+i)+j)+k), meaning you have to dereference pointer a, then add (sizeof(int **))*i to its value, then dereference again, then add (sizeof (int *))*j to that value, then dereference it, and add (sizeof(int))*k to that value to get the exact address of the data.

BR

3 Comments

int a[][3][2]; is illegal. You must either specify the first dimension, or give an initializer from which the first dimension is calculated. It's not a "pointer alias". You may be getting mixed up with the meaning of an array declarator in a function parameter list, but in that case int a[4][3][2] is also int (*a)[3][2].
in &a + i * (sizeof... you meant (char *)&a; pointer arithmetic is done in terms of the size of the object being pointed to
a[i][j][k] is the same as *(*(*(a+i)+j)+k) (note the lack of &)
