Is this array subscripting behavior really undefined in C?

Question

unsigned int n = 1;
char* s = "X" + 1;
s[-n];

Is that third statement undefined in C23? It seems to be, according to 6.5.2.1 paragraph 2 of the standard:

... The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))) ...

In that expression, (E2) is a very large positive value that exceeds the array bounds.
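A minimal sketch of the identity quoted from 6.5.2.1 (the function name subscript_forms_agree is hypothetical, chosen for illustration): for every valid index, a[i], *(a + i) and even i[a] name the same element. That is why the question reduces to what (E1)+(E2) is allowed to compute.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if a[i], *(a + i) and i[a] agree for
   every index i in [0, len). Only valid indices are used, so nothing here
   reads outside the array. */
static bool subscript_forms_agree(const char *a, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* E1[E2] is defined as (*((E1)+(E2))), so all three are the same. */
        if (a[i] != *(a + i) || a[i] != i[a])
            return false;
    }
    return true;
}
```

For example, subscript_forms_agree("XYZ", 4) checks all four elements, including the terminating '\0'.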

I have been doing this without thinking about it, and the naturally expected behavior always seems to be what happens. Can anyone confirm this is undefined behavior? Am I missing something? Is anyone aware of a real-world situation where this would actually be a problem? That is, how urgently do I need to go through my code base looking for these instances?

Answer 1 · 2025-11-05 05:38:38Z

chux

• Nov 5 at 5:38

s[-n] is like s[UINT_MAX] and attempts to index far outside the 2-character string literal "X". It is UB.

Answer 2 · 2025-11-05 05:44:24Z

chux

• Nov 5 at 5:44

"As in how urgently do I need to go through my code base looking for these instances?" --> Step 1: save time by enabling many (if not nearly all) compiler warnings.

Answer 3 · 2025-11-05 07:38:33Z

char* s = "X" + 1; This is allowed: s points at the last element of the 2-element char array {'X', '\0'}. As a special rule, pointing one element beyond that array (but no further) would be fine too, as long as the pointer is not dereferenced.
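A small sketch of which positions inside the literal "X" are valid (the function name is hypothetical): both elements can be read, the one-past-the-end pointer can be formed and compared, and from a pointer to the '\0' a signed index of -1 reaches 'X' without any wraparound.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: checks the pointer positions that are valid for the
   2-element array {'X', '\0'} behind the string literal "X". */
static bool literal_positions_ok(void)
{
    const char *first = "X";        /* element 0: 'X'                          */
    const char *last  = first + 1;  /* element 1: '\0', like s in the question */
    const char *end   = first + 2;  /* one past the end: may be formed and
                                       compared, but never dereferenced        */

    /* Forming first - 1 or first + 3 would already be undefined, even
       without reading through the pointer. */
    return *first == 'X'
        && *last == '\0'
        && last[-1] == 'X'          /* int index -1: a genuinely negative offset */
        && end - first == 2
        && last < end;
}
```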
The unary - operator involves integer promotion, but since n is already unsigned int, no implicit conversion takes place. But as a result of the -, the value -1 will have to get stored in an unsigned int where it does not fit since it is negative. A well-defined conversion then happens, as stated in C23 6.3.1.3:

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.⁵¹

51) The rules describe arithmetic on the mathematical value, not the value of a given type of expression.

That is: -1 + UINT_MAX + 1 = UINT_MAX, which is 4294967295 when unsigned int is 32 bits.
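The arithmetic above can be checked directly. A minimal sketch (the function name is hypothetical): negating an unsigned int never produces a negative number; the result is reduced modulo UINT_MAX + 1, so negating 1 gives UINT_MAX on every platform, whatever the width of unsigned int.

```c
#include <limits.h>

/* Hypothetical helper: unary minus on an unsigned int operand is computed
   in unsigned int, modulo UINT_MAX + 1, so the result is never negative. */
static unsigned int negate_unsigned(unsigned int n)
{
    return -n;   /* 1 -> UINT_MAX, 0 -> 0, UINT_MAX -> 1 */
}
```

Because the result is a large positive value, s[-n] asks for the element UINT_MAX positions after s, not the one before it.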
It is true that E1[E2] is by definition identical to (*((E1)+(E2))). Therefore, the rules for the binary + operator are what determine what happens whenever we do array indexing.
So we have to check the rules for the additive operators, C23 6.5.7:

For addition, either both operands shall have arithmetic type, or one operand shall be a pointer to a complete object type and the other shall have integer type.
...
If both operands have arithmetic type, the usual arithmetic conversions are performed on them.

In this case one operand is a pointer to an object and the other is an integer. The usual arithmetic conversions are not performed, meaning no further conversion takes place.
The result of s (an address) + 4294967295 is clearly out of bounds of the array. Further down in the text for the additive operators C23 6.5.7:

If the pointer operand and the result do not point to elements of the same array object or one past the last element of the array object, the behavior is undefined. If the addition or subtraction produces an overflow, the behavior is undefined.

Both of these forms of UB can happen here. The value is clearly outside the array bounds, so it is definitely UB for that reason. But the value could also be out of range for the pointer type's representation, in which case it is UB for that reason too (overflow).
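One way to avoid both forms of UB is to validate the index arithmetic before any pointer is formed. The helper below is a hypothetical sketch, not a standard function: it takes the array length, the index of the element the pointer currently refers to, and a signed offset, and reports whether the target is a readable element. Negative offsets are handled without negating PTRDIFF_MIN.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: for an array of len elements and a pointer to the
   element at index base, reports whether base + offset is also a readable
   element. Only integers are compared; no out-of-range pointer is formed. */
static bool offset_in_bounds(size_t len, size_t base, ptrdiff_t offset)
{
    if (base >= len)
        return false;   /* base does not name an element */
    if (offset < 0) {
        /* -(offset + 1) is |offset| - 1 and cannot overflow, even for
           PTRDIFF_MIN; the target exists when |offset| <= base. */
        return (size_t)-(offset + 1) < base;
    }
    /* Non-negative offset: the target exists when offset < len - base. */
    return (size_t)offset < len - base;
}
```

For the question's array (len 2, s at index 1), offset_in_bounds(2, 1, -1) is true, while offset_in_bounds(2, 1, 1) is false because one past the end is not a readable element.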

Answer 4 · 2025-11-05 08:29:42Z

n. m. could be an AI

• Nov 5 at 8:29

"the naturally expected behavior always seems to be what happens"

For some values of "always" and/or "naturally expected behavior". Exhibit A. Running on 32-bit platforms much lately?

Answer 5 · 2025-11-05 09:09:06Z

chqrlie

• Nov 5 at 9:09

Can you post this as a question? This new advice feature is counterproductive: even fewer questions will be asked, defeating the voting / reputation system.

Answer 6 · 2025-11-05 12:25:06Z

Is anyone aware of a real world situation where this would actually be a problem?

Undefined behavior can make anything false. Suppose you have a test T and undefined behavior UB. In the code if (T) { … UB … }, the compiler is allowed to conclude UB will never occur, and therefore T must be false. This can lead it to determine that variables used in T must or must not have certain values, and so on.

In particular, in the code if (T) { … UB … } else { … OtherCode … }, the compiler may eliminate the test T (except for any observable behavior it contains) and eliminate the “then” clause and always execute only the “else” clause.

Whether this occurs in any particular situation may be difficult to predict, as compiler optimizers handle numerous chains of deductions/reductions/transformations.

In your case, consider the code:

unsigned int n = 1;
char* s = "X" + 1;

char *p = malloc(foo);
if (!p)
{
    fprintf(stderr, "Error, out of memory.\n");
    exit(EXIT_FAILURE);
}
*p = s[-n];
…

Similarly to the code explained above, the compiler may conclude !p is always true, in which case this program would always report it is out of memory even though the malloc actually returns a non-null pointer. That would be good since you would catch the error immediately, before deploying the program. But the same compiler behavior could produce a more insidious result that is not caught for a long time.

Answer 7 · 2025-11-05 12:33:08Z

@Lundin:

But as a result of the -, the value -1 will have to get stored in an unsigned int where it does not fit since it is negative. A well-defined conversion then happens…

There is no store in s[-n]. n is negated in its own type, unsigned int, with wrapping, and the result, UINT_MAX, is used directly in pointer arithmetic. The conversion rules of 6.3.1.3 do not apply because there is no conversion; the negation wraps per 6.2.5: “… arithmetic for the unsigned type is performed modulo 2^N.”
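The difference between promotion and wrapping is easy to see with a type narrower than int. A minimal sketch, assuming (and statically checking) that unsigned short is narrower than int, as on common platforms; the macro and function names are hypothetical: an unsigned short operand is promoted to int before negation, so its negation is an ordinary -1, whereas an unsigned int operand is not promoted and wraps.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumption checked at compile time: every unsigned short value fits in
   int, so unsigned short is promoted to int (not unsigned int). */
_Static_assert(USHRT_MAX <= INT_MAX, "assumes unsigned short promotes to int");

/* Hypothetical macro: true if -(x) has type int after integer promotion. */
#define NEGATION_IS_INT(x) _Generic(-(x), int: true, default: false)

/* m is promoted to int first, so 1 -> -1 with no wraparound. */
static int negate_short(unsigned short m)
{
    return -m;
}
```

This is why the problem in the question arises with unsigned int and wider unsigned types, which are never promoted to a signed type.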

Answer 8 · 2025-11-05 18:17:24Z

Is anyone aware of a real world situation where this would actually be a problem?

It breaks this program:

char foo(void)
{
    unsigned int n = 1;
    char *s = "X" + 1;

    return s[-n];
}


#include <stdio.h>


int main(void)
{
    printf("%d\n", foo());
}

When compiled with Clang 21.1.0 using -O3, Clang recognizes that s[-n] has undefined behavior and elides it. With the whole program shown above, it inserts xor esi, esi to generate an argument to printf, resulting in the program printing “0”. If the function is compiled by itself, Clang optimizes it to a sole ret instruction with nothing to set the return value.

If unsigned int n is changed to int n, then Clang generates code to produce 88 (ASCII for “X”). Thus, it is clear that the undefined behavior of indexing with -n for an unsigned int n breaks this program.

I do note that if unsigned int n is changed to unsigned long long int n, then Clang does generate 88. Thus, the fact that unsigned int is 32 bits while the address space is 64 bits is a factor in Clang recognizing the undefined behavior. It could do the same thing with unsigned long long int; I cannot say whether it does not because its internal tests wrap just as unsigned arithmetic normally does, so that it fails to recognize that indexing with this value has undefined behavior, or because it was designed to accommodate unsigned arithmetic in array indexing when the type has an appropriate width. Keeping unsigned int but compiling with -m32 also results in 88.

Answer 9 · 2025-11-05 20:09:02Z

As has been thoroughly conveyed by now, an essential component of the problem is that n has an unsigned type, yet you want to compute negative numbers based on its value. You can do that by first converting to a suitable signed type, if there is one:

s[-(int)n];

What signed types, if any, are suitable depends on the range of values of n you intend to support, but a pretty natural choice for the context would be ptrdiff_t, which is the type of the arithmetic difference between two pointers. That's not guaranteed to be suitable for all possible values of n, but if it turns out not to be suitable in a given situation then probably none of the standard signed integer types are suitable in that situation.
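A conversion that refuses out-of-range values instead of silently wrapping could look like the sketch below. The helper name negative_offset is hypothetical; the range check matters on platforms where ptrdiff_t and unsigned int are both 32 bits, since there UINT_MAX exceeds PTRDIFF_MAX.

```c
#include <stdbool.h>
#include <stddef.h>   /* ptrdiff_t */
#include <stdint.h>   /* PTRDIFF_MAX, uintmax_t */

/* Hypothetical helper: stores -n as a ptrdiff_t in *out and returns true,
   or returns false (leaving *out untouched) when n exceeds PTRDIFF_MAX. */
static bool negative_offset(unsigned int n, ptrdiff_t *out)
{
    /* Compare in uintmax_t so neither side is converted unexpectedly. */
    if ((uintmax_t)n > (uintmax_t)PTRDIFF_MAX)
        return false;
    *out = -(ptrdiff_t)n;   /* n fits, so this negation cannot overflow */
    return true;
}
```

With n = 1 this yields -1, and indexing s = "X" + 1 with that offset reads the 'X' one element before s.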

As an alternative to doing such conversions, however, you're better off declaring n with the chosen signed type in the first place. That's not susceptible to error from accidentally omitting needed casts, and it may help the compiler recognize range issues. Thus:

#include <stddef.h> // ptrdiff_t

// ...

ptrdiff_t n = 1;
char* s = "X" + 1;
s[-n];
