Understanding C-strings & string literals in C++

Question

I have a few questions I would like to ask about string literals and C-strings.

So if I have something like this:

char cstr[] = "c-string";

As I understand it, the string literal is created in memory with a terminating null byte, say for example starting at address 0xA0 and ending at 0xA9, and from there the address is returned and/or cast to type char[], which then points to that address.

It is then legal to perform this:

for (int i = 0; i < (sizeof(cstr)/sizeof(char)); ++i)
    cstr[i] = 97+i;

So in this sense, can string literals be modified as long as they are cast to the type char[]?

But with regular pointers, I've come to understand that when they are pointed to a string literal in memory, they cannot modify the contents because most compilers mark that allocated memory as "Read-Only" in some lower bound address space for constants.

char * p = "const cstring";
*p = 'A'; // illegal memory write

I guess what I'm trying to understand is: why aren't char * types allowed to point to string literals and modify their contents the way arrays can? Why don't string literals get cast to char * the way they do to char[]? If I have the wrong idea here or am completely off, feel free to correct me.

char * p = "const cstring"; should throw a compilation error, since "const cstring" is type const char* (specifically so that you don't use it like you're using it in your example)
– tylerl, Commented Oct 6, 2011 at 3:44
@LightnessRacesinOrbit as of C++11 string literals are of type const char[N], which is perdy much equivalent to const char*. You can argue the pedantic details, but that's only adding confusion, not clarity.
– tylerl, Commented Jan 27, 2019 at 0:35
@tylerl On the contrary, those are completely different types, and pretending otherwise is how confusion is introduced.
– Lightness Races in Orbit, Commented Jan 27, 2019 at 1:21

tylerl · Accepted Answer · 2011-10-06 03:41:19Z

5

The bit that you're missing is a little compiler magic where this:

char cstr[] = "c-string";

Actually executes like this:

char *cstr = static_cast<char *>(alloca(strlen("c-string")+1));
memcpy(cstr,"c-string",strlen("c-string")+1);

You don't see that bit, but it's more or less what the code compiles to.
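A minimal sketch of what that hidden copy means in practice, using only the standard library. The function name is made up for illustration; the point is that writing to the array leaves the literal it was initialized from untouched.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns true if writing to the array left the
// literal it was initialized from unchanged.
bool array_is_independent_copy() {
    char cstr[] = "c-string";          // 9-char array initialized from the literal
    cstr[0] = 'C';                     // modifies only the local array
    const char* literal = "c-string";  // points at the literal, read-only access
    return std::strcmp(cstr, "C-string") == 0 &&
           std::strcmp(literal, "c-string") == 0;
}
```

Calling array_is_independent_copy() should return true, since the write only ever touches the local array.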

answered Oct 6, 2011 at 3:41


2 Comments

Bobby Barjasteh Over a year ago

This is definitely what I have been missing! The answer I chose as the selected answer basically put this into words, but the code is even cleaner ;) so really cstr is a const char * to locally allocated memory on the stack (or possibly the heap) that is a copy and is modifiable, unlike the literal string's values. thanks so much for showing me this.

tylerl Over a year ago

It's worth pointing out that in this case cstr is not a const char* but rather a char*. A const char* is what you get with string literals. That means that you can't go modifying its contents. In contrast, a char* means the data is modifiable. Also, it's very definitely on the stack, not the heap, which is why you don't have to call free() on it, and why you can't return it to your caller.

Hot Licks · Accepted Answer · 2011-10-06 03:36:28Z

2

char cstr[] = "something"; is declaring an automatic array initialized to the bytes 's', 'o', 'm', ...

char * cstr = "something";, on the other hand, is declaring a character pointer initialized to the address of the literal "something".
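As a rough illustration of the first form, the sketch below compares an array initialized from a literal with one initialized byte by byte. The function and variable names are assumptions chosen for the example, not anything from the question.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: checks that the two initializer forms produce
// arrays with the same size and the same bytes.
bool initializer_forms_match() {
    char from_literal[] = "something";
    char from_bytes[]   = {'s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', '\0'};
    static_assert(sizeof(from_literal) == 10, "9 characters plus the terminating null");
    static_assert(sizeof(from_literal) == sizeof(from_bytes), "same array size");
    return std::memcmp(from_literal, from_bytes, sizeof(from_literal)) == 0;
}
```

Both arrays are ordinary writable objects; neither one refers back to the literal after initialization.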

answered Oct 6, 2011 at 3:36


1 Comment

Bobby Barjasteh Over a year ago

Thanks for the insight on this. I see now that arrays, being a standard type, are literally initialized in their constructor by string literals, but are only copies of the literal itself.

Vlado Klimovský · Accepted Answer · 2011-10-06 03:38:12Z

1

In the first case you are creating an actual array of characters, whose size is determined by the size of the literal you are initializing it with (8+1 bytes). The cstr variable is allocated memory on the stack, and the contents of the string literal (which is stored somewhere else, possibly in a read-only part of memory) are copied into this variable.

In the second case, the local variable p is allocated memory on the stack as well, but its contents will be the address of the string literal you are initializing it with.

Thus, since the string literal may be located in read-only memory, it is in general not safe to try to change it via the p pointer (you may get away with it, or you may not). On the other hand, you can do whatever you like with the cstr array, because that is your local copy that just happens to have been initialized from the literal.

(Just one note: the cstr variable has type array of char, and in most contexts this converts to a pointer to the first element of that array. One exception is the sizeof operator, which computes the size of the whole array, not just of a pointer to the first element.)
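The sizeof exception can be sketched as follows. Both function names are hypothetical; the first shows what sizeof reports once the array has been converted to a pointer, the second what it reports on the array itself.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the parameter is a pointer, so sizeof here
// measures the pointer, not the array it came from.
std::size_t size_seen_through_pointer(const char* s) {
    return sizeof(s);
}

// Hypothetical helper: sizeof applied to the array itself.
std::size_t size_of_array() {
    char cstr[] = "c-string";
    return sizeof(cstr);               // 8 characters + terminating null
}
```

size_of_array() yields 9, while size_seen_through_pointer() yields the platform's pointer size (commonly 4 or 8), whatever string is passed in.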

answered Oct 6, 2011 at 3:38


2 Comments

Bobby Barjasteh Over a year ago

Ah, I see. So the difference is that one is really just a pointer to the RO memory whilst the other is a variable of array of char (or constant pointer in other words) to a local copy of that literal in memory plus the null byte, and thus access to the copy is RW. Thanks for clarifying

Hot Licks Over a year ago

Didn't always used to be RO memory, BTW. Modifying a "constant" string was an old C programmer's trick, before they started putting the strings in a read-only section.

Lightness Races in Orbit · Accepted Answer · 2019-01-20 13:37:54Z

1

char cstr[] = "c-string";

This copies "c-string" into a char array on the stack. It is legal to write to this memory.

char * p = "const cstring";
*p = 'A'; // illegal memory write

String literals like "c-string" and "const cstring" typically live in a read-only data section of your binary. Above, p points to memory in this area, and writing to that location is undefined behavior. Since C++11 this is enforced more strongly than before: the implicit conversion from a string literal to char* is no longer allowed, so you must declare it as const char* p instead.
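One way to get a modifiable string while keeping the pointer const-correct is to copy the literal first. The sketch below uses std::string for the copy, which is an assumption on my part (a char array would work as well), and the function name is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reads through a const pointer to the literal and
// modifies a separate std::string copy instead of the literal itself.
std::string capitalized_copy() {
    const char* p = "const cstring";   // OK in C++11 and later
    // *p = 'A';                       // would not compile: p points to const char
    std::string copy = p;              // owns its own writable buffer
    copy[0] = 'C';
    return copy;
}
```

The literal is only ever read; all writes go to the buffer owned by the std::string.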
