9

I am wondering: given char *cs = .....;, what will happen to strlen() and printf("%s",cs) if cs points to a huge memory block with no '\0' in it? I wrote these lines:

char s2[3] = {'a','a','a'};
printf("str is %s,length is %zu",s2,strlen(s2));

I get the result "aaa" and "3", but I think that is only because a '\0' (a 0 byte) happens to reside at location s2+3. How do I make a C string that is not null-terminated? strlen and the other C string functions rely heavily on the '\0' byte, so what happens if there is no '\0'? I just want to understand this rule more deeply.

PS: My curiosity was aroused by studying the following post on SO, How to convert a const char * to std::string, and these words in that post: "This is actually trickier than it looks, because you can't call strlen unless the string is actually nul terminated."

7
  • 3
    Your code is undefined behaviour. If you don't want the null, you'll need a length of some sort, so std::string in a nutshell. Commented Dec 19, 2013 at 13:57
  • Are you asking about C or C++? They're very different languages, with different options for avoiding or dealing with this situation. Commented Dec 19, 2013 at 14:07
  • 1
    strlen and the rest all work on null-terminated C strings. If you want, you can write your own string library with its own functions, like the std::string class or bstr. Commented Dec 19, 2013 at 14:10
  • "if cs point to memory block which is huge but with no '\0' in it.." -- but you can't know if there is a zero or not, because you don't legitimately own that extra memory. Commented Dec 19, 2013 at 14:54
  • 2
    your question title is a contradiction in itself. In C, a string is a nul-terminated char array. Commented Dec 19, 2013 at 15:10

7 Answers 7

27

If it's not null-terminated, then it's not a C string, and you can't use functions like strlen - they will march off the end of the array, causing undefined behaviour. You'll need to keep track of the length some other way.

You can still print a non-terminated character array with printf, as long as you give the length:

printf("str is %.3s",s2);
printf("str is %.*s",s2_length,s2);

or, if you have access to the array itself, not a pointer:

printf("str is %.*s", (int)(sizeof s2), s2);
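
As a minimal sketch of the precision approach above, the helper below formats a counted character array into a buffer without ever needing a terminator in the source. The name format_counted and its parameters are made up for illustration; it relies only on the standard rule that %.*s reads no more bytes than the given precision.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: formats the first `len` bytes of `data` into `out`.
   `data` does not have to be null-terminated, because the precision given
   to %.*s caps how many bytes printf-family functions will read.
   Returns whatever snprintf returns. */
int format_counted(char *out, size_t out_size, const char *data, size_t len)
{
    assert(len <= INT_MAX);  /* the precision argument of %.*s must be an int */
    return snprintf(out, out_size, "str is %.*s", (int)len, data);
}
```

Calling format_counted(out, sizeof out, s2, sizeof s2) with the three-byte s2 from the question fills out with "str is aaa"; the cast to int is needed because sizeof yields a size_t.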

You've also tagged the question C++: in that language, you usually want to avoid all this error-prone malarkey and use std::string instead.


1 Comment

+1 for the use of printf with string length parameter - which I had completely forgotten about. And coffee up my nose for "error-prone malarkey".
10

A "C string" is, by definition, null-terminated. The name comes from the C convention of having null-terminated strings. If you want something else, it's not a C string.

So if you have a string that is not null-terminated, you cannot use the C string manipulation routines on it. You can't use strlen, strcpy or strcat. Basically, any function that takes a char* but no separate length is not usable.

Then what can you do? If you have a string that is not null-terminated, you will have the length separately. (If you don't, you're screwed. You need some way to find the length, either by a terminator or by storing it separately.) What you can do is allocate a buffer of the appropriate size, copy the string over, and append a null. Or you can write your own set of string manipulation functions that work with pointer and length. In C++ you can use std::string's constructor that takes a char* and a length; that one doesn't need the terminator.
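
The "allocate a buffer, copy the string over, and append a null" option could look roughly like this in C. It is a sketch under the assumption that the length is already known; dup_counted is a hypothetical name, not a standard library function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a freshly allocated, null-terminated copy of
   the first `len` bytes of `data`, or NULL if allocation is not possible.
   The caller must free() the result. */
char *dup_counted(const char *data, size_t len)
{
    assert(data != NULL || len == 0);
    if (len == SIZE_MAX)               /* len + 1 would overflow */
        return NULL;
    char *copy = malloc(len + 1);      /* one extra byte for the terminator */
    if (copy == NULL)
        return NULL;
    if (len > 0)
        memcpy(copy, data, len);       /* memcpy never looks for a '\0' */
    copy[len] = '\0';
    return copy;
}
```

The result is an ordinary C string, so strlen, strcpy and friends are safe to use on it, while the original unterminated data is only ever touched through memcpy with an explicit length.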

1 Comment

strncpy? It still limits the usefulness; it is better to use memcpy instead, as it's more explicitly suited to non-null-terminated data buffers (which is what the OP wants).
6

Your supposition is correct: your strlen is returning the correct value out of sheer luck, because there happens to be a zero on the stack right after your improperly terminated string. It probably helps that the string is 3 bytes, and the compiler is likely aligning stuff on the stack to 4-byte boundaries.

You cannot depend on this. C strings need NUL characters (zeroes) at the end to work correctly. C string handling is messy, and error-prone; there are libraries and APIs that help make it less so… but it's still easy to screw up. :)

In this particular case, your string could be initialized as one of these:

  • A: char s2[4] = { 'a','a','a', 0 }; // good if string MUST be 3 chars long
  • B: char *s2 = "aaa"; // if you don't need to modify the string after creation
  • C: char s2[]="aaa"; // if you DO need to modify the string afterwards

Also note that declarations B and C are 'safer' in the sense that if someone comes along later and changes the string declaration in a way that alters the length, B and C are still correct automatically, whereas A depends on the programmer remembering to change the array size and keeping the explicit null terminator at the end.

8 Comments

Or char s[]="aaa"; which allows you to modify the string... +1 for the "sheer luck / alignment" comment.
@floris good point about the s[]="aaa" declaration - thanks for adding that.
"..there happens to be a zero on the stack.." -- some compilers clear variable space in debug builds (only). Nevertheless, the original question is skewed by total misunderstanding.
@jongware Interesting! Which compilers do you know of that do that on debug builds? (Heck, I would prefer it if they filled the stack with random garbage to make buggy programs crash during development instead of out in the field.)
@Jongware : "skewed by total misunderstanding" - I agree. But that is why the question is asked... to increase understanding (or diminish misunderstanding).
4

What happens is that strlen keeps going, reading memory values until it eventually gets to a null. It then assumes that is the terminator and returns a length that could be massively large. If you're using strlen in an environment that expects C-strings to be used, you could then copy this huge buffer of data into another one that is just not big enough - causing buffer overrun problems, or at best, you could copy a large amount of garbage data into your buffer.

Copying a non-null-terminated C string into a std::string will do this. If you then decide that you know this string is only 3 characters long and discard the rest, you will still have a massively long std::string that contains the first 3 good characters and then a load of wastage. That's inefficient.

The moral is: if you're using the CRT functions to operate on C strings, they must be null-terminated. It's no different from any other API; you must follow the rules that API sets down for correct usage.

Of course, there is no reason you cannot use the CRT functions if you always use the length-limited versions (e.g. strncpy), but you will have to limit yourself to just those, always, and keep track of the correct lengths manually.
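
One way to stay within length-limited functions is to never search past a known bound. The sketch below uses the standard memchr for that; bounded_length is a hypothetical name (POSIX offers strnlen with similar behaviour, but it is not part of standard C).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the string stored in `buf`, examining no
   more than `max` bytes. Returns `max` when no '\0' occurs within the
   bound, so the caller can tell the data was not terminated. */
size_t bounded_length(const char *buf, size_t max)
{
    assert(buf != NULL);
    const char *end = memchr(buf, '\0', max);   /* stops after max bytes at most */
    return end != NULL ? (size_t)(end - buf) : max;
}
```

Unlike strlen, this never reads beyond the max bytes the caller vouches for, so passing the array's real size keeps the read inside the array even when the terminator is missing.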


1

Convention states that a char array with a terminating \0 is a null terminated string. This means that all str*() functions expect to find a null-terminator at the end of the char-array. But that's it, it's convention only.

By convention also strings should contain printable characters.

If you create an array like you did char arr[3] = {'a', 'a', 'a'}; you have created a char array. Since it is not terminated by a \0 it is not called a string in C, although its contents can be printed to stdout.

3 Comments

strncpy doesn't require a null-terminator.
@SamuelEdwinWard From the manual: "If the length of src is less than n, (...)". The length of a string in C is determined by finding the first \0, so even here the convention is respected. Source: man strncpy
It does stop copying at a null byte, but it works fine without a null terminator, and always writes n bytes.
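
The strncpy behaviour discussed in these comments can be sketched as follows. copy_field is a hypothetical helper, and it assumes src has at least src_len readable bytes; the source does not need a terminator, but the destination has to be terminated by hand.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies up to `src_len` bytes of `src` into `dst`
   (capacity `dst_size`), truncating if necessary, and always leaves `dst`
   null-terminated. `src` itself need not be null-terminated. */
void copy_field(char *dst, size_t dst_size, const char *src, size_t src_len)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    strncpy(dst, src, n);  /* reads at most n bytes of src; stops early only at a '\0' */
    dst[n] = '\0';         /* strncpy adds no terminator when it copies n full bytes */
}
```

If src does contain a '\0' within the first n bytes, strncpy pads the rest of those n bytes in dst with zeros, which is why it always writes exactly n bytes.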
0

The C standard does not define the term string until the section 7 - Library functions. The definition in C11 7.1.1p1 reads:

  1. A string is a contiguous sequence of characters terminated by and including the first null character.

(emphasis mine)

If the definition of string is a sequence of characters terminated by a null character, a sequence of non-null characters not terminated by a null is not a string, period.

2 Comments

Well yeah but this was already stated about 10 times six years ago.
@user207421 excusez-moi it wasn't. Please show the place where the direct quotation from the standard alongside with a link happens.
-1

What you have done is undefined behavior.

You are trying to write to a memory location that is not yours.

Change it to

char s2[] = {'a','a','a','\0'};

4 Comments

And you also misread what the OP wants. He purposely did not null-terminate his string because he is asking about strings that are not null-terminated.
Improper writes means writing to memory that is not "yours", i.e. you are overwriting memory that is used by another variable (or worse, some code). Think of it like this... "somehow the account total becomes huge. All I do is write 'fred' to the member variable and the account value changes. WTF?!". That's what happens with an improper write.
To be more specific, I don't think there are any improper writes in the code in the question.
@Eregrith Thanks, you got my intention; I just wanted to produce the scenario.
