9

I have a test string:

String test = "oiwfoilfhlshflkshdlkfhsdlfhlskdhfslkhvslkvhvkjdhfkljshvdfkjhvdsköljhvskljdfhvblskjbkvljslkhjjssdlkhdsflksjflkjdlfjslkjljlfjslfjldfjjhvbksdjhbvslkdfjhbvslkjvhbslkvbjbn";

While debugging, I noticed the following. When I print out the length:

System.out.println("Test length() : " + test.length());

it prints

Test length() : 166

In the debugger, however, the test variable shows a count of 333.

What does the count represent?

1
  • 5
    You should mention what version of Java you're using as the count field doesn't exist in Java 11 (or at least I can't find it). Commented Dec 12, 2018 at 14:14

2 Answers

5

The String implementation contains an array of chars named value. In some implementations, the count field is used to calculate the array's declared size.

Notice that the count value is roughly twice the String length (333 versus 166). This looks like a hint at an ASCII/UTF-8/UTF-16 difference, since one UTF-16 code unit is represented by 2 bytes in a String instance.

An example:

String str = "f";
str.length(); // 1 (UTF-16 code unit)
str.getBytes(java.nio.charset.StandardCharsets.UTF_8).length; // 1 (byte in UTF-8)

but

String str = "ў";
str.length(); // 1 (UTF-16 code unit)
str.getBytes(java.nio.charset.StandardCharsets.UTF_8).length; // 2 (bytes in UTF-8)
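
Byte counts depend on the charset used for encoding, so a small sketch that names the charset explicitly may make the comparison easier to reproduce. The class and method names below (StringSizes, utf8Bytes, utf16Bytes) are made up for illustration; only String.length() and String.getBytes(Charset) come from the JDK. This only measures encoded output and says nothing about how a particular runtime stores the string internally.

```java
import java.nio.charset.StandardCharsets;

final class StringSizes {
    // String.length() counts UTF-16 code units, not bytes.
    static int codeUnits(String s) {
        return s.length();
    }

    // Size of the string when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Size of the string when encoded as UTF-16 (little endian, no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        // "\u045E" is the Cyrillic letter used in the example above.
        for (String s : new String[] {"f", "\u045E"}) {
            System.out.println("length=" + codeUnits(s)
                    + " utf8=" + utf8Bytes(s)
                    + " utf16=" + utf16Bytes(s));
        }
    }
}
```

In UTF-16 both strings take 2 bytes, while in UTF-8 the ASCII letter takes 1 byte and the Cyrillic letter takes 2.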

What JDK are you using? That may shed more light on what exactly your count is.

9 Comments

This answer is wrong. The references to count are public String(char value[], int offset, int count) and another similar one. Both are short-lived constructor parameters.
After the String object is created, they are gone, so you cannot see them in the debugger.
I think count represents the original byte reservation for the string (2 bytes per character), and then one for the terminator. However: Java immediately takes traditional ASCII chars back down to one byte following this reservation as an optimization step. See Anto Hlinisty's experiment with getBytes().length
Wait, so you just found a method parameter that has the same name as a private field in an unknown JRE and declared them the same? And for reference you provide an old JDK implementation with a completely different meaning of that private field? (Note, that very implementation you link has public int length() {return count;}, so it wouldn't show any of the behaviour in the question) How did this get so many upvotes?
"(UTF-16) symbol is represented by 2 chars": A Java char, Character and String element are indeed 1 UTF-16 code unit, which takes 2 bytes. Saying chars causes confusion.
3

When asking Java-related questions about Android, always mention the platform, as there are some major differences.

The Android ART runtime optimizes java.lang.String by compressing the normally two-byte Java chars into single-byte ASCII strings when possible. You can see it in the source of java.lang.String:

public int length() {
    // BEGIN Android-changed: Get length from count field rather than value array (see above).
    // return value.length;
    final boolean STRING_COMPRESSION_ENABLED = true;
    if (STRING_COMPRESSION_ENABLED) {
        // For the compression purposes (save the characters as 8-bit if all characters
        // are ASCII), the least significant bit of "count" is used as the compression flag.
        return (count >>> 1);
    } else {
        return count;
    }
}

String compression is specified in the native code as:

// String Compression
static constexpr bool kUseStringCompression = true;
enum class StringCompressionFlag : uint32_t {
    kCompressed = 0u,
    kUncompressed = 1u
};
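
Read together, the two snippets suggest how a raw count value can be taken apart: the upper bits hold the length and the lowest bit holds the flag. The following is a minimal Java sketch of that decoding, assuming the flag values from the enum above (0 for compressed, 1 for uncompressed). The class and method names are hypothetical and are not part of Android.

```java
final class FlaggedCountDecoder {
    // Drop the flag bit to recover the character count, as length() does above.
    static int length(int count) {
        return count >>> 1;
    }

    // Assumption taken from the enum above: a flag bit of 0 means "compressed".
    static boolean isCompressed(int count) {
        return (count & 1) == 0;
    }

    public static void main(String[] args) {
        int count = 333; // the value seen in the debugger in the question
        System.out.println("length     = " + length(count));
        System.out.println("compressed = " + isCompressed(count));
    }
}
```

For the value from the question, this reports a length of 166 and an uncompressed string.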

This flag is OR-ed with the length, shifted left by one bit, to produce the count value:

static int32_t GetFlaggedCount(int32_t length, bool compressible) {
    return kUseStringCompression
        ? static_cast<int32_t>((static_cast<uint32_t>(length) << 1) |
                               (static_cast<uint32_t>(compressible
                                                          ? StringCompressionFlag::kCompressed
                                                          : StringCompressionFlag::kUncompressed)))
        : length;
}

When loading strings from the constant pool, however, string compression is not performed. Hence you get a doubling of the original char count + 1 (333 = 166 * 2 + 1). That additional 1 is the "uncompressed" flag.
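
The arithmetic can be checked with a small Java mirror of GetFlaggedCount. This is only a sketch for verifying the numbers: the class name and constants are made up here, and the real logic lives in ART's native code shown above.

```java
final class FlaggedCountEncoder {
    // Flag values copied from the StringCompressionFlag enum shown above.
    static final int COMPRESSED = 0;
    static final int UNCOMPRESSED = 1;

    // Shift the length left by one bit and store the flag in the lowest bit.
    static int flaggedCount(int length, boolean compressible) {
        return (length << 1) | (compressible ? COMPRESSED : UNCOMPRESSED);
    }

    public static void main(String[] args) {
        // 166 characters stored uncompressed: 166 * 2 + 1
        System.out.println(flaggedCount(166, false)); // prints 333
        // The same length stored compressed: 166 * 2 + 0
        System.out.println(flaggedCount(166, true));  // prints 332
    }
}
```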
