5

I'm looking at the OpenJDK implementation of String, and the private, per-instance members look like this:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence
{
    /** The value is used for character storage. */
    private final char value[];

    /** The offset is the first index of the storage that is used. */
    private final int offset;

    /** The count is the number of characters in the String. */
    private final int count;

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    [...]
}

But I know that Java uses references and pools for Strings to avoid duplication. I was naively expecting a pimpl idiom, where String would in fact be just a reference to an implementation object. I'm not seeing that so far. Can someone explain how Java knows to use references if I put a String x; member in one of my classes?

Addendum: this is probably wrong, but if I'm in 32-bit mode, should I count 4 bytes for the reference "value[]", 4 bytes for offset, 4 for count and 4 for hash for every instance of class String? That would mean that writing "String x;" in one of my classes automatically adds at least 32 bytes to the "weight" of my class (I'm probably wrong here).

3
  • A String is a String. Whether it is pooled or not makes no difference to the nature of the String, only to where it is stored and optimisations that can be made around it. Commented Aug 17, 2012 at 16:29
  • A pimpl idiom? My guess would be that you are coming from a C++ background, and that you haven't appreciated how different Java's object model is from C++'s. Java isn't just badly-spelled C++, it's a very different language. I would urge you to read the Java Language Specification to get a more detailed understanding of what Java is like. Commented Aug 17, 2012 at 16:44
  • Good guess! I'm a C++ guy. I have some experience with Java, but I find it pretty disconcerting. Commented Aug 17, 2012 at 16:49

4 Answers

3

The offset/count fields are somewhat orthogonal to the pooling/intern() issues. Offset and count come into play when you have something like:

String substring = myString.substring(5);

One way to implement this method would be something like:

  • allocate a new char[] with myString.length() - 5 elements
  • copy the elements at indices 5 through myString.length() - 1 from myString into the new char[]
  • substring is constructed with this new char[]
    • substring.charAt(i) goes directly to chars[i]
    • substring.length() goes directly to chars.length
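
The copying steps above can be sketched roughly as follows. This is only an illustration: CopyingString and its factory method of() are hypothetical names made up for this sketch, not anything from the JDK.

```java
import java.util.Arrays;

/** Hypothetical illustration class; this is NOT the JDK's String. */
final class CopyingString {
    private final char[] chars;

    private CopyingString(char[] chars) {
        this.chars = chars;
    }

    /** Hypothetical factory: builds an instance from an ordinary String. */
    static CopyingString of(String s) {
        return new CopyingString(s.toCharArray());
    }

    CopyingString substring(int beginIndex) {
        if (beginIndex < 0 || beginIndex > chars.length) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        // Second allocation: a fresh array, filled by copying the tail (O(N)).
        char[] copy = Arrays.copyOfRange(chars, beginIndex, chars.length);
        return new CopyingString(copy);
    }

    char charAt(int i) {
        return chars[i];        // goes directly to the array
    }

    int length() {
        return chars.length;    // goes directly to the array length
    }

    @Override
    public String toString() {
        return new String(chars);
    }
}
```

With this sketch, CopyingString.of("hello world").substring(6) yields an object whose toString() is "world", backed by its own five-element array.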

As you can see, this approach is O(N), where N is the new string's length, and requires two allocations: the new String and the new char[]. So instead, substring works by reusing the original char[] but with an offset:

  • substring.offset = myString.offset + newOffset
  • substring.count = myString.count - newOffset
  • use myString.chars as the chars array for substring
    • substring.charAt(i) goes to chars[i+substring.offset]
    • substring.length() goes to substring.count

Note that we didn't need to create a new char[], and more importantly, we didn't need to copy the chars from the old char[] to the new one (since there is no new one). So this operation is just O(1) and requires only one allocation, that of the new String.
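
A minimal sketch of the shared-array approach, using the same three fields as the OpenJDK excerpt in the question. SharedString, of() and sharesStorageWith() are hypothetical names invented for this illustration; the real String code has more checks and more methods.

```java
/** Hypothetical illustration class; this is NOT the JDK's String. */
final class SharedString {
    private final char[] value;  // storage, possibly shared with other instances
    private final int offset;    // first index of the storage that is used
    private final int count;     // number of characters in this string

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Hypothetical factory: builds an instance from an ordinary String. */
    static SharedString of(String s) {
        char[] chars = s.toCharArray();
        return new SharedString(chars, 0, chars.length);
    }

    SharedString substring(int beginIndex) {
        if (beginIndex < 0 || beginIndex > count) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        // O(1): only the new object is allocated; no characters are copied.
        return new SharedString(value, offset + beginIndex, count - beginIndex);
    }

    char charAt(int i) {
        if (i < 0 || i >= count) {
            throw new StringIndexOutOfBoundsException(i);
        }
        return value[offset + i];
    }

    int length() {
        return count;
    }

    /** Helper for the illustration only: true if both use the same array. */
    boolean sharesStorageWith(SharedString other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

One consequence of this design is that a short substring keeps the whole original array reachable, so a small slice of a very large string can pin a lot of memory.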


1 Comment

"So this operation is just O(N) and requires only one allocation, that of the new String." This should say O(1). Mistake gone over a year uncorrected! lol
2

Java always uses references to any object. There's no way to make it not use references. As for string pooling, that is achieved by the compiler for string literals and at runtime by calling String.intern. It is natural that most of the implementation of String is oblivious to whether it is dealing with an instance referred to by the constant pool or not.
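
A small sketch of what that looks like in practice. InternDemo and its members are hypothetical names used only for this illustration; the == comparisons test reference identity, not content.

```java
/** Hypothetical demo class showing literal pooling and String.intern(). */
final class InternDemo {
    // A string literal: the compiler and class loader arrange for it to be pooled.
    static final String LITERAL = "a";

    static String anotherLiteral() {
        return "a";              // same literal, so the same pooled instance
    }

    static String freshCopy() {
        return new String("a");  // explicitly creates a new, unpooled object
    }

    public static void main(String[] args) {
        // Two occurrences of the same literal refer to one object.
        System.out.println(LITERAL == anotherLiteral());      // true
        // new String(...) always yields a distinct object.
        System.out.println(LITERAL == freshCopy());           // false
        // intern() returns the pooled instance with the same contents.
        System.out.println(LITERAL == freshCopy().intern());  // true
        // Content equality holds in every case.
        System.out.println(LITERAL.equals(freshCopy()));      // true
    }
}
```

In all four cases the variables hold references; pooling only decides whether two references end up pointing at the same object.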

18 Comments

String a = "a"; and String b = new String("a"); use two different memory models, as well.
Frank, you must understand that the matters are just not that simple. Your class is not the owner of the String instance. The instance itself and, even more importantly, the backing char[], is being shared around so the memory is reused.
@LouisWasserman In general even this will not end the calculation because one must take into account memory alignment, which increases the count towards the nearest multiple of 8.
String x; introduces a null reference costing four bytes. No actual String value exists yet, so that's the full extent of the cost.
@MarkoTopolnik memory model = pooled vs. not pooled (I suppose)
2

The accepted answer and the other answers are outdated. As of Java 7 update 6, strings in Java no longer have offset and count fields and no longer share the backing array between a string and its substrings. Instead, every call to substring() copies the characters into a new array.

If you wanted the original sharing behaviour, you'd have to provide it yourself through your own CharSequence implementation.

For more information: https://jaxenter.com/the-state-of-string-in-java-107508.html
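
As a rough sketch of the CharSequence route mentioned above, a view type can keep the old offset/count trick outside of String itself. CharSlice is a hypothetical name; this is one possible shape and not a class from the JDK.

```java
/** Hypothetical view over part of another CharSequence; no characters are copied. */
final class CharSlice implements CharSequence {
    private final CharSequence source;
    private final int offset;
    private final int count;

    CharSlice(CharSequence source, int start, int end) {
        if (start < 0 || end > source.length() || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        this.source = source;
        this.offset = start;
        this.count = end - start;
    }

    @Override
    public int length() {
        return count;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return source.charAt(offset + index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > count || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        // O(1): another view over the same source, shifted by this view's offset.
        return new CharSlice(source, offset + start, offset + end);
    }

    @Override
    public String toString() {
        // Copying happens only here, when a real String is requested.
        return new StringBuilder(count).append(source, offset, offset + count).toString();
    }
}
```

The view keeps its source reachable for as long as the view itself is alive, which is the same memory trade-off the old offset-based String had.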


1

Java Strings are immutable. This means that the implementation can do a whole lot of things to the internal representation, without breaking any application code.

Note that String.intern() is declared native in Oracle's JDK implementation. Native code has access to all fields of an object and may change the reference behind the scenes. So all that the implementors would have to do is change the reference and offset to point at the location where the string is interned, and voila. Of course this breaks the immutability of the class, so the intern() update had better be thread-safe.

You could check what happens to the fields when you call intern() on a newly generated String. If nothing happens, it might be that the reference itself contains the memory location instead. The Java language specification does not define how references are implemented.

