Please see JLS 7. Section 3.2 (page 16) states:

The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding.

I disassembled a hello world program.

class Y {
    String hello = "hello";
}

The following is the disassembly:

Classfile /c:/Work/SR1/e2/tmp/Y.class
Last modified Jan 5, 2014; size 240 bytes
MD5 checksum 96694fda4f346a62d5412c56dc36c45d
Compiled from "X.java"
class Y
  SourceFile: "X.java"
  minor version: 0
  major version: 52
  flags: ACC_SUPER
  Constant pool:
  #1 = Class              #2             //  Y
  #2 = Utf8               Y
  #3 = Class              #4             //  java/lang/Object
  #4 = Utf8               java/lang/Object
  #5 = Utf8               hello
  #6 = Utf8               Ljava/lang/String;
  #7 = Utf8               <init>
  #8 = Utf8               ()V
  #9 = Utf8               Code
  #10 = Methodref          #3.#11         //  java/lang/Object."<init>":()V
  #11 = NameAndType        #7:#8          //  "<init>":()V
  #12 = String             #5             //  hello
  #13 = Fieldref           #1.#14         //  Y.hello:Ljava/lang/String;
  #14 = NameAndType        #5:#6          //  hello:Ljava/lang/String;
  #15 = Utf8               LineNumberTable
  #16 = Utf8               SourceFile
  #17 = Utf8               X.java
  {
  ...

I see only Utf8 entries and no Utf16. Why is there no Utf16 encoding?

Thanks

  • Because ... your charset is UTF8. The string internally is holding 16bit codepoints. Commented Jan 5, 2014 at 4:25
  • Try putting some Chinese characters in your string constant and see what it decompiles to. Commented Jan 5, 2014 at 4:32
  • How do I put 电 in a text file? Commented Jan 5, 2014 at 4:40

1 Answer

In an executing program, text is (typically [1]) represented in UTF-16.

But in a ".class" file, text in the constant pool (i.e. String literals, identifiers, and so on) is encoded in UTF-8 (strictly, the "modified UTF-8" variant) to save space. (Encoding of constant pool entries in UTF-8 is mandated by the JVM spec, Section 4.4, and has nothing to do with default character sets.)
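To make the space argument concrete, here is a minimal sketch that compares the byte counts of a few strings in UTF-8 and UTF-16. The class name EncodingSizes and its helper methods are made up for this illustration, and standard UTF-8 is used as a stand-in for the class file's modified UTF-8 (for these samples the byte counts should be the same).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and helpers are not from the original post.
class EncodingSizes {
    // Number of bytes needed to store the text as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed to store the text as UTF-16 (big-endian, no byte-order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // "\u7535" is the character 电, written as a Unicode escape.
        String[] samples = { "hello", "java/lang/Object", "\u7535" };
        for (String s : samples) {
            System.out.println(s + ": UTF-8 = " + utf8Length(s)
                    + " bytes, UTF-16 = " + utf16Length(s) + " bytes");
        }
    }
}
```

For ASCII-only text such as "hello" or "java/lang/Object", UTF-8 needs half the bytes of UTF-16 (5 vs 10 and 16 vs 32). For a character like 电 it is the other way around (3 vs 2), so the saving depends on constant pools being mostly ASCII, which is the usual case for identifiers and descriptors.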

When the class file is loaded, the UTF-8 constant pool entries are transcoded to UTF-16 by the classloader.
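The transcoding step can be imitated in plain Java. This is only a sketch, not what the JVM actually runs: it assumes that DataOutputStream.writeUTF and DataInputStream.readUTF, which use modified UTF-8 with a 2-byte length prefix, are a close enough stand-in for a Utf8 constant pool entry (minus its tag byte). The class name ConstantPoolRoundTrip and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; it imitates, but is not, the JVM's class loading code.
class ConstantPoolRoundTrip {
    // Encode a string as a 2-byte length followed by modified UTF-8 bytes.
    static byte[] encode(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decode the bytes back into a String, whose chars are 16-bit UTF-16 code units.
    static String decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] stored = encode("hello");
        String loaded = decode(stored);
        System.out.println("stored bytes: " + stored.length);
        for (int i = 0; i < loaded.length(); i++) {
            // Each char is one 16-bit code unit; print it as four hex digits.
            System.out.printf("char %d = 0x%04X%n", i, (int) loaded.charAt(i));
        }
    }
}
```

The stored form of "hello" is 7 bytes (2 length bytes plus 5 data bytes), while the decoded String exposes five 16-bit code units, 0x0068 for 'h' and so on.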


1 - An application could be written to encode text in myriad different ways. The UTF-16 encoding we are talking about here is the natural encoding scheme for text data in Java; i.e. the encoding you get when you store text in a String or any other subtype of CharSequence.


4 Comments

There is also no other encoding for a String in the constant pool of a class file: docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.4
@Charlie - correct. (I assume you are referring to Brian Roach's misleading comment ...)
Yep. Also thought since the question quoted the JLS spec it would be good to provide a link to a relevant portion of the JVM spec.
Thanks, I've incorporated this.
