Java string - UTF and byte representation

Question

I am wondering about Java's String and its byte representation. I have a file encoded in UTF-16 little endian; when I view it in my hex editor I can see

ff fe 61 00 f3 00 61 00 00

Now, when I load it into Java using

    BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(fileName), "UTF-16"));
    StringBuilder builder = new StringBuilder();
    String line;

    while ((line = reader.readLine()) != null)
        builder.append(line);
    System.out.println(Arrays.toString(builder.toString().getBytes()));

I can see in the output

[97, -13, 97]

If I am printing bytes, why can't I see the zero bytes that I can see in my hex editor?

What is the output of builder.toString().getBytes("UTF-16LE")? – Shashank Kadne, Commented Oct 5, 2012 at 8:26

Now I can see the zeros correctly. EDIT: If I specify LE or BE I can see them correctly. – Andna, Commented Oct 5, 2012 at 8:28

RA. · Accepted Answer · 2012-10-05 08:30:33Z

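That is because getBytes() with no argument encodes the string in the platform's default charset (see the Javadoc), which is probably not UTF-16 on your machine. A Java String is a sequence of UTF-16 code units, but getBytes() never exposes the in-memory form; it always converts the characters to some charset. On Windows the default is often windows-1252, in which ó is the single byte 0xf3, printed as -13 because Java bytes are signed. That gives one byte per character and no zeros.

The proper overload is one that names the charset. getBytes("UTF-16LE") reproduces the byte order of your file, with the zero high bytes but without the BOM. getBytes("UTF-16") also shows the zero bytes, but Java encodes it as big endian and prepends the big-endian BOM (fe ff), not the ff fe found in your file.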
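The difference between the overloads can be sketched as follows. This is a minimal illustration, not the asker's program: the class name EncodingDemo and the helper encode are made up for the example, the string is typed in directly instead of being read from the file, and ISO-8859-1 stands in for a Windows default such as windows-1252 (the two encode these three characters identically).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingDemo {
    // Hypothetical helper: encode the text with an explicitly named charset
    // instead of relying on the platform default.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // "aóa", written with an escape so the source file encoding does not matter.
        String text = "a\u00f3a";

        // No-argument getBytes(): the result depends on the platform default
        // charset, so no fixed output is shown for this line.
        System.out.println(Charset.defaultCharset() + " -> "
                + Arrays.toString(text.getBytes()));

        // Single-byte charset (assumed stand-in for windows-1252): one byte per char.
        System.out.println(Arrays.toString(encode(text, StandardCharsets.ISO_8859_1)));
        // [97, -13, 97]

        // Little endian, no BOM: low byte first, then the zero high byte.
        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_16LE)));
        // [97, 0, -13, 0, 97, 0]

        // Big endian, no BOM.
        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_16BE)));
        // [0, 97, 0, -13, 0, 97]

        // Plain "UTF-16": big-endian BOM (fe ff = -2, -1) followed by big-endian data.
        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_16)));
        // [-2, -1, 0, 97, 0, -13, 0, 97]
    }
}
```

Reading the file with "UTF-16" still works in the question because the decoder uses the BOM to pick the byte order; only encoding with "UTF-16" is fixed to big endian.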

I started to suspect that. I am working on a Windows platform, so it makes sense now. – Andna, over a year ago
