I am wondering about Java Strings and their byte representation. I have a file encoded in UTF-16 little-endian; when I view it in my hex editor I see:

ff fe 61 00 f3 00 61 00 00

Now, when I load it into Java using

    BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), "UTF-16"));
    StringBuilder builder = new StringBuilder();
    String line;

    while ((line = reader.readLine()) != null)
        builder.append(line);
    System.out.println(Arrays.toString(builder.toString().getBytes()));

the output is:

[97, -13, 97]

If I am printing the bytes, why can't I see the zero bytes that appear in my hex editor?

  • What is the output of builder.toString().getBytes("UTF-16LE")? Commented Oct 5, 2012 at 8:26
  • Now I can see the zeros correctly. EDIT: If I specify LE or BE, I can see them correctly. Commented Oct 5, 2012 at 8:28

1 Answer

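That is because getBytes() with no arguments encodes the string using the platform default charset (see the javadoc), which is probably not UTF-16 on your machine. Java does hold strings as UTF-16 code units internally, but that internal form is not what getBytes() returns. With a single-byte default charset such as windows-1252, each of your three characters becomes one byte, hence [97, -13, 97]. The proper overload takes an explicit charset: getBytes("UTF-16LE") gives the little-endian bytes with the zero high bytes and no BOM, matching the file's contents after its BOM, while getBytes("UTF-16") gives big-endian bytes preceded by a big-endian BOM (fe ff). The BOM in your file (ff fe) is consumed by the InputStreamReader while decoding, so it is not part of the string.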
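To make the difference concrete, here is a minimal sketch that encodes the three characters from the question with several charsets. The class name EncodingDemo and the helper encode are made up for illustration, and the string is written with a \u escape so the result does not depend on how the source file itself is saved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original question.
public class EncodingDemo {
    // The decoded file content: 'a', U+00F3 (o with acute accent), 'a'.
    static final String TEXT = "a\u00f3a";

    // Encodes a string with an explicit charset instead of the platform default.
    static byte[] encode(String s, Charset charset) {
        return s.getBytes(charset);
    }

    public static void main(String[] args) {
        // Platform dependent: this is what the question's code printed.
        System.out.println("default (" + Charset.defaultCharset() + "): "
                + Arrays.toString(TEXT.getBytes()));

        // A single-byte charset gives one byte per character here.
        System.out.println("ISO-8859-1: "
                + Arrays.toString(encode(TEXT, StandardCharsets.ISO_8859_1)));

        // Little-endian, no BOM: low byte first, then the zero high byte.
        System.out.println("UTF-16LE:   "
                + Arrays.toString(encode(TEXT, StandardCharsets.UTF_16LE)));

        // Big-endian, no BOM.
        System.out.println("UTF-16BE:   "
                + Arrays.toString(encode(TEXT, StandardCharsets.UTF_16BE)));

        // Plain "UTF-16" encodes as big-endian and prepends a BOM (FE FF).
        System.out.println("UTF-16:     "
                + Arrays.toString(encode(TEXT, StandardCharsets.UTF_16)));
    }
}
```

The first line of output varies with the machine's default charset. The explicit ones should print [97, -13, 97] for ISO-8859-1, [97, 0, -13, 0, 97, 0] for UTF-16LE, [0, 97, 0, -13, 0, 97] for UTF-16BE, and [-2, -1, 0, 97, 0, -13, 0, 97] for UTF-16, where -2 and -1 are the bytes fe and ff shown as signed values.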

1 Comment

I had started to suspect that. I am working on a Windows platform, so it makes sense now.
