This question is asking specifically why I am getting nulls from this encoding and is not a general question about how to convert a string to an array of bytes.

My actual use case involves an array of chars as input, which I want to write to disk as an array of encoded bytes.

Why is it that when I try to encode a string in this way, the result has trailing nulls?

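import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

String someInput = "///server///server///server///";

char[] chars = someInput.toCharArray();
Charset encoding = StandardCharsets.UTF_8;

CharBuffer buf = CharBuffer.wrap(chars);

// Print every byte of the array returned by the encoder, one per line
for (byte b : encoding.newEncoder().encode(buf).array())
   System.out.println("-> " + (char) b);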

The output is the following. Note that in the result example I have replaced the nulls with the '�' Unicode character for better visibility.

-> /
-> /
-> /
-> s
-> e
-> r
-> v
-> e
-> r
-> /
-> /
-> /
-> s
-> e
-> r
-> v
-> e
-> r
-> /
-> /
-> /
-> s
-> e
-> r
-> v
-> e
-> r
-> /
-> /
-> /
-> �
-> �
-> �

2 Answers

When the underlying array is created, the encoder doesn't know how big it needs to be, so it sizes and grows the array several bytes/characters at a time (adding one byte at a time would be very inefficient).

However, once it has finished converting the text, it doesn't then shrink the array to make it smaller (or take a copy) as this also would be expensive.

In short, you cannot assume the underlying buffer is exactly the size it needs to be; it could be larger. Treat position() and limit() as the bounds of the bytes to use.
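As a minimal sketch of that advice, the snippet below copies only the bytes between position() and limit() into a new array. The class name EncodeExact and the helper encodeExact are hypothetical names chosen for illustration; the buffer calls (remaining(), get(byte[])) are standard java.nio.ByteBuffer methods.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class EncodeExact {

    // Hypothetical helper: returns only the bytes the encoder actually wrote.
    static byte[] encodeExact(char[] chars) {
        try {
            ByteBuffer out = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(chars));
            // The returned buffer is ready for reading: the encoded data runs
            // from position() to limit(), and remaining() is its length.
            byte[] exact = new byte[out.remaining()];
            out.get(exact); // copies the bytes from position() up to limit()
            return exact;
        } catch (CharacterCodingException e) {
            // Thrown for malformed input such as an unpaired surrogate.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encodeExact("///server///server///server///".toCharArray());
        System.out.println(bytes.length); // 30: one byte per ASCII char, no trailing nulls
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

If the goal is to write the result to disk, copying may not be needed at all: a channel write such as FileChannel.write(ByteBuffer) also consumes only the bytes between position() and limit().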

2 Comments

Makes perfect sense. Thank you. :)
@Peter, correct answer. I have written up one more finding in my own answer, thanks.

I agree with @Peter's answer; he is correct. I just want to add one more finding related to it. I debugged this code, looking at this call in the for loop:

 encoding.newEncoder().encode(buf).array()

Stepping into the encode(buf) call, I found that in CharsetEncoder.java the encode() method, before it starts the actual encoding, calculates the size of the buffer to allocate for the encoded bytes with this line:

 int n = (int)(in.remaining() * averageBytesPerChar());

Here averageBytesPerChar() returns 1.1 and the input ("///server///server///server///") is 30 characters long, so n, the size of the newly allocated buffer, becomes 33.

That is why you see 3 extra null bytes in the output. I hope this helps you understand it better.
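The arithmetic above can be checked with a short sketch. It assumes the JDK in use sizes the buffer with the line quoted above, which is an implementation detail rather than documented behavior, so the estimate and the backing array length may differ on other runtimes. BufferEstimate, estimatedSize and encode are hypothetical names; averageBytesPerChar(), remaining() and array() are standard methods.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class BufferEstimate {

    // Hypothetical helper mirroring the sizing line quoted above.
    static int estimatedSize(int charCount) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        return (int) (charCount * encoder.averageBytesPerChar());
    }

    // Hypothetical helper: encodes a string and returns the encoder's buffer as is.
    static ByteBuffer encode(String s) {
        try {
            return StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String input = "///server///server///server///";
        ByteBuffer out = encode(input);

        // Estimate computed the same way as the quoted line (implementation-dependent).
        System.out.println("estimated size:  " + estimatedSize(input.length()));
        // Number of bytes actually produced by the encoder.
        System.out.println("encoded bytes:   " + out.remaining());
        // Length of the backing array, which is what array() exposes.
        System.out.println("backing array:   " + out.array().length);
    }
}
```

Any gap between the last two numbers is the unused tail of the backing array, which is where the trailing nulls in the question come from.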
