byte[] bytes = new byte[] { 1, -1 };
System.out.println(Arrays.toString(new String(bytes, "UTF-8").getBytes("UTF-8")));
System.out.println(Arrays.toString(new String(bytes, "ISO-8859-1").getBytes("ISO-8859-1")));

output:

[1, -17, -65, -67]
[1, -1]

Why does the UTF-8 round trip change the bytes while the ISO-8859-1 round trip does not?


3 Answers


Your byte array isn't a valid UTF-8-encoded string... so the string you get from

new String(bytes, "UTF-8")

contains U+0001 (for the first byte) and U+FFFD to signify bad data in the second byte. When that string is encoded using UTF-8, you get the byte pattern shown.
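To see the substitution directly, here is a minimal sketch that prints the code points of the decoded string. The class name `ReplacementDemo` and the helper `decodeUtf8` are made up for illustration, and it uses `StandardCharsets` (Java 7+) instead of the charset name string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the names are not from the question.
class ReplacementDemo {
    // Decodes bytes as UTF-8; malformed input is replaced with U+FFFD.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = { 1, -1 }; // -1 is 0xFF, which never occurs in valid UTF-8
        String s = decodeUtf8(bytes);

        // Print each char as a code point: U+0001, then U+FFFD.
        for (char c : s.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }

        // Re-encoding U+FFFD yields the three bytes EF BF BD.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The original 0xFF byte is gone by the time the string exists, so no later encoding step can recover it.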

Basically you shouldn't try to interpret arbitrary binary data as if it were encoded in a particular encoding. If you want to represent arbitrary binary data as a string, use something like base64.
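As a rough sketch of the base64 approach, assuming Java 8 or later where `java.util.Base64` is available (the class name `Base64RoundTrip` and its two helpers are hypothetical):

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; the names are not from the question.
class Base64RoundTrip {
    // Turns arbitrary bytes into an ASCII-only string.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recovers the exact original bytes from the base64 text.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] bytes = { 1, -1 };
        String text = encode(bytes);
        System.out.println(text);                          // AF8= style ASCII text
        System.out.println(Arrays.toString(decode(text))); // [1, -1]
    }
}
```

Unlike the UTF-8 round trip, this works for every possible byte value, at the cost of a longer string.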


4 Comments

Thanks, Jon. I'm not familiar with base64, though. How does it handle every possible byte value without losing data?
@seven: I'm not sure exactly what you mean - but it converts opaque binary data to just ASCII, which is generally easy to transport.
Is it possible that some bytes, which have no counterpart in the ASCII alphabet, cannot be converted to ASCII? Thanks.
@seven: No, the whole point of base64 is that it takes any set of bytes and converts it to ASCII. That's why it ends up being longer (4 characters for every 3 bytes).

The byte -1 (0xFF) is not valid anywhere in UTF-8-encoded text. [-17, -65, -67] is 0xEF 0xBF 0xBD, the UTF-8 encoding of the replacement character U+FFFD that gets substituted for it.



String isn't a container for binary data. It is a container for char. -1 isn't a legal value for a char. There's no reason why what you're doing should ever work. Ergo, don't do it.
