
If I have some binary data D and I convert it to a string S, I expect that converting S back to binary will give me D again. But that's not what happens:

import java.io.IOException;

public class A {
    public static void main(String[] args) throws IOException {
        final byte[] bytes = new byte[]{-114, 104, -35};// In hex: 8E 68 DD
        System.out.println(bytes.length);               //prints 3
        System.out.println(new String(bytes, "UTF-8").getBytes("UTF-8").length); //prints 7
    }
}

Why does this happen?

  • Are you trying to force arbitrary binary data into a string? Why? Commented Feb 4, 2012 at 20:22
  • (If you do have to hide binary in a string for some reason, you'd need to use an encoding that provides a one-to-one mapping between bytes and characters; ISO-8859-1 would be the obvious choice. UTF-8 has byte sequences that do not represent valid characters.) Commented Feb 5, 2012 at 9:29
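A minimal sketch of the one-to-one mapping mentioned in the comment above (the class name is invented here for illustration): ISO-8859-1 assigns a character to every one of the 256 byte values, so the same bytes come back out.

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class Latin1RoundTrip {
    public static void main(String[] args) throws UnsupportedEncodingException {
        final byte[] bytes = new byte[]{-114, 104, -35};           // same 8E 68 DD as above
        // ISO-8859-1 maps every byte value to a character, so nothing gets replaced
        final byte[] back = new String(bytes, "ISO-8859-1").getBytes("ISO-8859-1");
        System.out.println(back.length);                           // prints 3
        System.out.println(Arrays.equals(bytes, back));            // prints true
    }
}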

3 Answers

2

Converting from a byte array to a String and back again is not a one-to-one mapping. Reading the docs, the String implementation uses a CharsetDecoder to convert the incoming byte array into Unicode. The first and last bytes in your input array don't map to valid UTF-8 sequences, so the decoder replaces each of them with a replacement character.


2 Comments

Good point, but it seems strange. Why should it use some magic replacement string instead of throwing an exception?
I guess the CharsetDecoder can throw an exception when it encounters an unmappable character, but the default String implementation takes the less drastic option of substituting a default error character. I bet you can use the CharsetDecoder yourself for more control over byte[] <-> String conversion.
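A rough sketch of that suggestion (the class name StrictDecode is made up): configuring the decoder with CodingErrorAction.REPORT makes it throw instead of silently substituting a replacement character.

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class StrictDecode {
    public static void main(String[] args) throws CharacterCodingException {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // 0x8E is not a valid start of a UTF-8 sequence, so this throws MalformedInputException
        decoder.decode(ByteBuffer.wrap(new byte[]{-114, 104, -35}));
    }
}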
1

It's likely that the bytes you're converting to a string don't actually form a valid string. If Java can't figure out what you mean by each byte, it will attempt to fix it, which means that when you convert back to a byte array, it won't be the same as what you started with. If you try with a valid set of bytes, you should be more successful.
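For instance, a tiny sketch under that assumption (the bytes here just spell "hi" in UTF-8): a valid byte sequence survives the round trip unchanged.

import java.io.UnsupportedEncodingException;

public class ValidRoundTrip {
    public static void main(String[] args) throws UnsupportedEncodingException {
        final byte[] valid = new byte[]{104, 105};  // "hi" - valid UTF-8
        System.out.println(new String(valid, "UTF-8").getBytes("UTF-8").length); // prints 2
    }
}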

2 Comments

Yeah, but I expected at least to get an exception in this case.
The behaviour is configurable - you can ignore, replace or error. See docs.oracle.com/javase/1.5.0/docs/api/java/nio/charset/…, particularly the bit about the CodingErrorAction class
0

Your data can't be decoded into valid Unicode characters using the UTF-8 encoding. Look at the decoded string: it consists of 3 characters, 0xFFFD, 0x0068 and 0xFFFD. The first and last are "�", the Unicode replacement character. I think you need to choose another encoding; e.g. "CP866" produces a valid string and converts back into the same array.
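To see this for yourself, a small sketch (the class name InspectDecoded is invented) that prints the decoded code points and checks the CP866 round trip:

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class InspectDecoded {
    public static void main(String[] args) throws UnsupportedEncodingException {
        final byte[] bytes = new byte[]{-114, 104, -35};
        final String s = new String(bytes, "UTF-8");
        for (int i = 0; i < s.length(); i++) {
            System.out.printf("U+%04X ", (int) s.charAt(i)); // prints U+FFFD U+0068 U+FFFD
        }
        System.out.println();
        // CP866 is a single-byte charset, so every byte maps to a character and back
        final byte[] back = new String(bytes, "CP866").getBytes("CP866");
        System.out.println(Arrays.equals(bytes, back));      // prints true
    }
}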

