Why does this junit test fail?

import org.junit.Assert;
import org.junit.Test;

import java.io.UnsupportedEncodingException;

public class TestBytes {
    @Test
    public void testBytes() throws UnsupportedEncodingException {
        byte[] bytes = new byte[]{0, -121, -80, 116, -62};
        String string = new String(bytes, "UTF-8");
        byte[] bytes2 = string.getBytes("UTF-8");
        System.out.print("bytes2: [");
        for (byte b : bytes2) System.out.print(b + ", ");
        System.out.print("]\n");
        Assert.assertArrayEquals(bytes, bytes2);
    }
}

I expected the resulting byte array to equal the input. Instead, the resulting array differs from the input in both content and length, probably because UTF-8 characters can take two bytes.

Please enlighten me.

2 Answers

The reason is that 0, -121, -80, 116, -62 is not a valid UTF-8 byte sequence. new String(bytes, "UTF-8") does not throw an exception in this situation, but the result is difficult to predict. See the "Invalid byte sequences" section of http://en.wikipedia.org/wiki/UTF-8.
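
To check up front whether a byte array is well-formed UTF-8, one option is a strict decoder that reports malformed input instead of silently accepting it. The following is a minimal sketch, not part of the original answer; the class name StrictUtf8Check and the helper isValidUtf8 are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {
    // Hypothetical helper: returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // With REPORT, decode throws instead of substituting characters.
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] fromQuestion = {0, -121, -80, 116, -62};
        byte[] validText = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(fromQuestion)); // prints false
        System.out.println(isValidUtf8(validText));    // prints true
    }
}
```

The array from the question is rejected at its second byte: -121 (0x87) is a continuation byte with no lead byte before it.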


5 Comments

In particular, not every byte sequence is valid UTF-8.
Thanks. I'd very much like to store those bytes in a String. Are there any encodings that support any byte sequences, or must I represent it the same way I printed it in the junit test above?
Try ISO-8859-1; it converts bytes into chars one to one.
@EvgeniyDorofeev thanks a bunch, those answers probably saved me a few hours!
@EvgeniyDorofeev "Try ISO-8859-1". This saved me some hours today. I had been searching through lots of questions about this and was trying UTF-8 encoding. +1, thanks!
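
The ISO-8859-1 suggestion from the comments can be sketched as follows. ISO-8859-1 maps every byte value 0..255 to the char with the same code point, so decoding and re-encoding returns the original bytes. The class name Latin1RoundTrip and the helper roundTrip are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Hypothetical helper: bytes -> String -> bytes, using ISO-8859-1 both ways.
    static byte[] roundTrip(byte[] bytes) {
        String asString = new String(bytes, StandardCharsets.ISO_8859_1);
        return asString.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = {0, -121, -80, 116, -62};
        byte[] bytes2 = roundTrip(bytes);
        System.out.println(Arrays.equals(bytes, bytes2)); // prints true
    }
}
```

The same charset has to be used in both directions. The resulting String holds one char per byte, not meaningful text.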

The array bytes contains negative values. These have the 8th bit (bit 7) set, and in UTF-8 such bytes only occur as part of multibyte sequences. bytes2 will be identical to bytes if you use only bytes with values in the range 0..127. To make a copy of bytes as given, you can use, for example, the arraycopy method:

    byte[] bytes3 = new byte[bytes.length];
    System.arraycopy(bytes, 0, bytes3, 0, bytes.length);
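
The claim about the range 0..127 can be checked with a short sketch. The class name AsciiRoundTrip and the helper survivesUtf8 are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiRoundTrip {
    // Hypothetical helper: true if bytes -> String -> bytes via UTF-8 is lossless.
    static boolean survivesUtf8(byte[] bytes) {
        String asString = new String(bytes, StandardCharsets.UTF_8);
        return Arrays.equals(bytes, asString.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Every value in 0..127 is a complete one-byte UTF-8 sequence.
        byte[] ascii = new byte[128];
        for (int i = 0; i < ascii.length; i++) {
            ascii[i] = (byte) i;
        }
        System.out.println(survivesUtf8(ascii)); // prints true

        // The array from the question contains bytes with bit 7 set.
        byte[] fromQuestion = {0, -121, -80, 116, -62};
        System.out.println(survivesUtf8(fromQuestion)); // prints false
    }
}
```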

1 Comment

Thanks for the clarification about 8th bit.
