Decoding and encoding between String and byte[] in Java

Question

byte[] bytes = new byte[] { 1, -1 };
System.out.println(Arrays.toString(new String(bytes, "UTF-8").getBytes("UTF-8")));
System.out.println(Arrays.toString(new String(bytes, "ISO-8859-1").getBytes("ISO-8859-1")));

output:

[1, -17, -65, -67]
[1, -1]

Why does the UTF-8 round trip change the bytes, while the ISO-8859-1 round trip returns them unchanged?

stackoverflow.com/questions/2544965/…

– Bozho, Commented Apr 14, 2010 at 5:54

Jon Skeet · Accepted Answer · 2010-04-14 05:59:40Z

6

Your byte array isn't a valid UTF-8-encoded string... so the string you get from

new String(bytes, "UTF-8")

contains U+0001 (for the first byte) and U+FFFD to signify bad data in the second byte. When that string is encoded using UTF-8, you get the byte pattern shown.
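
A minimal sketch of that behaviour, assuming Java 8 or later. The class and method names are made up for illustration. It decodes the question's two bytes as UTF-8 and prints the code units the resulting string actually holds, then re-encodes it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the name is not from the original post.
final class ReplacementCharDemo {

    // Decode the bytes as UTF-8 and return the UTF-16 code units of the result.
    static int[] decodedCodeUnits(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return s.chars().toArray();
    }

    public static void main(String[] args) {
        byte[] bytes = { 1, -1 };

        // The byte -1 (0xFF) cannot be decoded, so it becomes U+FFFD.
        for (int unit : decodedCodeUnits(bytes)) {
            System.out.printf("U+%04X%n", unit);
        }

        // Encoding U+FFFD as UTF-8 yields three bytes: 0xEF 0xBF 0xBD.
        byte[] roundTrip = new String(bytes, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(roundTrip)); // [1, -17, -65, -67]
    }
}
```

The ISO-8859-1 line in the question round-trips because that charset maps each of the 256 byte values to a distinct character, so nothing needs to be replaced.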

Basically you shouldn't try to interpret arbitrary binary data as if it were encoded in a particular encoding. If you want to represent arbitrary binary data as a string, use something like base64.
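
As a rough illustration of the base64 suggestion, the sketch below uses `java.util.Base64`, which exists only in Java 8 and later (so it was not available when this answer was written). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; the name is not from the original post.
final class Base64RoundTrip {

    // Turn arbitrary bytes into an ASCII-only string.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recover the original bytes from the base64 text.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] bytes = { 1, -1 };
        String text = toText(bytes);
        System.out.println(text);                               // Af8=
        System.out.println(Arrays.toString(fromText(text)));    // [1, -1]
    }
}
```

Unlike the UTF-8 round trip in the question, this one is lossless for any input bytes.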

4 Comments

seven Over a year ago

Thanks, Jon. I am not familiar with base64, though. How does base64 handle every possible byte value without losing data?

Jon Skeet Over a year ago

@seven: I'm not sure exactly what you mean - but it converts opaque binary data to just ASCII, which is generally easy to transport.

seven Over a year ago

Is it possible that some bytes that are not in the ASCII alphabet cannot be converted to ASCII? Thanks.

Jon Skeet Over a year ago

@seven: No, the whole point of base64 is that it takes any set of bytes and converts it to ASCII. That's why it ends up being longer (4 characters for every 3 bytes).
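
To make that concrete, here is a small sketch (Java 8+, hypothetical names) that encodes all 256 possible byte values and checks that the result is pure ASCII, decodes back to the same bytes, and has the expected 4-for-3 length.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; the name is not from the original post.
final class Base64AllBytes {

    // Build an array holding every possible byte value exactly once.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    // True if every character in the string is in the 7-bit ASCII range.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        String text = Base64.getEncoder().encodeToString(all);

        System.out.println(isAscii(text));   // true
        // 256 bytes form 86 groups of up to 3 bytes, each encoded as 4 characters.
        System.out.println(text.length());   // 344
        System.out.println(Arrays.equals(all, Base64.getDecoder().decode(text))); // true
    }
}
```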

Michael Borgwardt · Answer · 2010-04-14 06:06:22Z

2

The byte -1 (0xFF) is not valid anywhere in UTF-8-encoded text. [-17, -65, -67] is 0xEF 0xBF 0xBD, the UTF-8 encoding of the replacement character U+FFFD that gets substituted for it.
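
If silent substitution is not wanted, one option is to decode strictly so that invalid input raises an error. The sketch below is an illustration only, with hypothetical names. It uses `CharsetDecoder` with `CodingErrorAction.REPORT` to test whether a byte array is well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is not from the original post.
final class StrictUtf8 {

    // Returns true only if the bytes decode as UTF-8 without any malformed input.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown instead of substituting U+FFFD.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf8(new byte[] { 1, -1 }));          // false
        System.out.println(isValidUtf8(new byte[] { -17, -65, -67 }));  // true
    }
}
```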

user207421 · Answer · 2010-04-14 11:21:21Z

0

String isn't a container for binary data. It is a container for char. -1 isn't a legal value for a char. There's no reason why what you're doing should ever work. Ergo, don't do it.
