I have a byte array that carries strings encoded in UCS-2LE. Normally the null terminator of a UCS-2LE string is encoded as two null bytes (00 00), but sometimes there is only one, as below:
import java.nio.charset.Charset;
import java.util.Arrays;

class Ucs {
    public static void main(String[] args) {
        byte[] b = new byte[] {87, 0, 105, 0, 110, 0, 0};
        String s = new String(b, Charset.forName("UTF-16LE"));
        System.out.println(Arrays.toString(s.getBytes()));
        System.out.println(s);
    }
}
This program outputs:
[87, 105, 110, -17, -65, -67]
Win�
I don't know why the byte array obtained from the string is longer than expected, or where the unknown Unicode character comes from. How can I eliminate it?
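getBytes() with no argument uses the platform's default character encoding, which is unknown to us and probably unknown to you, too. Judging by the output it is UTF-8 here, where -17, -65, -67 (EF BF BD) is the encoding of U+FFFD, the replacement character that the decoder substitutes for the dangling odd byte at the end of your array. Try dumping with a known, useful character encoding for Unicode such as UTF-16 or UTF-8.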
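As a rough sketch of both points, the following dumps the string with an explicit charset and then decodes only the complete 16-bit code units before the terminator. The class name `UcsTrim` and the helper `decodeUcs2Le` are made up for illustration, and the sketch assumes that a stray odd byte at the end is always a truncated terminator that is safe to drop.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class UcsTrim {
    // Hypothetical helper: decodes UTF-16LE text, ignoring a dangling odd
    // byte at the end and stopping at the first 00 00 terminator, if any.
    static String decodeUcs2Le(byte[] b) {
        int end = b.length & ~1; // round down to a whole number of code units
        for (int i = 0; i + 1 < end; i += 2) {
            if (b[i] == 0 && b[i + 1] == 0) {
                end = i; // terminator found: decode only what precedes it
                break;
            }
        }
        return new String(b, 0, end, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] b = {87, 0, 105, 0, 110, 0, 0};

        // Decoding all 7 bytes: the lone trailing byte becomes U+FFFD.
        String raw = new String(b, StandardCharsets.UTF_16LE);
        // Explicit charset, so the dump does not depend on the platform default.
        System.out.println(Arrays.toString(raw.getBytes(StandardCharsets.UTF_16LE)));
        // [87, 0, 105, 0, 110, 0, -3, -1]

        String s = decodeUcs2Le(b);
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16LE)));
        // [87, 0, 105, 0, 110, 0]
        System.out.println(s); // Win
    }
}
```

If the single trailing byte could instead mean that real data was cut off, silently dropping it may hide a problem upstream, so it is worth checking how the array is produced.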