Convert a byte array ending with a single null byte to a UTF-16 encoded string

Question

I have a byte array that carries strings encoded in UCS-2LE. Normally, the null terminator of a UCS-2LE string is encoded as two null bytes (00 00), but sometimes there is only one, as below:

import java.nio.charset.Charset;
import java.util.Arrays;

class Ucs {
    public static void main(String[] args) {
        byte[] b = new byte[] {87, 0, 105, 0, 110, 0, 0}; 
        String s = new String(b, Charset.forName("UTF-16LE"));
        System.out.println(Arrays.toString(s.getBytes()));
        System.out.println(s);
    }   
}

This program outputs:

[87, 105, 110, -17, -65, -67]
Win�

I don't know why the byte array for the string grows or where the unknown Unicode character comes from. How can I eliminate it?
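A minimal sketch of where the extra bytes come from, assuming the platform default charset in the question was UTF-8 (the output shown is consistent with that). The String constructor replaces the dangling seventh byte with the replacement character U+FFFD, and U+FFFD encodes to three bytes in UTF-8. The class name DumpWithKnownCharset is only illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DumpWithKnownCharset {
    // Decodes as UTF-16LE; the String constructor substitutes U+FFFD for malformed input.
    static String decode(byte[] b) {
        return new String(b, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] b = {87, 0, 105, 0, 110, 0, 0};
        String s = decode(b);

        // Re-encode with explicit charsets instead of the platform default.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16LE)));
        // prints [87, 0, 105, 0, 110, 0, -3, -1]
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));
        // prints [87, 105, 110, -17, -65, -67]

        // The last UTF-16 code unit is the replacement character.
        System.out.println(String.format("U+%04X", (int) s.charAt(s.length() - 1)));
        // prints U+FFFD
    }
}
```

The trailing -17, -65, -67 in the question's output are the bytes EF BF BD, which is U+FFFD in UTF-8.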

getBytes() uses the user's default Java character encoding, which is unknown to us and probably unknown to you, too. Try dumping with a known, useful character encoding for Unicode such as UTF-16 or UTF-8.
– Tom Blodget, Commented Nov 9, 2017 at 2:39
"sometimes there's only one": Can you prevent the problem upstream?
– Tom Blodget, Commented Nov 9, 2017 at 2:43
If you don't like the replacement character (�) quietly indicating the data corruption, you can configure a character decoder that throws an exception instead.
– Tom Blodget, Commented Nov 9, 2017 at 2:46
@TomBlodget Thanks for the tip. Upstream is out of my control and wasted my time!
– jfly, Commented Nov 9, 2017 at 5:24
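The suggestion in the comments to use a decoder that throws instead of substituting � could look roughly like the sketch below. The class and method names (StrictDecode, decodeStrict) are hypothetical; CharsetDecoder, CodingErrorAction.REPORT, and CharacterCodingException are standard java.nio.charset types.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecode {
    // Decodes UTF-16LE and reports malformed input instead of replacing it.
    static String decodeStrict(byte[] b) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_16LE.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(b)).toString();
    }

    public static void main(String[] args) {
        try {
            // The odd trailing byte is malformed, so this call is expected to throw.
            System.out.println(decodeStrict(new byte[] {87, 0, 105, 0, 110, 0, 0}));
        } catch (CharacterCodingException e) {
            System.out.println("Corrupt input: " + e);
        }
    }
}
```

This only detects the problem; it does not repair the input, so it fits best where corrupted data should be rejected rather than silently accepted.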

orip · Accepted Answer · 2017-11-07 12:49:13Z

1

Would a hack that ignores the final byte when the array has an odd length help?

int bytesToUse = b.length%2 == 0 ? b.length : b.length - 1;
String s = new String(b, 0, bytesToUse, Charset.forName("UTF-16LE"));
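As a sketch, the same idea can be wrapped in a helper that also cuts the result at the first NUL character, so that both the normal two-byte terminator and the truncated one-byte terminator give the same string. The names Ucs2Terminated and decode are hypothetical, and the helper assumes that an odd trailing byte is always a truncated terminator and never real data.

```java
import java.nio.charset.StandardCharsets;

class Ucs2Terminated {
    // Decodes UTF-16LE bytes, ignoring a dangling final byte,
    // and stops at the first NUL character if one is present.
    static String decode(byte[] b) {
        int bytesToUse = b.length - (b.length % 2); // round down to an even length
        String s = new String(b, 0, bytesToUse, StandardCharsets.UTF_16LE);
        int nul = s.indexOf('\0');
        return nul >= 0 ? s.substring(0, nul) : s;
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {87, 0, 105, 0, 110, 0, 0}));    // single null byte
        System.out.println(decode(new byte[] {87, 0, 105, 0, 110, 0, 0, 0})); // full terminator
    }
}
```

Both calls in main are expected to print Win.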

answered Nov 7, 2017 at 12:49


Yep, it's a way :)
– jfly, Commented over a year ago

Timothy Truckle · Accepted Answer · 2017-11-07 13:20:46Z

1

Use an InputStreamReader along with the proper Charset or a custom CharsetDecoder.

Reader reader = new InputStreamReader(
   new ByteArrayInputStream(new byte[]{87, 0, 105, 0, 110, 0, 0}),
   Charset.forName("UTF-16LE"));

Reader reader = new InputStreamReader(
   new ByteArrayInputStream(new byte[]{87, 0, 105, 0, 110, 0, 0}),
   new CharsetDecoder(Charset.forName("UTF-16LE"), 1, 2) {
      @Override
      protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
        // detect trailing zero(s) to skip them
        // maybe employ the first version to do the actual conversion
        throw new UnsupportedOperationException("sketch only: decoding not implemented");
      }
   });
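A possibly simpler variant than subclassing CharsetDecoder, offered as a sketch and not as the answer's own method: take the standard UTF-16LE decoder and configure it with CodingErrorAction.IGNORE, so the dangling byte is dropped instead of being replaced with U+FFFD. The names LenientReader and readAll are hypothetical; stopping at the first NUL character is an added assumption about how the terminator should be treated.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class LenientReader {
    // Reads UTF-16LE text up to the first NUL, silently dropping malformed input
    // such as a single trailing byte.
    static String readAll(byte[] b) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_16LE.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE);
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(b), decoder)) {
            int c;
            while ((c = reader.read()) != -1) {
                if (c == 0) {
                    break; // treat NUL as the string terminator
                }
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new byte[] {87, 0, 105, 0, 110, 0, 0}));
    }
}
```

IGNORE drops any malformed sequence, not only a truncated terminator, so real corruption elsewhere in the data would also pass unnoticed.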

edited Nov 7, 2017 at 13:20

answered Nov 7, 2017 at 12:49
