I'm developing an Android app in which I need to put a byte value (8 bits) inside a String and later read it back as part of a byte[].

However, I get a different value when I convert the String back to byte[] with getBytes(). I think it's an encoding or charset issue.

By the way, I'm new to Java programming; I mostly code in C.

Code:

void function()
{
    String a = "bla";
    char x = (0xD0 & 0xFF);  //Need to add & read back '0xD0'
    a += x;
    Log.d(TAG,"TEST: "+a);

    String mm = "-- ";
    byte[] buffer = null;
    try {
        buffer = a.getBytes("US-ASCII");
    } catch (UnsupportedEncodingException e) {
        Log.e(TAG, e.getMessage());
    }
    for (int i = 0; i < buffer.length; i++) {
        mm+=" "+Integer.toHexString( buffer[i] );
    }

    Log.e(TAG, "Len:"+buffer.length+mm);
}

Output:

TEST: bla￐
Len:4--  62 6c 61 3f

Expected:

Len:4--  62 6c 61 d0

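Found the solution: I'm now using the UTF-16LE encoding, which does not lose data, and transmitting only the even-indexed bytes while skipping the odd ones. This works here because every char I add is below 0x100, so the high (odd-indexed) byte of each pair is always 0.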

Solution:

void function()
{
    String a = "bla";
    char x = 0xD0;
    a += x;
    Log.d(TAG,"TEST: "+a);

    String mm = "-- ";
    byte[] buffer = null;
    try {
        buffer = a.getBytes("UTF-16LE");
    } catch (UnsupportedEncodingException e) {
        Log.e(TAG, e.getMessage());
    }
    for (int i = 0; i < buffer.length; ) {
        mm += i +":"+Integer.toHexString( buffer[i] ) + ",";
        /* Skip odd bytes as using "UTF-16LE" encoding */
        i+=2;
    }

    Log.e(TAG, "Len:"+buffer.length+mm);
}

Result:

Len:8-- 0:62,2:6c,4:61,6:ffffffd0,
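
The ffffffd0 in the last entry is sign extension: a Java byte is signed, so Integer.toHexString(buffer[i]) first widens -48 to an int. Below is a minimal sketch of masking with 0xFF before formatting. The class and method names (HexDump, toHex) are made up for illustration and are not part of the code above.

```java
final class HexDump {
    // Formats each byte as an unsigned two-digit hex value, space separated.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // b & 0xFF widens to int without keeping the sign bits.
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = { 0x62, 0x6c, 0x61, (byte) 0xD0 };
        System.out.println(Integer.toHexString(data[3]));        // ffffffd0
        System.out.println(Integer.toHexString(data[3] & 0xFF)); // d0
        System.out.println(toHex(data));                         // 62 6c 61 d0
    }
}
```

The byte in the array is the same either way; only the printed representation changes.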
  • bytes are signed -128 to 127, chars are positive 0 to 65535. You are probably adding char 0xFFD0 to your String, due to sign extension. Commented Apr 15, 2016 at 13:55
  • Yes, but casting as (byte)x gives Len:6-- 62 6c 61 2d 34 38, and by ANDing with 0xff I am converting it to unsigned. I don't understand where the 3f comes from. Commented Apr 15, 2016 at 14:03
  • "0xD0 & 0xFF" is "0xD0". "(byte)(0xD0 & 0xFF)" is -48. Commented Apr 15, 2016 at 14:08
  • Thanks, but without the cast it gives an error, and changing byte to char gives the same result. Commented Apr 15, 2016 at 14:19
  • What you are doing is inherently wrong. Strings are not arrays of bytes; they are sequences of chars, each 2 bytes wide. The getBytes() call at the end converts those 2-byte chars into single bytes, and any character outside the US-ASCII range is lost in that step. If you want to accumulate bytes, use a ByteBuffer or a similar structure. Just so you know: your buffer result 62 6c 61 2d 34 38 is "bla-48"; the byte was converted to its decimal string and appended. Commented Apr 15, 2016 at 14:45
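
The 3f can be reproduced in isolation. In this sketch, String.getBytes with US-ASCII replaces any character the charset cannot represent with that charset's replacement byte, which is '?' (0x3f), while ISO-8859-1 maps U+0000 to U+00FF directly onto single bytes. The class and method names (AsciiLoss, encodeAscii, encodeLatin1) are illustrative, and java.nio.charset.StandardCharsets is assumed to be available (Java 7+, Android API 19+).

```java
import java.nio.charset.StandardCharsets;

final class AsciiLoss {
    // Unmappable characters come back as '?' (0x3f).
    static byte[] encodeAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Characters U+0000..U+00FF map one-to-one onto bytes 0x00..0xFF.
    static byte[] encodeLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String a = "bla" + (char) 0xD0;
        byte[] ascii = encodeAscii(a);
        byte[] latin1 = encodeLatin1(a);
        System.out.println(Integer.toHexString(ascii[3] & 0xFF));  // 3f
        System.out.println(Integer.toHexString(latin1[3] & 0xFF)); // d0
    }
}
```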

2 Answers

When Java was designed, a distinction was made between binary data (byte[], InputStream, OutputStream) and Unicode text (String, char, Reader, Writer). A byte is 8 bits; a char is 16 bits and holds a UTF-16 code unit. Unicode code points run up to U+10FFFF, which needs 21 bits, so UTF-16 sometimes needs a pair of chars (a surrogate pair) for a single code point. As a result you cannot treat arbitrary char values as raw data, and every conversion between text and bytes, implicit or explicit, goes through a charset and has a cost.

It is better to use ByteArrayOutputStream, which collects a variable number of bytes and then hands back the byte[], and ByteArrayInputStream for reading such an array back.
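
As a rough sketch of that approach, the text part is encoded explicitly and the raw byte is written directly, so 0xD0 never passes through a String. The class and method names (FrameBuilder, build) are hypothetical, and StandardCharsets is assumed to be available (Java 7+, Android API 19+).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class FrameBuilder {
    // Builds the bytes for "bla" followed by the raw byte 0xD0.
    static byte[] build() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] text = "bla".getBytes(StandardCharsets.US_ASCII);
        out.write(text, 0, text.length); // text part, encoded explicitly
        out.write(0xD0);                 // write(int) stores only the low 8 bits
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : build()) {
            hex.append(String.format("%02x ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // 62 6c 61 d0
    }
}
```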

If you still want to carry the bytes in a String, convert through a single-byte encoding such as ISO-8859-1.

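// Three equivalent ways to build the same string; use one of them.
String s = "blah\u00d0";
// String s = "blah" + '\u00d0';
// String s = "blah" + ((char) 0x00d0); // < 0x100 still in safe range

// Both calls can throw UnsupportedEncodingException when given a charset name.
byte[] b = s.getBytes("ISO-8859-1"); // last byte is (byte) 0xD0
s = new String(b, "ISO-8859-1");     // back to the original string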

Another difference from C is that \u0000 is an ordinary character in a String, not a terminator.
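
To check both points, here is a small sketch that pushes all 256 byte values, including 0x00, through a String and back using ISO-8859-1. The names (Latin1RoundTrip, wrap, unwrap) are invented for this example, and StandardCharsets is assumed to be available (Java 7+, Android API 19+).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {
    // Each byte becomes the char with the same numeric value (U+0000..U+00FF).
    static String wrap(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Each char below 0x100 becomes the byte with the same numeric value.
    static byte[] unwrap(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String s = wrap(all);
        System.out.println(s.length());                     // 256
        System.out.println(Arrays.equals(all, unwrap(s)));  // true
    }
}
```

The length of 256 shows that the leading \u0000 does not cut the String short.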

How about this:

String s = "Hello";
s += (char) ((byte) 0xD0 & 0xFF);

// s is now "HelloÐ"

// Integer.toHexString(s.charAt(s.length() - 1) & 0xff) returns "d0"

To specify the encoding when getting the bytes, use s.getBytes("UTF-8"), or whichever specific encoding you need for sending over the network or elsewhere.

You can create a new String from encoded bytes like this: String s = new String(utfByteArray, "UTF-8");
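
One thing to watch when picking the encoding: UTF-8 uses more than one byte for any character above U+007F, so Ð (U+00D0) does not come out as the single byte D0. A small sketch, with illustrative names (Utf8Width, utf8) and assuming StandardCharsets is available (Java 7+, Android API 19+):

```java
import java.nio.charset.StandardCharsets;

final class Utf8Width {
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] b = utf8("Hello" + (char) 0xD0);
        System.out.println(b.length);                                 // 7, not 6
        System.out.println(Integer.toHexString(b[5] & 0xFF));         // c3
        System.out.println(Integer.toHexString(b[6] & 0xFF));         // 90
    }
}
```

If exactly one byte per char is required, a single-byte charset such as ISO-8859-1 is the closer fit.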

2 Comments

Thanks, but I'm trying to get D0 in the byte[] array, not -48; I need to transfer D0 as a single byte over a serial port. I tried what you suggested but I'm still getting 3f.
Just updated my answer (I'm new to this site, so I don't know whether it notifies you).
