How to detect end of string in byte array to string conversion?

Question

I receive a string from a socket as a byte array that looks like:

[128,5,6,3,45,0,0,0,0,0]

The size given by the network protocol is the total length of the string including the zeros, so 10 in my example.

If I simply do:

String myString = new String(myBuffer);

I end up with 5 incorrect characters at the end of the string. The conversion doesn't seem to detect the end-of-string character (0).

To get the correct size and the correct string, I do this:

int sizeLabelTmp = 0;
//Iterate over the 10 bytes to get the real size of the string
for(int j = 0; j<(sizeLabel); j++) {
    byte charac = datasRec[j];
    if(charac == 0)
        break;
    sizeLabelTmp ++;
}
// Create a temp byte array to make a correct conversion
byte[] label    = new byte[sizeLabelTmp];
for(int j = 0; j<(sizeLabelTmp); j++) {
    label[j] = datasRec[j];
}
String myString = new String(label);

Is there a better way to handle this problem?

Thanks

Yuvi · Accepted Answer · 2013-01-03 08:15:37Z

13

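Maybe it's too late, but it may help others. The simplest thing you can do is new String(myBuffer).trim(), which gives you exactly what you want: trim() strips leading and trailing characters with a code of U+0020 or lower, and the '\0' padding falls in that range.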
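A minimal sketch of the trim() approach, assuming the text is ASCII or UTF-8 and that losing leading or trailing whitespace is acceptable. The class name TrimDemo is hypothetical; the second case shows the caveat that trim() also eats a leading space.

```java
import java.nio.charset.StandardCharsets;

public class TrimDemo {
    // Decodes the whole buffer, then lets trim() strip the trailing NUL padding.
    // Caveat: trim() also strips leading/trailing spaces, tabs and newlines.
    static String decodeAndTrim(byte[] buffer) {
        return new String(buffer, StandardCharsets.UTF_8).trim();
    }

    public static void main(String[] args) {
        byte[] padded = {'a', 'b', 'c', 0, 0, 0, 0};
        byte[] leadingSpace = {' ', 'a', 'b', 'c', 0, 0};

        System.out.println(decodeAndTrim(padded).length());       // 3
        System.out.println(decodeAndTrim(leadingSpace).length()); // 3: the leading space is lost too
    }
}
```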

answered Jan 3, 2013 at 8:15

Yuvi

1,332 · 5 gold badges · 25 silver badges · 46 bronze badges


Jon Skeet · Accepted Answer · 2013-04-19 09:57:21Z

7

0 isn't an "end of string character". It's just a byte. Whether or not it only comes at the end of the string depends on what encoding you're using (and what the text can be). For example, if you used UTF-16, every other byte would be 0 for ASCII characters.
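To illustrate the UTF-16 point, a small sketch (the class name Utf16ZeroDemo and the helper firstZero are hypothetical): encoding plain ASCII text as UTF-16BE puts a 0 byte before every character, so a "stop at the first 0" scan yields an empty string, while the UTF-8 form of the same text contains no zero bytes at all.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16ZeroDemo {
    // Returns the index of the first 0 byte, or data.length if there is none.
    static int firstZero(byte[] data) {
        int i = 0;
        while (i < data.length && data[i] != 0) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] utf16 = "Hi".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(Arrays.toString(utf16)); // [0, 72, 0, 105]
        System.out.println(firstZero(utf16));       // 0: the scan stops immediately

        byte[] utf8 = "Hi".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstZero(utf8));        // 2: no zero bytes in the UTF-8 form
    }
}
```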

If you're sure that the first 0 indicates the end of the string, you can use something like the code you've given, but I'd rewrite it as:

int size = 0;
while (size < data.length)
{
    if (data[size] == 0)
    {
        break;
    }
    size++;
}

// Specify the appropriate encoding as the last argument
String myString = new String(data, 0, size, "UTF-8");

I strongly recommend that you don't just use the platform default encoding - it's not portable, and may well not allow for all Unicode characters. However, you can't just decide arbitrarily - you need to make sure that everything producing and consuming this data agrees on the encoding.

If you're in control of the protocol, it would be much better if you could introduce a length prefix before the string, to indicate how many bytes are in the encoded form. That way you'd be able to read exactly the right amount of data (without "over-reading") and you'd be able to tell if the data was truncated for some reason.
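A minimal sketch of such a length prefix, assuming a 4-byte big-endian byte count followed by UTF-8 bytes (that layout and the names writeString, readString and roundTrip are hypothetical, not part of the OP's protocol). DataOutputStream.writeInt and DataInputStream.readInt/readFully handle the framing; readFully blocks until exactly the requested number of bytes has arrived, and throws EOFException if the stream ends first, which is how truncation shows up.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class LengthPrefixDemo {
    // Writes a 4-byte big-endian length followed by the UTF-8 bytes.
    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads the length, then exactly that many bytes; a truncated stream
    // surfaces as an EOFException from readInt or readFully.
    static String readString(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative length prefix: " + length);
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Demo helper: encode then decode one string entirely in memory.
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writeString(new DataOutputStream(buffer), s);
            return readString(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        writeString(out, "hello");
        writeString(out, "world");
        out.flush();

        // Two strings back to back: each read consumes exactly its own bytes.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(readString(in)); // hello
        System.out.println(readString(in)); // world
    }
}
```

Because the reader never consumes more than the prefix says, the next field in the stream starts exactly where the string ends.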

edited Apr 19, 2013 at 9:57

answered Nov 4, 2011 at 9:56

Jon Skeet

1.5m · 893 gold badges · 9.3k silver badges · 9.3k bronze badges

3 Comments

G_H Over a year ago

+1 for taking encoding into account. If the stuff received over socket is just a serialized Java String it ought to be okay.

Jon Skeet Over a year ago

@G_H: "Just a serialized Java String" doesn't really specify what the serialization format is. If the OP were using Java binary serialization he wouldn't be doing this operation explicitly anyway... and if it's some other serialization format, we'd need to know which.

G_H Over a year ago

I probably should stop talking... Fact is, I've always stayed the hell away from serialization and don't know the details all that well. JAXB or JPA is usually the only thing I even consider an option.

Deco · Accepted Answer · 2011-11-04 10:32:34Z

2

You can always start at the end of the byte array and go backwards until you hit the first non-zero byte. Then copy everything up to and including that byte into a new byte array and convert it to a String. Hope this helps:

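    // Needs java.util.Arrays and java.nio.charset.StandardCharsets
    byte[] foo = {28,6,3,45,0,0,0,0};
    int i = foo.length - 1;

    // Walk back over the trailing zeros; the i >= 0 check avoids an
    // ArrayIndexOutOfBoundsException when every byte is 0
    while (i >= 0 && foo[i] == 0)
    {
        i--;
    }

    byte[] bar = Arrays.copyOf(foo, i+1);

    String myString = new String(bar, StandardCharsets.UTF_8);
    System.out.println(myString.length());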

Will give you a result of 4.

answered Nov 4, 2011 at 10:32

Deco

3,321 · 20 silver badges · 25 bronze badges


G_H · Accepted Answer · 2016-04-15 05:21:40Z

2

Strings in Java aren't terminated with a 0, as they are in some other languages. A 0 byte is decoded into the so-called null character, which is allowed to appear in a String. I suggest a trimming scheme that either finds the first index of the array holding a 0 and uses the sub-array before it to construct the String (assuming everything after it is also 0), or just constructs the String and calls trim(). The latter removes leading and trailing whitespace, meaning any character with code 32 (U+0020) or lower.

The latter won't work if you have leading whitespace you must preserve. Using a StringBuilder and deleting characters at the end as long as they're the null character would work better in that case.
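A minimal sketch of that StringBuilder variant (the class name StripTrailingNulls and its decode method are hypothetical): decode the whole buffer, then delete NUL characters from the end only, so leading spaces and any whitespace before the padding survive. UTF-8 is an assumption here; use whatever charset the protocol specifies.

```java
import java.nio.charset.StandardCharsets;

public class StripTrailingNulls {
    // Decodes the buffer and removes only the trailing NUL characters.
    static String decode(byte[] buffer) {
        StringBuilder sb = new StringBuilder(new String(buffer, StandardCharsets.UTF_8));
        while (sb.length() > 0 && sb.charAt(sb.length() - 1) == '\0') {
            sb.deleteCharAt(sb.length() - 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = {' ', 'a', 'b', ' ', 0, 0, 0};
        String raw = new String(data, StandardCharsets.UTF_8);
        System.out.println(raw.length());             // 7: each 0 byte became a NUL char
        System.out.println("[" + decode(data) + "]"); // [ ab ]
    }
}
```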

edited Apr 15, 2016 at 5:21

answered Nov 4, 2011 at 9:56

G_H

12.1k · 3 gold badges · 43 silver badges · 85 bronze badges


user207421 · Accepted Answer · 2011-11-04 11:04:28Z

1

It appears to me that you are ignoring the count returned by the read() method. The trailing null bytes probably weren't sent; they are most likely left over from the initial state of the buffer.

int count = in.read(buffer);
if (count < 0) {
  // EOS: close the socket etc.
} else {
  // Decode only the bytes actually read (a declaration needs braces here)
  String s = new String(buffer, 0, count);
}

answered Nov 4, 2011 at 11:04

user207421

312k · 45 gold badges · 324 silver badges · 493 bronze badges

2 Comments

grunk Over a year ago

The buffer shown in my post is just an extract of an entire packet. The string is sent in the middle of a lot of other data.

user207421 Over a year ago

@grunk then the protocol must tell you how much of it is the string, either by null-terminating it or a length prefix.
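For the OP's situation (a fixed-width, NUL-padded string field inside a larger packet), a sketch that decodes from a given offset up to the first 0 byte within the field. It assumes an encoding in which 0 only appears as padding; the helper name readCString and the packet layout in main are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PacketField {
    // Decodes a NUL-padded field of fieldLength bytes starting at offset.
    // Stops at the first 0 byte inside the field, or at the field's end.
    static String readCString(byte[] packet, int offset, int fieldLength, Charset charset) {
        if (offset < 0 || fieldLength < 0 || offset + fieldLength > packet.length) {
            throw new IllegalArgumentException("Field lies outside the packet");
        }
        int end = offset;
        int limit = offset + fieldLength;
        while (end < limit && packet[end] != 0) {
            end++;
        }
        return new String(packet, offset, end - offset, charset);
    }

    public static void main(String[] args) {
        // Hypothetical layout: 2 header bytes, a 10-byte label field, 1 trailing byte.
        byte[] packet = {7, 7, 'L', 'a', 'b', 'e', 'l', 0, 0, 0, 0, 0, 9};
        String label = readCString(packet, 2, 10, StandardCharsets.US_ASCII);
        System.out.println(label); // Label
    }
}
```

Passing the Charset explicitly keeps the encoding decision visible at the call site rather than relying on the platform default.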

vortal · Accepted Answer · 2014-05-16 12:53:34Z

1

Without diving into the protocol considerations the OP mentioned, how about this for stripping the zero bytes? (Note that it drops every 0x00 byte, not only the trailing ones.)

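public static String bytesToString(byte[] data) {
    // Skips every 0x00 byte and maps each remaining byte to one char
    // (effectively ISO-8859-1); StringBuilder avoids O(n^2) concatenation
    StringBuilder dataOut = new StringBuilder(data.length);
    for (int i = 0; i < data.length; i++) {
        if (data[i] != 0x00)
            dataOut.append((char) (data[i] & 0xFF)); // mask avoids sign extension of bytes >= 0x80
    }
    return dataOut.toString();
}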

answered May 16, 2014 at 12:53

vortal

156 · 2 silver badges · 11 bronze badges
