5

I receive a string from a socket as a byte array that looks like this:

[128,5,6,3,45,0,0,0,0,0]

The size given by the network protocol is the total length of the string (including the zeros), so 10 in my example.

If I simply do:

String myString = new String(myBuffer); 

I get 5 incorrect characters at the end of the string. The conversion doesn't seem to detect the end-of-string character (0).

To get the correct size and the correct string, I do this:

int sizeLabelTmp = 0;
// Iterate over the 10 bytes to get the real size of the string
for(int j = 0; j<(sizeLabel); j++) {
    byte charac = datasRec[j];
    if(charac == 0)
        break;
    sizeLabelTmp ++;
}
// Create a temp byte array to make a correct conversion
byte[] label    = new byte[sizeLabelTmp];
for(int j = 0; j<(sizeLabelTmp); j++) {
    label[j] = datasRec[j];
}
String myString = new String(label);

Is there a better way to handle this problem?

Thanks

6 Answers

13

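Maybe it's too late, but it may help others. The simplest thing you can do is new String(myBuffer).trim(), which removes the trailing zero characters. Be aware that trim() also strips any other leading or trailing character with a code of 32 or lower, such as spaces and tabs.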
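As a rough illustration of what trim() does here, the sketch below decodes two padded buffers. The class name TrimDemo, the helper decodeAndTrim, and the choice of US-ASCII are assumptions made for the example, not part of the question.

```java
import java.nio.charset.StandardCharsets;

class TrimDemo {
    // Decodes the whole buffer, then lets trim() strip the padding.
    // US-ASCII is an assumption; use whatever charset the protocol defines.
    static String decodeAndTrim(byte[] buffer) {
        return new String(buffer, StandardCharsets.US_ASCII).trim();
    }

    public static void main(String[] args) {
        byte[] padded = {'a', 'b', 'c', 0, 0, 0};
        System.out.println(decodeAndTrim(padded).length()); // 3

        // Leading and trailing spaces or tabs are removed as well
        byte[] spaced = {' ', 'a', 'b', '\t', 0, 0};
        System.out.println(decodeAndTrim(spaced).length()); // 2
    }
}
```

If the text may legitimately start or end with whitespace, scanning for the zero bytes explicitly is the safer choice.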


7

0 isn't an "end of string character". It's just a byte. Whether or not it only comes at the end of the string depends on what encoding you're using (and what the text can be). For example, if you used UTF-16, every other byte would be 0 for ASCII characters.
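To make the UTF-16 point concrete, here is a small sketch that encodes the same two-character ASCII string in UTF-8 and in big-endian UTF-16. The class name is only illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf16ZeroBytes {
    public static void main(String[] args) {
        byte[] utf8 = "Hi".getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = "Hi".getBytes(StandardCharsets.UTF_16BE);

        System.out.println(Arrays.toString(utf8));  // [72, 105]
        System.out.println(Arrays.toString(utf16)); // [0, 72, 0, 105]
    }
}
```

With UTF-16, stopping at the first 0 byte would cut the text off before its first character.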

If you're sure that the first 0 indicates the end of the string, you can use something like the code you've given, but I'd rewrite it as:

int size = 0;
while (size < data.length)
{
    if (data[size] == 0)
    {
        break;
    }
    size++;
}

// Specify the appropriate encoding as the last argument
String myString = new String(data, 0, size, "UTF-8");

I strongly recommend that you don't just use the platform default encoding - it's not portable, and may well not allow for all Unicode characters. However, you can't just decide arbitrarily - you need to make sure that everything producing and consuming this data agrees on the encoding.

If you're in control of the protocol, it would be much better if you could introduce a length prefix before the string, to indicate how many bytes are in the encoded form. That way you'd be able to read exactly the right amount of data (without "over-reading") and you'd be able to tell if the data was truncated for some reason.
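A length prefix could look roughly like the sketch below. The framing (a 2-byte big-endian length followed by that many UTF-8 bytes) and the names LengthPrefixed, writeString and readString are assumptions chosen for illustration; the real protocol would have to define its own layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class LengthPrefixed {
    // Hypothetical framing: 2-byte big-endian length, then that many UTF-8 bytes.
    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        if (encoded.length > 0xFFFF) {
            throw new IllegalArgumentException("string too long for a 2-byte length prefix");
        }
        out.writeShort(encoded.length);
        out.write(encoded);
    }

    static String readString(DataInputStream in) throws IOException {
        int length = in.readUnsignedShort();
        byte[] encoded = new byte[length];
        in.readFully(encoded); // throws EOFException if the data is truncated
        return new String(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeString(new DataOutputStream(bytes), "label");

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(readString(in)); // label
    }
}
```

Because readFully either fills the array or fails, the reader never over-reads and truncated data is detected instead of being silently accepted.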

3 Comments

+1 for taking encoding into account. If the stuff received over socket is just a serialized Java String it ought to be okay.
@G_H: "Just a serialized Java String" doesn't really specify what the serialization format is. If the OP were using Java binary serialization he wouldn't be doing this operation explicitly anyway... and if it's some other serialization format, we'd need to know which.
I probably should stop talking... Fact is, I've always stayed the hell away from serialization and don't know the details all that well. JAXB or JPA is usually the only thing I even consider an option.
2

You can always start at the end of the byte array and go backwards until you hit the first non-zero byte. Then just copy that part into a new byte array and build the String from it. Hope this helps:

    byte[] foo = {28,6,3,45,0,0,0,0};
    int i = foo.length - 1;

    while (i >= 0 && foo[i] == 0) // the bounds check covers an all-zero array
    {
        i--;
    }

    byte[] bar = Arrays.copyOf(foo, i+1);

    String myString = new String(bar, "UTF-8");
    System.out.println(myString.length());

Will give you a result of 4.


2

Strings in Java aren't terminated by a 0, as they are in some other languages. A 0 byte gets turned into the so-called null character, which is allowed to appear in a String. I suggest you use some trimming scheme: either find the first index of the array that holds a 0 and construct the String from the sub-array before it (assuming everything after that is 0 as well), or just construct the String and call trim(). trim() removes leading and trailing whitespace, which it defines as any character with code 32 or lower.

The latter won't work if you have leading whitespace you must preserve. Using a StringBuilder and deleting characters at the end as long as they're the null character would work better in that case.
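The StringBuilder variant might look like the following sketch. The class and method names are made up for the example.

```java
class TrailingNulls {
    // Removes only trailing null characters; leading whitespace is preserved.
    static String stripTrailingNulls(String s) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() > 0 && sb.charAt(sb.length() - 1) == '\0') {
            sb.setLength(sb.length() - 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String padded = "  ab\0\0\0";
        System.out.println(stripTrailingNulls(padded).length()); // 4
    }
}
```

Unlike trim(), this leaves the two leading spaces in place.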


1

It appears to me that you are ignoring the read-count returned by the read() method. The trailing null bytes probably weren't sent, they are probably still left over from the initial state of the buffer.

int count = in.read(buffer);
if (count < 0) {
    // EOS: close the socket etc.
} else {
    // Only the first count bytes were actually read
    String s = new String(buffer, 0, count);
}

2 Comments

The buffer shown in my question is just an extract of an entire packet. The string is sent in the middle of a lot of other data.
@grunk then the protocol must tell you how much of it is the string, either by null-terminating it or a length prefix.
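For a zero-padded field that sits in the middle of a larger packet, a sketch along these lines may be useful. The method readPaddedString, its parameters, and the sample packet layout are hypothetical; the offset and field length would come from the actual protocol.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PacketField {
    // Decodes a zero-padded field occupying packet[offset .. offset + fieldLength).
    static String readPaddedString(byte[] packet, int offset, int fieldLength, Charset charset) {
        int end = offset;
        int limit = offset + fieldLength;
        // Stop at the first zero byte or at the end of the field, whichever comes first
        while (end < limit && packet[end] != 0) {
            end++;
        }
        return new String(packet, offset, end - offset, charset);
    }

    public static void main(String[] args) {
        // 2 bytes of other data, a 6-byte padded field, then 1 more byte of other data
        byte[] packet = {7, 9, 'n', 'a', 'm', 'e', 0, 0, 42};
        System.out.println(readPaddedString(packet, 2, 6, StandardCharsets.US_ASCII)); // name
    }
}
```

This decodes straight from the packet, so no temporary byte array is needed.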
1

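Without diving into the protocol considerations that the OP mentioned, how about this for getting rid of the zeroes? Note that it drops every zero byte, not only the trailing ones, and maps each byte directly to one character.

public static String bytesToString(byte[] data) {
    StringBuilder dataOut = new StringBuilder();
    for (int i = 0; i < data.length; i++) {
        if (data[i] != 0x00) {
            // Mask to 0..255 so bytes above 127 are not sign-extended
            dataOut.append((char) (data[i] & 0xFF));
        }
    }
    return dataOut.toString();
}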
