6

I am parsing in Java a byte array having the following specification:

Trace data format:
    - 4 bytes containing the Id.
    - 4 bytes containing the address.
    - N bytes containing the first name, where 0 < N < 32
    - N bytes containing the last name, where 0 < N < 32
    - 4 bytes containing the Minimum
    - 4 bytes containing the Maximum 
    - 4 bytes containing the Resource Default Level

At the moment I don't see any way to parse this array into 7 variables of the correct types. Can you confirm that, or am I missing something, like a magic function in Java that finds the String "limits" in a byte array? (I can't see how the Minimum value can be distinguished from its associated ASCII character.)

Is there any "convention" about a special character between the 2 strings?

7
  • How do you know when you've read the whole name? Commented Feb 24, 2011 at 15:16
  • Could they be null-terminated strings? Commented Feb 24, 2011 at 15:19
  • 4
    Could you provide a better title for your question? The current one could be applied to most of the questions on SO. Commented Feb 24, 2011 at 15:20
  • Please change title to "Parsing string or array in java" or similar. Commented Feb 24, 2011 at 15:21
  • 2
    Null-terminated strings are strings which end with the '\0' character. They are a standard string format often seen in C/C++ systems. Commented Feb 24, 2011 at 15:26

5 Answers

12

Well, you know that the first name starts at offset 8 (the ninth byte), and that the last name ends at offset (length-13), just before the three trailing 4-byte fields. What is uncertain is where the first name ends and the last name begins. I see a few possible solutions:

  • If the format was defined by a C programmer, the two name fields are most likely terminated by a null byte, since that's the C convention for strings.
  • If it was defined by a Java programmer, it could have been written by writeUTF(), which prefixes each string with a 2-byte length, so the byte count in the specification is most likely wrong. However, this at least specifies the encoding, which is otherwise an open question.
  • If it was defined by a COBOL programmer, the two fields could be fixed-length and padded with zeroes or spaces, with the format specification listing the payload length rather than the field length.
  • If it was defined by a really incompetent programmer (whatever language), it contains the two names without delimiter or count, so it's not possible to reliably separate them (if you don't have the information, there's no "magic" function in Java or elsewhere that can conjure it out of thin air). I suppose you could hope the last name always starts with an uppercase letter and nobody uses double names or all-caps.
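
To make the writeUTF() possibility concrete, here is a minimal sketch of how the record could be read in that case. It assumes both names were written with DataOutput.writeUTF() and that the integers are big-endian, neither of which is confirmed by the question. The class name UtfTraceRecord and its field names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical holder for the seven fields; class and field names are illustrative. */
final class UtfTraceRecord {
    final int id, address, minimum, maximum, defaultLevel;
    final String firstName, lastName;

    UtfTraceRecord(int id, int address, String firstName, String lastName,
                   int minimum, int maximum, int defaultLevel) {
        this.id = id;
        this.address = address;
        this.firstName = firstName;
        this.lastName = lastName;
        this.minimum = minimum;
        this.maximum = maximum;
        this.defaultLevel = defaultLevel;
    }

    /**
     * Parses one record, ASSUMING the names were written with writeUTF()
     * (2-byte length prefix followed by modified UTF-8 bytes).
     */
    static UtfTraceRecord parse(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readInt();           // 4 bytes, big-endian
            int address = in.readInt();
            String firstName = in.readUTF(); // reads its own length prefix
            String lastName = in.readUTF();
            int minimum = in.readInt();
            int maximum = in.readInt();
            int defaultLevel = in.readInt();
            return new UtfTraceRecord(id, address, firstName, lastName,
                                      minimum, maximum, defaultLevel);
        } catch (IOException e) {
            // EOFException or UTFDataFormatException: the data does not match the assumed format
            throw new UncheckedIOException("Truncated or malformed record", e);
        }
    }
}
```

If readUTF() throws or returns garbage on real data, that is a good hint that the names were not written this way.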

5

Is there any "convention" about a special character between the 2 strings ?

Well, C strings are usually null-terminated, i.e. they end with a '\0' byte.

If there is no such character I would say that it is impossible to parse the structure.

3

Assuming the first and last name are null-terminated you would do it like this:

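import java.nio.charset.StandardCharsets;

// The names start right after the two 4-byte fields (Id and address).
int firstNameStart = 8;
int firstNameLength = 0;
while (firstNameLength < 32) {
    // Compare against the zero byte, not the character '0' (which is 0x30).
    if (theArray[firstNameStart + firstNameLength] == 0) break;
    firstNameLength++;
}
int lastNameStart = firstNameStart + firstNameLength + 1; // skip the terminator
int lastNameLength = 0;
while (lastNameLength < 32) {
    if (theArray[lastNameStart + lastNameLength] == 0) break;
    lastNameLength++;
}
// ASCII is assumed here; the format description does not state the encoding.
String firstName = new String(theArray, firstNameStart, firstNameLength, StandardCharsets.US_ASCII);
String lastName = new String(theArray, lastNameStart, lastNameLength, StandardCharsets.US_ASCII);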
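
Under the same null-terminated assumption, the whole record can be read in one pass with a ByteBuffer, which also takes care of the five integers. This is only a sketch: the class name NullTerminatedTrace and its field names are hypothetical, and big-endian byte order plus ASCII names are assumptions that the question does not confirm.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical record type; assumes '\0'-terminated ASCII names and big-endian ints. */
final class NullTerminatedTrace {
    final int id, address, minimum, maximum, defaultLevel;
    final String firstName, lastName;

    private NullTerminatedTrace(ByteBuffer buf) {
        id = buf.getInt();
        address = buf.getInt();
        firstName = readCString(buf);
        lastName = readCString(buf);
        minimum = buf.getInt();
        maximum = buf.getInt();
        defaultLevel = buf.getInt();
    }

    static NullTerminatedTrace parse(byte[] data) {
        // ByteBuffer.wrap() yields a big-endian buffer by default (assumed byte order).
        return new NullTerminatedTrace(ByteBuffer.wrap(data));
    }

    /** Reads bytes up to the next zero byte and consumes the terminator. */
    static String readCString(ByteBuffer buf) {
        int start = buf.position();
        while (buf.get() != 0) {
            // keep scanning; get() throws BufferUnderflowException if no terminator is found
        }
        int length = buf.position() - start - 1; // exclude the terminator
        return new String(buf.array(), buf.arrayOffset() + start, length,
                          StandardCharsets.US_ASCII);
    }
}
```

If the data is little-endian, calling order(ByteOrder.LITTLE_ENDIAN) on the wrapped buffer before reading would be the only change needed.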

2

If you want to read N ASCII bytes and turn them into a String, you can use:

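import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public static String readString(DataInputStream dis, int num) throws IOException {
    byte[] bytes = new byte[num];
    dis.readFully(bytes); // reads exactly num bytes or throws EOFException
    // Decode explicitly as ASCII; the String(byte[], int) constructor is deprecated.
    return new String(bytes, StandardCharsets.US_ASCII);
}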

For the rest of the values, you can use

dis.readInt();

If you are asking whether there is any way to know how long the strings are, I don't believe you can determine this from the information provided. Perhaps the strings are terminated by a zero byte ('\0') or have the length as the first byte. Perhaps if you look at the bytes in the file you will see what the format is.

od -xc my-format.bin
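
If the dump shows a small number in front of each name, the "length as the first byte" guess can be tried with something like the sketch below. It assumes a single unsigned length byte followed by that many ASCII bytes. The class and method names are hypothetical, and nothing in the question confirms this layout.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class LengthPrefixedStrings {
    /** Reads one string stored as [1 length byte][length ASCII bytes] (assumed layout). */
    static String readLengthPrefixed(DataInputStream dis) throws IOException {
        int length = dis.readUnsignedByte(); // 0..255
        byte[] bytes = new byte[length];
        dis.readFully(bytes);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /** Convenience for in-memory data: reads two consecutive names. */
    static String[] readNames(byte[] nameBytes) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(nameBytes))) {
            String first = readLengthPrefixed(dis);
            String last = readLengthPrefixed(dis);
            return new String[] { first, last };
        } catch (IOException e) {
            throw new UncheckedIOException("Data is shorter than its length bytes claim", e);
        }
    }
}
```

In a full parser, readLengthPrefixed() would be called twice between the dis.readInt() calls for the address and the Minimum.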

1 Comment

He needs to investigate further to work this out. The original question doesn't contain enough information.
0

Just to add another possibility to Michael's answer.

Assuming that N is the same for both fields, and since the same letter is used I would guess that this is the case, the field positions would be like this:

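int len = array.length;
// Five 4-byte fields surround the two names; split the remaining bytes evenly between them.
int varLen = (len - 5*4) / 2;
int[] fieldPos = new int[7];
fieldPos[0] = 0;                  // Id
fieldPos[1] = 4;                  // address
fieldPos[2] = 8;                  // first name
fieldPos[3] = 8 + varLen;         // last name
fieldPos[4] = 8 + 2*varLen;       // Minimum
fieldPos[5] = 8 + 2*varLen + 4;   // Maximum
fieldPos[6] = 8 + 2*varLen + 8;   // Resource Default Level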
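
With those positions, the two names can be cut out directly. The sketch below assumes equal-width ASCII name fields, and additionally assumes that shorter names are padded with spaces or zero bytes, which is a guess. EqualWidthNames and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class EqualWidthNames {
    /** Returns {firstName, lastName}, ASSUMING both name fields have the same width. */
    static String[] extractNames(byte[] array) {
        int namesTotal = array.length - 5 * 4; // bytes left after the five 4-byte fields
        if (namesTotal <= 0 || namesTotal % 2 != 0) {
            throw new IllegalArgumentException(
                "Record length " + array.length + " does not fit two equal-width names");
        }
        int width = namesTotal / 2;
        String first = new String(array, 8, width, StandardCharsets.US_ASCII);
        String last = new String(array, 8 + width, width, StandardCharsets.US_ASCII);
        return new String[] { stripPadding(first), stripPadding(last) };
    }

    /** Drops trailing spaces and zero characters (assumed padding). */
    static String stripPadding(String s) {
        int end = s.length();
        while (end > 0 && (s.charAt(end - 1) == ' ' || s.charAt(end - 1) == '\0')) {
            end--;
        }
        return s.substring(0, end);
    }
}
```

An odd number of name bytes is rejected, since it would mean the two fields cannot have the same width.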
