OK. I have spent the last 14 hours trying to figure this out. I have a binary file with the following contents (there is much more; this is a truncated version). I want to convert it to a readable string format.

^@^P<9A>^@^@^A^@^@И^@^@^A^@^@Κ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^F<9A>^@^@^@^@^@^@^C^@FQ]U:^@^M^@ ^B^@^E^@^@^@`ESC^B^@d^@^@^@^T^R^B^@^E^@^@^@^@^@^@^@^T^R^B^@^@^@^@^@^@^@^@^@^K^B^@^@^@^@^@^C^@HQ]U:^@^S^@^@^@(^@^@^@V^@^@2^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@HQ]U:^@^V^@<8C>I^B^@^E^@^@^@O^B^@ ^@^@^@O^B^@^E^@^@^@^@^@^@^@O^B^@^@^@^@^@^@^@^@^@^RK^B^@^@^@^@^@^C^@HQ]U:^@^Y^@0^A^@d^@^@^@1^A^@<96>^@^@^@L0^A^@d^@^@^@^@^@^@ ^@71^A^@^@^@^@^@^@^@^@^@0^A^@^@^@^@^@^C^@=Q]U:^@"^@<92>T^@^@2^@^@^@CN^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^AT^@^@ ^@^@^@^@^C^@FQ]U:^@(^@$^M^A^@ ^@^@^@^G^A^@2^@^@^@^O^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@R^L^A^@^@^@^@^@^C^@=Q]U:^@.^@<85>^B ^@^@^G^@^@g^B^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@<85>^B^@^@^@^@^@^@^C^@HQ]U:^@4^@^CH^@^@^Y^@^@^@G^@^@d^@^@^@ H^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^CH^@^@^@^@^@^@^C^@HQ]U:^@O^@^M^@^@<89>^@^@^@^G^M^@^@^A^@^@^P^N^@^@^@^@^@^@^@^@^@^@ ^@^@^@^@^@^@^@^@^@^@^@^@^M^@^@^@^@^@^@^C^@HQ]U:^@R^@^B^@^@^A^@^@^B^@^@<8C>0^B^@^B^@^@^A^@^@^@^@^@^@^B^@^@^@^@^@^@^@^@^@^@^B^@^@^@^@^@^@^C^@HQ]U:^@d^@F^A^@ ^@^@^@^TJ^A^@ ^@^@^@<98>M^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@<8A>G^A^@^@^@^@^@^C^@HQ]U:^@y^@j;^@^@^A^@^@^@=;^@^@d^@^@^@(<^@^@^C^@^@^@^@^@^@P<^@^@^@^@^@^@^@^@^@^@=;^@^@^@^@^@^@^C^@FQ]U:^@<88>^@&^@^@^A^@^@^@&^@^@d^@^@^@'^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@&^@^@ ^@^@^@^@^C^@FQ]U:^@<94>^@^H^@^@^@^@^@^H^@^@d^@^@^@ ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^H^@^@^@^@^@^@^C^@HQ]U:^@<9A>^@w^@^@^A^@^@^@\^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Z^@^@^@^@^@^@^C^@HQ]U:^@<9D>^@^A^B^@ ^@^@^@^A^B^@^A^@^@^@^A^@ ^@^@^@^@^@^@^@"^A^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^C^@HQ]U:^@^@4I^@^@^A^@^@^@DH^@^@^A^@^@^@^]B^@^@<9E>^@^@^@^@^@^@^@I^@^@^@^@^@^@^@^@^@^@MI^@^@^@^@^@^@^C^@FQ]U:^@^@y^@^@^A^@^@^@^Xy^@^@^A^@^@^@]a^@^@^C^@^@^@^@^@^@^@Px^@^@^@^@^@^@^@^@^@^@wy^@^@^@^@^@^@^C^@HQ]U: 
^@^@V^^@^@^T^@^@^@^^@^@^A^@^@^@ZU^@^@e^@^@^@^@^@^@^@$^^@^@^@^@^@^@^@^@^@^@^^@^@^@^@^@^@^C^@DQ]U:^@^@DESC^@^@^A^@^@XESC^@ ^@^A^@^@^@<84>^\^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@<80>ESC^@^@^@^@^@^@^C^@HQ]U:^@^@ESC^@^@2^@^@^@ESC^@^@d^@^@^@ESC^@^@^@^@^@^@^@^@^@ESC^@^@^@^@^@^@^@^@^@^@ESC^@^@^@^@^@^@^C^@HQ]U:^@^@<8B>-^A^@^@^@^@@-^A^@<^@^@^@,^A^@@^@^@^@^@^@^@^@@-^A^@^@^@^@^@^@^@^@^@@-^A^@^@^@^@^@^C^@HQ]U:^@^@<86>^A^@^@U^@^@<86>^A^@^@@<9C>^@^@<90>^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ <86>^A^@^@^@^@^@^@^C^@FQ]U:^@^G^A^T^A^@ ^@^@^@Y^A^@Q^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^P^A^@^@^@^@^@^C^@HQ]U:^@^S^A^^^B^@^A^@^@^@<80>2^B^@ ^@^@^@^O^B^@n^@^@^@^@^@^@^@^^B^@^@^@^@^@^@^@^@^@^_^B^@^@^@^@^@^C^@DQ]U:^@^V^A4^A^@^@^@^@^P!^A^@K^@^@^@8D^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@*^A^@^@^@^@^@^C^@?Q]U:^@.^Aw^F^@^@^A^@^@^@h^F^@^@^O^@^@b^G^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@w^F^@^@^@^@^@^@^C^@HQ]U:^@1^A^A^@^A^@^@^@^A^@^\^B^@^@X^O^B^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^C^@HQ]U:^@4^A^X^F^@^@^G^@^@x^E^@^@^Z^D^@^@@^F^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^X^F^@^@^@^@^@^@^C^@FQ]U:^@=^A^L^F^@^A^@^@^@\^F^@^G^@^@^@X^F^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@S^F^@^@^@^@^@^C^@=Q]U:^@O^A^P!^A^@^@^@^@^@^@^@^@^@^A^@^@^@^H^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ "^A^@^@^@^@^@^C^@BQ]U:^@R^AX^@^@^Y^@^@^@^@^@^E^@^@^@x^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@HQ]U:^@U^A^R^Q^@^@^A^@^@^@^P^@^@2^@^@^@^P^@^@^A^@^@^@^@^@^@^@^P^@^@^@^@^@^@^@^@^@^@^H^Q^@^@^@^@^@^@^C^@@Q]U:^@^^An^A^@^A^@^@^@pM ^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@hn^A^@^@^@^@^@^C^@HQ]U:^@p^A<9D>^A^@^B^@^@^@d<90>^A^@^E^@^@^@<90>^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^R<9C>^A^@^@^@^@^@^C^@HQ]U:^@s^A^A^@^Y^@^@^@ȩ^A^@^T^@^@^@а^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^@^@^@y^A^@^@^@^@^@^C^@HQ]U:^@|^A<8E>^@^@^A^@^@M<9E>^@^@d^@^@^@<^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@

I have the template definition for this file as follows -

HEADER (total 8 bytes)
  Transcode: Short, 2 bytes
  Timestamp: Long, 4 bytes
  Message Length: Short, 2 bytes

DATA (total 50 bytes)
  Security Token: Short, 2 bytes
  Last Traded Price: Long, 4 bytes
  Best Buy Quantity: Long, 4 bytes
  Best Buy Price: Long, 4 bytes
  Best Sell Quantity: Long, 4 bytes
  Best Sell Price: Long, 4 bytes
  Total Traded Quantity: Long, 4 bytes
  Average Traded Price: Long, 4 bytes
  Open Price: Long, 4 bytes
  High Price: Long, 4 bytes
  Low Price: Long, 4 bytes
  Close Price: Long, 4 bytes
  Filler: Long, 4 bytes (blank)

I tried Perl's pack, unpack, and ord, reading byte by byte, and stripping out the "^@" characters to make sense of what remains, which looks like hex code, but I cannot turn this into readable ASCII strings with Perl. I also tried raw mode, encoding, and decoding, and searched Stack Overflow thoroughly. There were a few similar questions, but none of the askers shared the template needed to decode the data. I have the template and still can't figure it out.

There is something basic that I am missing but can't really point out. Would really appreciate if someone can show me step by step with code how this conversion is supposed to be done.

Have never done this before ...

$ xxd 1.bin
0000000: 0300 3b51 5d55 3a00 0700 f87f 0000 0100  ..;Q]U:.........
0000010: 0000 587f 0000 0100 0000 6b67 0000 0100  ..X.......kg....
0000020: 0000 0000 0000 587f 0000 0000 0000 0000  ......X.........
0000030: 0000 e880 0000 0000 0000 0300 4851 5d55  ............HQ]U
0000040: 3a00 0a00 109a 0000 f401 0000 d098 0000  :...............
0000050: f401 0000 ce9a 0000 0000 0000 0000 0000  ................
0000060: 0000 0000 0000 0000 0000 0000 069a 0000  ................
0000070: 0000 0000 0300 4651 5d55 3a00 0d00 a80a  ......FQ]U:.....
0000080: 0200 0500 0000 601b 0200 6400 0000 1412  ......`...d.....
0000090: 0200 0500 0000 0000 0000 1412 0200 0000  ................
00000a0: 0000 0000 0000 ac0b 0200 0000 0000 0300  ................
00000b0: 4851 5d55 3a00 1300 f8f2 0000 2800 0000  HQ]U:.......(...
00000c0: 56c2 0000 3200 0000 fbf9 0000 0000 0000  V...2...........
00000d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000e0: a3f2 0000 0000 0000 0300 4851 5d55 3a00  ..........HQ]U:.
00000f0: 1600 8c49 0200 0500 0000 cc4f 0200 0a00  ...I.......O....
0000100: 0000 cc4f 0200 0500 0000 0000 0000 cc4f  ...O...........O
0000110: 0200 0000 0000 0000 0000 124b 0200 0000  ...........K....

Still doesn't make much sense.

  • Post what you have tried so we can see where it goes wrong. Commented May 23, 2015 at 18:16
  • Where did you get that display from? It is much more useful to display binary content in hex Commented May 23, 2015 at 18:24
  • Ok. I just stupidly did a cat ... how do you see binary content in hex? xxd ... I see .. Commented May 23, 2015 at 18:26

1 Answer

The biggest issue I see is that there's more to unpacking binary data than just knowing "short" or "long".

For numeric values, you need to specify whether the bytes are stored in little-endian or big-endian order. You also need to know whether you're dealing with signed or unsigned values.
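To make the difference concrete, here is a small sketch that unpacks the same four bytes with different template characters. The sample bytes are my own pick (the four bytes that follow the first security token in the xxd dump in the question), and the `l<` modifier assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Four bytes taken from the xxd dump in the question: f8 7f 00 00
my $bytes = "\xf8\x7f\x00\x00";

my ($le) = unpack 'V', $bytes;    # unsigned 32-bit, little-endian
my ($be) = unpack 'N', $bytes;    # unsigned 32-bit, big-endian

print "little-endian: $le\n";     # 32760
print "big-endian:    $be\n";     # 4169072640

# Signedness only changes the result once the top bit is set.
my $high = "\xff\xff\xff\xff";
my ($unsigned) = unpack 'V',  $high;    # 4294967295
my ($signed)   = unpack 'l<', $high;    # -1

print "unsigned: $unsigned, signed: $signed\n";
```

If one interpretation gives small, plausible numbers and the other gives values in the billions, that is usually a good hint about which byte order the file uses.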

For this example, I'm just going to assume everything is little endian and unsigned. That may be wrong, but it's up to you to tweak the pack templates once you find out. In case you need a link, try http://perldoc.perl.org/functions/pack.html
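One cheap sanity check before touching the file is to confirm that a template covers the number of bytes the documentation promises. The sketch below packs zeros with each template and measures the result; the variable names are mine, and the `v`/`V` choice is still the little-endian, unsigned assumption from above.

```perl
use strict;
use warnings;

# Whitespace inside a pack template is ignored, so it can be used for readability.
my $header_template = 'v V v';    # transcode, timestamp, message length
my $data_template   = 'v V12';    # security token + 12 four-byte fields

# Packing zeros and measuring the result shows how many bytes a template covers.
my $header_len = length(pack($header_template, (0) x 3));
my $data_len   = length(pack($data_template,   (0) x 13));

print "header: $header_len bytes, data: $data_len bytes\n";
```

These should come out as 8 and 50, matching the totals in your template definition.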

I haven't tested this on my machine, so pardon if there are any errors, but this is roughly how I would go about what you're trying to do.

#!/usr/bin/perl
use strict;
use warnings;

# Open the file for reading in raw (binary) mode so no bytes are translated.
open my $in, '<:raw', 'path/to/file.ext' or die "Cannot open file: $!";

# Read 8 bytes into $raw_header and unpack them: short, long, short.
read($in, my $raw_header, 8) == 8 or die "Short read on header";
my @header = unpack('vVv', $raw_header);

# Read the 50-byte data block and unpack it: one short, then twelve longs.
# "vVVVVVVVVVVVV" is equivalent; both assume little endian and unsigned.
read($in, my $raw_data, 50) == 50 or die "Short read on data";
my @data = unpack('vV12', $raw_data);

close $in;

# Print all the values in order, each on its own line.
print join("\n", @header, @data), "\n";
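The script above only decodes the first record. Since the message length in the header is 58, the file presumably repeats 58-byte records, and a loop can label each value. The following is a sketch under that assumption: the field names are my own spellings of the template entries, `decode_record` is a hypothetical helper, and the sample bytes are the first record from the xxd dump so that it runs without the real file. It uses `//`, so it needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Field names follow the template in the question; the little-endian,
# unsigned layout ('v' and 'V') is still an assumption.
my @fields = qw(
    transcode timestamp msg_length security_token last_traded_price
    best_buy_qty best_buy_price best_sell_qty best_sell_price
    total_traded_qty avg_traded_price open_price high_price
    low_price close_price filler
);
my $template   = 'v V v v V12';
my $record_len = 58;    # 8-byte header + 50-byte data block

# Decode one 58-byte record into a hash keyed by field name.
sub decode_record {
    my ($raw) = @_;
    my %rec;
    @rec{@fields} = unpack $template, $raw;
    return \%rec;
}

# First record from the xxd dump, used here in place of the real file.
my $sample = pack 'H*', join '', qw(
    03003b515d553a00 0700 f87f0000 01000000 587f0000 01000000
    6b670000 01000000 00000000 587f0000 00000000 00000000
    e8800000 00000000
);

# An in-memory filehandle stands in for: open my $fh, '<:raw', $path
open my $fh, '<', \$sample or die "Cannot open sample: $!";
binmode $fh;

# Read fixed-size records until a full record is no longer available.
my @records;
while ((read($fh, my $raw, $record_len) // 0) == $record_len) {
    push @records, decode_record($raw);
}
close $fh;

for my $rec (@records) {
    print scalar(gmtime($rec->{timestamp})), " UTC\n";
    print "  $_ = $rec->{$_}\n" for @fields;
}
```

To run it against the real file, replace the in-memory open with an open on the file path. If the file turns out to contain records of other lengths, read the 8-byte header first and use its message length to decide how many more bytes to read.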

6 Comments

Yeah ... I guess long and short is not such a big issue, since the template clearly specifies it. Signed / unsigned they didn't bother mentioning ... anyhow ... let me try your solution.
Ok. That did give me an output of - $ ./sug1.pl 3 1432179003 58 So it makes sense. You got the header ... I guess the timestamp is a Unix epoch number (05/21/2015 @ 3:30am (UTC)) and the message length is of course header + data = 58. But I still did not get any output for the actual data ... let me tweak a few things and see ...
oops! I made a typo in the second unpack statement and forgot to specify what I was unpacking! Try again.
No. i already caught that but it isn't making proper sense ... ./sug1.pl 3 1432179003 58 7 32760 1 32600 1 26475 1 0 32600 0 0 33000 0
All I can really suggest without seeing any further documentation regarding the dataset is to try different types of unpack characters for shorts and longs. "n" and "N" are the ones for big-endian byte order. You could also try "s" and "l" for shorts and longs as well. I learned most of what I know about byte unpacking by simply messing around with the different codes and seeing what happened. Anyway, best of luck!