I am trying to use unpack to decode a binary file. The binary file has the following structure:

ABCDEF\tFFFABCDEF\tFFF....

where

ABCDEF -> String of fixed length
\t -> tab character
FFF -> 3 Floats
.... -> repeat thousands of times

I know how to do it when types are all the same or with only numbers and fixed length arrays, but I am struggling in this situation. For example, if I had a list of floats I would do

s.unpack('F*')

Or if I had integers and floats like

[1, 3.4, 5.2, 4, 2.3, 7.8]

I would do

s.unpack('CF2CF2')

But in this case I am a bit lost. I was hoping to use a format string such as '(CF2)*' with brackets, but it does not work.

I need to use Ruby 2.0.0-p247, if that matters.

Example

ary = ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
s = ary.pack('P7fffP7fff')

then

s.scan(/.{19}/)
["\xA8lf\xF9\xD4\x7F\x00\x00\x9A\x99Y@33\xB3@\x9A\x99\x11", "A\x80lf\xF9\xD4\x7F\x00\x00\x00\x00 @ff\x0EAff"]

Finally

s.scan(/.{19}/).map{ |item| item.unpack('P7fff') }
Error: #<ArgumentError: no associated pointer>
<main>:in `unpack'
<main>:in `block in <main>'
<main>:in `map'
<main>:in `<main>'
  • The P7 is the issue; try changing it to just a lowercase p (no 7). There are some differences between packing and unpacking. When reading the file, you use P7 because the string is not null-terminated, but when packing it again, it is. I just ran the example without error by packing with P7fffP7fff and unpacking with pfffpfff. Commented Jun 12, 2019 at 6:16
  • I get the same error Commented Jun 12, 2019 at 6:18
  • Your example uses an array where each item is already separated, so you will be using the lowercase p. When reading the file, it is going to be a string of bytes that has not been separated into array items, so you must specify the fixed length with the uppercase variant P7. Commented Jun 12, 2019 at 6:25
  • OK. I will try tonight when I go back home and get access to the file. Commented Jun 12, 2019 at 6:56
  • @ForeverZer0: Both p and P are the issue. Commented Jun 12, 2019 at 8:47

2 Answers

You could read the file in small chunks of 19 bytes and use 'A7fff' to pack and unpack. Do not use the pointer directives ('p' and 'P'): they store a memory address (8 bytes on a 64-bit platform) rather than the string's bytes, so a record would no longer fit in 19 bytes. You could also use 'A6xfff' to skip the 7th byte and get a string of 6 characters.

Here's an example, which is similar to the documentation of IO.read:

data = [["ABCDEF\t", 3.4, 5.6, 9.1], 
        ["FEDCBA\t", 2.5, 8.9, 3.1]]
binary_file = 'data.bin'
chunk_size = 19
pattern = 'A7fff'

File.open(binary_file, 'wb') do |o|
  data.each do |row|
    o.write row.pack(pattern)
  end
end

raise "Something went wrong. Please check data, pattern and chunk_size." unless File.size(binary_file) == data.length * chunk_size

File.open(binary_file, 'rb') do |f|
  while record = f.read(chunk_size)
    puts '%s %g %g %g' % record.unpack(pattern)
  end
end
# =>
#    ABCDEF   3.4 5.6 9.1
#    FEDCBA   2.5 8.9 3.1

If your file is large, you could read a multiple of 19 bytes at a time to speed up the process.
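As a rough sketch of that idea, the loop below reads many records per call and unpacks them in one go by repeating the pattern. The names each_record, RECORD_SIZE, PATTERN and BATCH are made up for this illustration, and a StringIO stands in for the real file so the snippet is self-contained; it only uses methods that should be available in Ruby 2.0.

```ruby
require 'stringio'

RECORD_SIZE = 19       # 7 bytes of string + 3 * 4 bytes of float
PATTERN     = 'A7fff'
BATCH       = 1000     # hypothetical number of records per read

# Yields one [string, f1, f2, f3] array per complete record found in io.
def each_record(io, batch = BATCH)
  while (chunk = io.read(RECORD_SIZE * batch))
    # The last chunk may hold fewer than `batch` records;
    # any trailing partial record is ignored.
    count = chunk.bytesize / RECORD_SIZE
    chunk.unpack(PATTERN * count).each_slice(4) { |row| yield row }
  end
end

rows = [["ABCDEF\t", 3.4, 5.6, 9.1], ["FEDCBA\t", 2.5, 8.9, 3.1]]
io = StringIO.new(rows.map { |r| r.pack(PATTERN) }.join)
each_record(io) { |name, *floats| puts '%s %g %g %g' % [name, *floats] }
```

With a real file, the StringIO would be replaced by File.open(binary_file, 'rb'), as in the example above.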

4 Comments

A7fff did the trick. Even without writing to file I can unpack it with s.scan(/.{19}/).map{ |item| item.unpack('A7fff') }
@Rojj: Sure, you don't have to write anything if you already have your data. It was just to have some common binary data to debug and test with. scan also works, but it needs the whole file in memory, which might not be suitable if you work with large files.
@Rojj: If you don't care about the last character of the string, you could also use 'A6xfff', as in ["ABCDEF\t", 3.4, 5.6, 9.1].pack('A7fff').unpack('A6xfff')
Oh that's great! Thanks

When dealing with mixed formats that repeat and have a known, fixed size, it is often easier to split the string into records first.

A quick example would be:

binary.scan(/.{LENGTH_OF_DATA}/).map { |item| item.unpack(FORMAT) }

Considering your example above, take the length of the string including the tab character (in bytes), plus the size of the 3 floats. If your strings are literally 'ABCDEF\t', you would use a size of 19 (7 for the string, 12 for the 3 floats).
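The size arithmetic can be double-checked by packing a sample record and asking for its bytesize. The sketch below assumes the string is stored inline in the file, so it uses the 'a7fff' format suggested in the comments; FORMAT, RECORD_BYTES and unpack_records are names invented for this illustration. It also splits by byte offset instead of a regex, because `.` does not match a newline byte (0x0A) unless the /m flag is given, and binary data can contain that byte anywhere.

```ruby
FORMAT = 'a7fff'  # assumption: 7 raw bytes stored inline, then 3 single-precision floats

# Packing any sample record reveals how many bytes one record occupies.
RECORD_BYTES = ['', 0.0, 0.0, 0.0].pack(FORMAT).bytesize

# Splits a binary string into records by byte offset and unpacks each one.
# Assumes binary.bytesize is a multiple of RECORD_BYTES.
def unpack_records(binary)
  (0...binary.bytesize).step(RECORD_BYTES).map do |offset|
    binary.byteslice(offset, RECORD_BYTES).unpack(FORMAT)
  end
end

# The first string deliberately contains a newline byte.
packed = ["AB\nDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1].pack(FORMAT * 2)
unpack_records(packed).each { |name, *floats| puts '%p %g %g %g' % [name, *floats] }
```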

Your final product would look like this:

str.scan(/.{19}/).map { |item| item.unpack('P7fff') }

For example:

irb(main):001:0> ary = ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
=> ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]

irb(main):002:0> s = ary.pack('pfffpfff')
=> "\xE8Pd\xE4eU\x00\x00\x9A\x99Y@33\xB3@\x9A\x99\x11A\x98Pd\xE4eU\x00\x00\x00\x00 @ff\x0EAffF@"

irb(main):003:0> s.unpack('pfffpfff')
=> ["ABCDEF\t", 3.4000000953674316, 5.599999904632568, 9.100000381469727, "FEDCBA\t", 2.5, 8.899999618530273, 3.0999999046325684]

The minor differences in precision are unavoidable, but do not worry about them: they come from converting between a 32-bit float (the format in the file) and a 64-bit double (what Ruby uses internally), and the error is smaller than the precision a 32-bit float can represent anyway.
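A small sketch of that effect: the same value is round-tripped through a 32-bit ('f') and a 64-bit ('d') field. The helper name round_trip is made up for this illustration.

```ruby
# Packs a Float with the given directive and reads it straight back.
def round_trip(value, directive)
  [value].pack(directive).unpack(directive).first
end

as_single = round_trip(3.4, 'f')   # stored in 4 bytes
as_double = round_trip(3.4, 'd')   # stored in 8 bytes

puts as_single            # no longer exactly 3.4
puts as_double == 3.4     # the 64-bit round trip is lossless
puts as_single.round(6)   # rounding to about 6 decimals recovers the intended value
```

If the exact decimal representation matters for display, rounding or a format such as '%g' is usually enough.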

10 Comments

Beautiful, but I have a problem. I read the string from the file with File.binread. This gives me a String, and String does not have the method each_slice. I tried to convert it to bytes or chars, but this gives me arrays, and unpack does not work on arrays. Does each_slice work on strings in Ruby 2.0.0?
My apologies, I corrected the answer to use String#scan instead of each_slice. You could alternatively use str.chars.each_slice, but scan is a cleaner approach IMO.
I get no associated pointer. I have added an example so we look at exactly the same thing.
p / P are not the correct directive here. Use a as in a6x (x to ignore tab character) as P7 means seven pointers to Ruby strings.
A lowercase p means that, but for uppercase P, it indicates the size of the struct. Demonstrated easily with ['stack', 'overflow'].pack('pp').unpack('P5P8'). It does not try to unpack 13 pointers.