Ruby - Unpack array with mixed types

Question

I am trying to use unpack to decode a binary file. The binary file has the following structure:

ABCDEF\tFFFABCDEF\tFFFF....

where

ABCDEF -> String of fixed length
\t -> tab character
FFF -> 3 Floats
.... -> repeat thousands of times

I know how to do it when types are all the same or with only numbers and fixed length arrays, but I am struggling in this situation. For example, if I had a list of floats I would do

s.unpack('F*')

Or if I had integers and floats like

[1, 3.4, 5.2, 4, 2.3, 7.8]

I would do

s.unpack('CF2CF2')

But in this case I am a bit lost. I was hoping to use a format string such as '(CF2)*' with parentheses, but it does not work.

I need to use Ruby 2.0.0-p247, if that matters.

Example

ary = ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
s = ary.pack('P7fffP7fff')

then

s.scan(/.{19}/)
["\xA8lf\xF9\xD4\x7F\x00\x00\x9A\x99Y@33\xB3@\x9A\x99\x11", "A\x80lf\xF9\xD4\x7F\x00\x00\x00\x00 @ff\x0EAff"]

Finally

s.scan(/.{19}/).map{ |item| item.unpack('P7fff') }
Error: #<ArgumentError: no associated pointer>
<main>:in `unpack'
<main>:in `block in <main>'
<main>:in `map'
<main>:in `<main>'

The P7 is the issue; try changing it to a lowercase p (no 7). There are some differences between packing and unpacking. When reading the file, you use P7 because the string is not null-terminated, but when packing it again, it is. I ran the example without error by packing with P7fffP7fff and unpacking with pfffpfff.
– ForeverZer0, Commented Jun 12, 2019 at 6:16
Your example uses an array where each item is already separated, so you will be using the lowercase p. When reading the file, you get a string of bytes that is not separated into array items, so you must specify the fixed length with the uppercase variant P7.
– ForeverZer0, Commented Jun 12, 2019 at 6:25
OK. I will try tonight when I go back home and get access to the file.
– Rojj, Commented Jun 12, 2019 at 6:56

Eric Duminil · Accepted Answer · 2019-06-12 17:21:19Z

2

You could read the file in small chunks of 19 bytes and use 'A7fff' to pack and unpack. Do not use pointers to structure ('p' and 'P'), as they need more than 19 bytes to encode your information. You could also use 'A6xfff' to ignore the 7th byte and get a string with 6 chars.

Here's an example, which is similar to the documentation of IO.read:

data = [["ABCDEF\t", 3.4, 5.6, 9.1], 
        ["FEDCBA\t", 2.5, 8.9, 3.1]]
binary_file = 'data.bin'
chunk_size = 19
pattern = 'A7fff'

File.open(binary_file, 'wb') do |o|
  data.each do |row|
    o.write row.pack(pattern)
  end
end

raise "Something went wrong. Please check data, pattern and chunk_size." unless File.size(binary_file) == data.length * chunk_size

File.open(binary_file, 'rb') do |f|
  while record = f.read(chunk_size)
    puts '%s %g %g %g' % record.unpack(pattern)
  end
end
# =>
#    ABCDEF   3.4 5.6 9.1
#    FEDCBA   2.5 8.9 3.1

You could use a multiple of 19 to speed up the process if your file is large.
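The idea of reading a multiple of 19 bytes at a time could be sketched as follows. This is only an illustration: `each_record`, `RECORDS_PER_READ` and the throwaway test file are assumptions that do not appear in the original answer. Because `unpack` has no grouping syntax such as `(A7fff)*`, the pattern is repeated once per record in the chunk.

```ruby
require 'tempfile'

RECORD_SIZE      = 19       # 7-byte label + 3 * 4-byte floats
RECORD_PATTERN   = 'A7fff'
RECORDS_PER_READ = 1000     # hypothetical batch size, tune as needed

# Yields one [label, x, y, z] array per record, reading many records per call.
def each_record(path)
  File.open(path, 'rb') do |f|
    while (chunk = f.read(RECORD_SIZE * RECORDS_PER_READ))
      count = chunk.bytesize / RECORD_SIZE  # the last chunk may be shorter
      # Repeat the pattern once per record, then regroup the flat result.
      chunk.unpack(RECORD_PATTERN * count).each_slice(4) { |row| yield row }
    end
  end
end

# Demo on a temporary file holding 2500 records (values chosen to be
# exactly representable as 32-bit floats).
rows = Array.new(2500) { |i| [format("R%05d\t", i), i.to_f, 0.5, -1.25] }
decoded = []
Tempfile.create('records') do |tmp|
  tmp.binmode
  rows.each { |row| tmp.write(row.pack(RECORD_PATTERN)) }
  tmp.flush
  each_record(tmp.path) { |row| decoded << row }
end

puts decoded.length
p decoded.last
```

Memory use stays bounded by the batch size, while the number of `read` calls drops by roughly the same factor.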

edited Jun 12, 2019 at 17:21

answered Jun 12, 2019 at 8:33


4 Comments

Rojj Over a year ago

A7fff did the trick. Even without writing to file I can unpack it with s.scan(/.{19}/).map{ |item| item.unpack('A7fff') }

Eric Duminil Over a year ago

@Rojj: Sure, you don't have to write anything if you already have your data. It was just to have a common binary data to debug and test. scan also works, but it needs to have the whole file in memory, which might not be suitable if you work with large files.

Eric Duminil Over a year ago

@Rojj: If you don't care about the last character of the string, you could also use 'A6xfff', as in ["ABCDEF\t", 3.4, 5.6, 9.1].pack('A7fff').unpack('A6xfff')

Rojj Over a year ago

Oh that's great! Thanks
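As a small illustration of the difference between the two patterns discussed in these comments (a sketch only, using the sample record from the question):

```ruby
record = ["ABCDEF\t", 3.4, 5.6, 9.1].pack('A7fff')

with_tab    = record.unpack('A7fff')   # label keeps the trailing tab
without_tab = record.unpack('A6xfff')  # x skips the 7th byte (the tab)

p with_tab.first                          # "ABCDEF\t"
p without_tab.first                       # "ABCDEF"
p with_tab[1..-1] == without_tab[1..-1]   # true: the floats are unaffected
```

On unpack, 'A' strips trailing spaces and NUL bytes but not tabs, which is why the tab survives with 'A7'.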

ForeverZer0 · Answer · 2019-06-12 06:37:17Z

0

When dealing with mixed formats that repeat and are of a known fixed size, it is often easier to split the string first.

A quick example would be:

binary.scan(/.{LENGTH_OF_DATA}/).map { |item| item.unpack(FORMAT) }

Considering your example above, take the length of the string including the tab character (in bytes), plus the size of the 3 floats. If your strings are literally 'ABCDEF\t', you would use a size of 19 (7 for the string, 12 for the 3 floats).
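The arithmetic can be checked directly in Ruby. This is a minimal sketch; the constant names are illustrative, and it uses the 'A7fff' pattern from the other answer to build a sample record.

```ruby
LABEL_BYTES = "ABCDEF\t".bytesize        # 7: six letters plus the tab
FLOAT_BYTES = [0.0].pack('f').bytesize   # one native single-precision float
record_size = LABEL_BYTES + 3 * FLOAT_BYTES

packed = ["ABCDEF\t", 3.4, 5.6, 9.1].pack('A7fff')

puts record_size                         # 19 where a native float is 4 bytes
puts packed.bytesize == record_size
```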

Your final product would look like this:

str.scan(/.{19}/).map { |item| item.unpack('P7fff') }
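One caveat when splitting binary data with a regular expression: by default `.` does not match a newline byte (0x0A), and such a byte can easily occur inside a packed float. The sketch below is an illustration, not part of the original answer. It uses the 'A7fff' pattern from the other answer and the value 8.625, whose single-precision encoding happens to contain 0x0A, to show the effect and two possible workarounds.

```ruby
# 8.625 packs to bytes that include 0x0A ("\n").
data = ["ABCDEF\t", 8.625, 5.6, 9.1,
        "FEDCBA\t", 2.5, 8.9, 3.1].pack('A7fffA7fff')

naive = data.scan(/.{19}/)    # "." stops at "\n", so chunks are lost or misaligned
safe  = data.scan(/.{19}/m)   # /m lets "." match "\n" as well

# Alternative without a regexp: slice by byte offset.
sliced = (0...data.bytesize).step(19).map { |i| data.byteslice(i, 19) }

p naive.length
p safe.length
p safe == sliced
p safe.map { |record| record.unpack('A7fff') }
```

The string should also be binary (ASCII-8BIT), as returned by `File.binread`; otherwise `scan` may raise on invalid byte sequences.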

For example:

irb(main):001:0> ary = ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
=> ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]

irb(main):002:0> s = ary.pack('pfffpfff')
=> "\xE8Pd\xE4eU\x00\x00\x9A\x99Y@33\xB3@\x9A\x99\x11A\x98Pd\xE4eU\x00\x00\x00\x00 @ff\x0EAffF@"

irb(main):003:0> s.unpack('pfffpfff')
=> ["ABCDEF\t", 3.4000000953674316, 5.599999904632568, 9.100000381469727, "FEDCBA\t", 2.5, 8.899999618530273, 3.0999999046325684]

The minor differences in precision are unavoidable, but do not worry about them: they come from converting between a 32-bit float (the 'f' directive) and the 64-bit double that Ruby uses internally, and the error is smaller than the precision a 32-bit float can represent.
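The effect can be reproduced in isolation. The following sketch round-trips the same value through the single-precision 'f' directive and the double-precision 'd' directive; rounding for display is one possible way to hide the noise, not something the original answer prescribes.

```ruby
value = 3.4

single = [value].pack('f').unpack('f').first   # stored as a 32-bit float
double = [value].pack('d').unpack('d').first   # stored as a 64-bit double

p single                          # 3.4000000953674316
p double == value                 # true: no precision is lost with 'd'
p single.round(6)                 # 3.4
p (single - value).abs / value    # relative error, below 2**-24
```

If the file was written with 32-bit floats, the extra digits carry no information, so rounding to six or seven significant digits is a reasonable way to present the values.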

edited Jun 12, 2019 at 6:37

answered Jun 12, 2019 at 5:32


10 Comments

Rojj Over a year ago

Beautiful, but I have a problem. I read the string from the file with File.binread. This gives me a String, and String does not have the method each_slice. I tried to convert it to bytes or chars, but this gives me arrays, and unpack does not work on arrays. Does each_slice work on strings in Ruby 2.0.0?

ForeverZer0 Over a year ago

My apologies, I corrected the answer to use String#scan instead of each_slice. You could alternatively use str.chars.each_slice, but scan is a cleaner approach IMO.

Rojj Over a year ago

I get no associated pointer. I have added an example so we look at exactly the same thing.

cremno Over a year ago

p / P are not the correct directive here. Use a as in a6x (x to ignore tab character) as P7 means seven pointers to Ruby strings.

ForeverZer0 Over a year ago

A lowercase p means that, but for uppercase P, it indicates the size of the struct. Demonstrated easily with ['stack', 'overflow'].pack('pp').unpack('P5P8'). It does not try to unpack 13 pointers.
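Whichever reading of P7 is right, both `p` and `P` store a pointer (a memory address) rather than the characters themselves, which is why they cannot decode bytes read from a file. A minimal sketch of that point, with illustrative variable names:

```ruby
short_ptr = ["ABCDEF\t"].pack('p')
long_ptr  = ["x" * 1000].pack('p')

# The packed size is that of a pointer, regardless of the string length.
p short_ptr.bytesize
p long_ptr.bytesize

# 'A7' stores the seven bytes themselves, so they survive a trip through a file.
inline = ["ABCDEF\t"].pack('A7')
p inline            # "ABCDEF\t"
p inline.bytesize   # 7
```

The pointer is only meaningful inside the process that packed it, while the file in the question contains the raw characters followed by the floats.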
