I want to split data into chunks of, say, 8154 bytes each:

data = Zlib::Deflate.deflate(some_very_long_string)

What would be the best way to do that?

I tried to use this:

chunks = data.scan /.{1,8154}/

...but data was lost! data had a size of 11682 bytes, but when I looped through the chunks and summed their sizes, I ended up with a total of 11677. 5 bytes were lost! Why?

2 Answers

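Regexps are not a good way to parse binary data. The most likely cause of your missing bytes is that . does not match a newline byte (0x0A) unless the regexp has the /m flag, so scan silently skips every such byte. Use bytes and each_slice to operate on bytes, and use pack 'C*' to convert them back into strings for output or debugging: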

irb> data = File.open("sample.gif", "rb", &:read)
=> "GIF89a\r\x00\r........."

irb> data.bytes.each_slice(10){ |slice| p slice, slice.pack("C*") }
[71, 73, 70, 56, 57, 97, 13, 0, 13, 0]
"GIF89a\r\x00\r\x00"
[247, 0, 0, 0, 0, 0, 0, 0, 51, 0]
"\xF7\x00\x00\x00\x00\x00\x00\x003\x00"
...........
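
To see how a regexp can drop bytes, here is a minimal sketch on a small made-up binary string (the byte values and the chunk size of 4 are illustrative assumptions, not the asker's data). Without the /m flag, . refuses to match "\n", so scan skips those bytes; with /m nothing is skipped:

```ruby
# Hypothetical 7-byte binary string containing three newline bytes (10).
data = [247, 10, 0, 10, 51, 10, 97].pack("C*")

# Without /m, "." does not match "\n", so scan steps over every newline byte.
lossy = data.scan(/.{1,4}/)

# With /m, "." matches any byte, including "\n".
lossless = data.scan(/.{1,4}/m)

lossy_total    = lossy.sum(&:bytesize)
lossless_total = lossless.sum(&:bytesize)

puts "original: #{data.bytesize}, without /m: #{lossy_total}, with /m: #{lossless_total}"
```

The difference between the original size and the lossy total equals the number of newline bytes in the string, which would be consistent with 5 such bytes in the deflated data from the question.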

1 Comment

Isn't this going to be very slow?

The accepted answer works, but creates unneeded arrays and is extremely slow for big files.

This alternative works fine and is much faster (500x for a 1 MB file and 10 kB chunks!):

def get_binary_chunks(string, size)
  # bytesize counts bytes rather than characters, so the chunk count is right for any encoding
  Array.new((string.bytesize + size - 1) / size) { |i| string.byteslice(i * size, size) }
end

For the given example, you'd use it this way:

chunks = get_binary_chunks(data, 8154)
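
A quick way to check that nothing is lost is to sum the chunk sizes and rejoin the chunks. The sketch below is self-contained, so it repeats the helper (using bytesize) and builds a made-up input string; the input is an assumption for illustration only:

```ruby
require "zlib"

# Split a string into byte-based chunks of at most `size` bytes.
def get_binary_chunks(string, size)
  Array.new((string.bytesize + size - 1) / size) { |i| string.byteslice(i * size, size) }
end

# Hypothetical input, only here to produce some compressed data.
some_very_long_string = (1..50_000).map { |i| (i * 7919 % 1000).to_s }.join(",")

data   = Zlib::Deflate.deflate(some_very_long_string)
chunks = get_binary_chunks(data, 8154)

# The chunk sizes should add up to the original size...
total = chunks.sum(&:bytesize)

# ...and the rejoined chunks should inflate back to the input.
roundtrip = Zlib::Inflate.inflate(chunks.join)

puts "data: #{data.bytesize} bytes, chunks: #{chunks.size}, total: #{total} bytes"
```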
