Parsing binary data into ctypes Structure object via readinto()

Question

I'm trying to handle a binary format, following the example here:

http://dabeaz.blogspot.jp/2009/08/python-binary-io-handling.html

>>> from ctypes import *
>>> class Point(Structure):
...     _fields_ = [ ('x',c_double), ('y',c_double), ('z',c_double) ]
...
>>> g = open("foo","rb") # point structure data
>>> q = Point()
>>> g.readinto(q)
24
>>> q.x
2.0

I've defined a Structure of my header and I'm trying to read data into my structure, but I'm having some difficulty. My structure is like this:

class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", c_char),
                ("timestamp_4bytes", c_uint),
                ("more_funky_numbers_7bytes", c_uint, 56),
                ("some_flags_1byte", c_byte),
                ("other_flags_1byte", c_byte),
                ("payload_length_2bytes", c_ushort),

                ]

The ctypes documentation says:

For integer type fields like c_int, a third optional item can be given. It must be a small positive integer defining the bit width of the field.

So for ("more_funky_numbers_7bytes", c_uint, 56), I've tried to define the field as a 7 byte field, but I'm getting the error:

ValueError: number of bits invalid for bit field

So my first problem is: how can I define a 7 byte int field?

Then, if I skip that problem and comment out the "more_funky_numbers_7bytes" field, the data gets loaded in, but as expected only 1 character is loaded into "ascii_text_32bytes". And for some reason readinto() returns 16, which I assume is the number of bytes it read into the structure. But if I'm commenting out my "funky number" field and "ascii_text_32bytes" is only giving one char (1 byte), shouldn't that be 13, not 16?

Then I tried breaking out the char field into a separate structure, and reference that from within my Header structure. But that's not working either...

class StupidStaticCharField(BigEndianStructure):
    _fields_ = [
                ("ascii_text_1", c_byte),
                ("ascii_text_2", c_byte),
                ("ascii_text_3", c_byte),
                ("ascii_text_4", c_byte),
                ("ascii_text_5", c_byte),
                ("ascii_text_6", c_byte),
                ("ascii_text_7", c_byte),
                ("ascii_text_8", c_byte),
                ("ascii_text_9", c_byte),
                ("ascii_text_10", c_byte),
                ("ascii_text_11", c_byte),
                .
                .
                .
                ]

class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", StupidStaticCharField),
                ("timestamp_4bytes", c_uint),
                #("more_funky_numbers_7bytes", c_uint, 56),
                ("some_flags_1byte", c_ushort),
                ("other_flags_1byte", c_ushort),
                ("payload_length_2bytes", c_ushort),

                ]

So, any ideas how to:

Define a 7 byte field (which I'll need to decode using a defined function)
Define a static char field of 32 bytes

UPDATE

I've found a structure that seems to work...

class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", c_char * 32),
                ("timestamp_4bytes", c_uint),
                ("more_funky_numbers_7bytes", c_byte * 7),
                ("some_flags_1byte", c_byte),
                ("other_flags_1byte", c_byte),
                ("payload_length_2bytes", c_ushort),

                ]

Now, however, my remaining question is why, when I use .readinto():

f = open(binaryfile, "rb")

mystruct = BinaryHeader()
f.readinto(mystruct)

it returns 52 and not the expected 51. Where is that extra byte coming from, and where does it go?

UPDATE 2

For those interested, here's an example of the alternative mentioned by eryksun: using struct to read values into a namedtuple:

>>> from struct import unpack
>>> record = 'raymond   \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

If you look in your binaryfile with some Hex Editor, do you see 51 bytes? Also, what does len(mystruct) say?
– Ofir Israel, Commented Aug 30, 2013 at 15:36
Yes, the binaryfile is over 50KB. len(mystruct) doesn't seem to work, but sizeof(mystruct) does return 52...
– monkut, Commented Aug 30, 2013 at 16:18
You can add _pack_ = 1 to the definition, but consider using the struct module with a namedtuple instead.
– Eryk Sun, Commented Aug 30, 2013 at 16:54
Thanks, I'll look into using pack. I was trying to move away from struct, because I had trouble figuring out how to get 7 bytes. I was pulling it out as 7s converting it to a string, and then having to convert it back for the decoding process. I was hoping this approach would be a bit cleaner/faster.
– monkut, Commented Aug 31, 2013 at 1:25
You might want to use the mmap module to map a section of the file (i.e. a given offset & size) into the process address space. Then you can create an array of records using (BinaryHeader * N).from_buffer(mapped_file). This skips using readinto in a loop.
– Eryk Sun, Commented Aug 31, 2013 at 2:27
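
The mmap suggestion in the last comment could look roughly like the sketch below. The two-field Record layout, the record count N, the file name and the read_records helper are all made up for illustration; a real BinaryHeader would take the place of Record.

```python
import mmap
import os
import tempfile
from ctypes import BigEndianStructure, c_uint, sizeof

class Record(BigEndianStructure):
    # Hypothetical, simplified record standing in for BinaryHeader.
    _fields_ = [
        ("sequence_number", c_uint),
        ("timestamp", c_uint),
    ]

N = 3  # number of records in the demo file (assumption)

def read_records(buf, count):
    """Overlay `count` records on `buf` and copy the fields out as tuples."""
    records = (Record * count).from_buffer(buf)
    return [(r.sequence_number, r.timestamp) for r in records]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "records.bin")

    # Build a small demo file holding N records.
    with open(path, "wb") as f:
        for i in range(N):
            f.write(Record(i + 1, 1000 + i))

    with open(path, "r+b") as f:
        # from_buffer() needs a writable buffer, so map the file read/write.
        mapped_file = mmap.mmap(f.fileno(), N * sizeof(Record))
        try:
            # The ctypes array only lives inside read_records(), so the
            # mapping is no longer exported when it is closed below.
            values = read_records(mapped_file, N)
        finally:
            mapped_file.close()

print(values)  # [(1, 1000), (2, 1001), (3, 1002)]
```

Copying the fields out inside a helper is a deliberate choice here: an mmap object cannot be closed while a ctypes array created with from_buffer() still refers to it.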

score 7 · Accepted Answer · 2013-08-31 15:23:24Z

This line actually defines a bit field:

...
("more_funky_numbers_7bytes", c_uint, 56),
...

which is wrong here. The width of a bit field must be less than or equal to the size of its type, so for c_uint it can be at most 32 bits; one extra bit raises the exception:

ValueError: number of bits invalid for bit field
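
The limit can be checked directly. This is a minimal sketch; the class names Fits and TooWide are made up for illustration, and the width of c_uint is computed rather than assumed to be 32.

```python
from ctypes import Structure, c_uint, sizeof

MAX_BITS = sizeof(c_uint) * 8  # 32 on common platforms

class Fits(Structure):
    # A bit field exactly as wide as the underlying type is accepted.
    _fields_ = [("value", c_uint, MAX_BITS)]

try:
    class TooWide(Structure):
        # One bit more than c_uint can hold is rejected when the
        # class body is processed.
        _fields_ = [("value", c_uint, MAX_BITS + 1)]
except ValueError as exc:
    error = exc
else:
    error = None

print(type(error).__name__, error)
```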

Example of using the bitfield:

from ctypes import *

class MyStructure(Structure):
    _fields_ = [
        # c_uint8 is 8 bits long
        ('a', c_uint8, 4), # first 4 bits of the first byte
        ('b', c_uint8, 2), # next 2 bits of the first byte
        ('c', c_uint8, 2), # last 2 bits of the first byte
        ('d', c_uint8, 2), # the first byte is now full, so a new byte
                           # is created and `d` takes its first
                           # two bits
    ]

mystruct = MyStructure()

mystruct.a = 0b0000
mystruct.b = 0b11
mystruct.c = 0b00
mystruct.d = 0b11

v = c_uint16()

# copy the raw bytes of `mystruct` into `v`
memmove(byref(v), byref(mystruct), sizeof(v))

print(sizeof(mystruct))  # 2 bytes, so 6 bits are left unused; you may
                         # want to memset them to zero
print(bin(v.value))      # 0b1100110000

What you need is 7 bytes, so what you ended up doing is correct:

...
("more_funky_numbers_7bytes", c_byte * 7),
...
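
Once the 7 bytes are in a c_byte array, they still have to be decoded. The question does not say how the value is encoded, so the sketch below simply assumes an unsigned big-endian integer; the helper name decode_7byte_int and the sample bytes are hypothetical.

```python
from ctypes import c_byte

# Sample field contents (made up for illustration).
raw_field = (c_byte * 7)(0x00, 0x00, 0x00, 0x00, 0x01, 0x02, 0x03)

def decode_7byte_int(field):
    """Interpret a 7-byte ctypes array as an unsigned big-endian integer."""
    # bytes() copies the raw memory of the array, ignoring the signedness
    # of c_byte; int.from_bytes() then assembles the integer.
    return int.from_bytes(bytes(field), byteorder="big")

value = decode_7byte_int(raw_field)
print(hex(value))  # 0x10203
```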

As for the size of the structure, it is going to be 52. One extra byte of padding is added so that the 2-byte payload_length_2bytes field starts on an even offset; by default ctypes aligns fields the same way the platform's C compiler does. Here:

from ctypes import *

class BinaryHeader(BigEndianStructure):
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

mystruct = BinaryHeader(
    0x11111111,
    b'\x22' * 32,
    0x33333333,
    (c_byte * 7)(*([0x44] * 7)),
    0x55,
    0x66,
    0x7777
)

print(sizeof(mystruct))  # 52

with open('data.txt', 'wb') as f:
    f.write(mystruct)

The extra byte is inserted between other_flags_1byte and payload_length_2bytes in the file:

00000000 11 11 11 11 ....
00000004 22 22 22 22 """"
00000008 22 22 22 22 """"
0000000C 22 22 22 22 """"
00000010 22 22 22 22 """"
00000014 22 22 22 22 """"
00000018 22 22 22 22 """"
0000001C 22 22 22 22 """"
00000020 22 22 22 22 """"
00000024 33 33 33 33 3333
00000028 44 44 44 44 DDDD
0000002C 44 44 44 55 DDDU
00000030 66 00 77 77 f.ww
            ^
         extra byte
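
The same gap can be seen without a hex editor by asking ctypes where each field starts. A small sketch, reusing the unpacked BinaryHeader definition from above; the layout dictionary is only there for illustration.

```python
from ctypes import BigEndianStructure, c_byte, c_char, c_uint, c_ushort, sizeof

class BinaryHeader(BigEndianStructure):
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

# Each entry of _fields_ becomes a class attribute (a field descriptor)
# that records where the field starts and how many bytes it occupies.
layout = {}
for name, _ctype in BinaryHeader._fields_:
    descriptor = getattr(BinaryHeader, name)
    layout[name] = (descriptor.offset, descriptor.size)
    print("%-28s offset=%2d size=%2d" % (name, descriptor.offset, descriptor.size))

print("sizeof:", sizeof(BinaryHeader))
```

On common platforms other_flags_1byte ends at offset 48 and payload_length_2bytes starts at offset 50, which leaves offset 49 as the padding byte.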

This is an issue when it comes to file formats and network protocols. To change it, pack the structure with _pack_ = 1:

 ...
class BinaryHeader(BigEndianStructure):
    _pack_ = 1
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
...

The file will then be:

00000000 11 11 11 11 ....
00000004 22 22 22 22 """"
00000008 22 22 22 22 """"
0000000C 22 22 22 22 """"
00000010 22 22 22 22 """"
00000014 22 22 22 22 """"
00000018 22 22 22 22 """"
0000001C 22 22 22 22 """"
00000020 22 22 22 22 """"
00000024 33 33 33 33 3333
00000028 44 44 44 44 DDDD
0000002C 44 44 44 55 DDDU
00000030 66 77 77    fww
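
Reading those 51 bytes back with readinto() can be sketched as follows. An in-memory io.BytesIO stands in for the real file, and the class name PackedHeader is made up to keep it apart from the unpacked definition.

```python
import io
from ctypes import BigEndianStructure, c_byte, c_char, c_uint, c_ushort, sizeof

class PackedHeader(BigEndianStructure):
    _pack_ = 1  # no padding between fields
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

# The same 51 bytes as in the hex dump above.
raw = (b"\x11" * 4 + b"\x22" * 32 + b"\x33" * 4 + b"\x44" * 7
       + b"\x55" + b"\x66" + b"\x77\x77")

stream = io.BytesIO(raw)  # stand-in for open(binaryfile, "rb")
header = PackedHeader()
bytes_read = stream.readinto(header)

print(sizeof(PackedHeader), bytes_read)    # 51 51
print(hex(header.sequence_number_4bytes))  # 0x11111111
print(hex(header.payload_length_2bytes))   # 0x7777
```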

As for struct, it won't make things easier in your case. Sadly, it doesn't support nested tuples in the format. For example:

>>> from struct import *
>>>
>>> data = ('\x11\x11\x11\x11'
...         '\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22'
...         '\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22'
...         '\x33\x33\x33\x33'
...         '\x44\x44\x44\x44\x44\x44\x44'
...         '\x55\x66\x77\x77')
>>>
>>> BinaryHeader = Struct('>I32cI7BBBH')
>>>
>>> BinaryHeader.unpack(data)
(286331153, '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"'
, '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"'
, '"', '"', 858993459, 68, 68, 68, 68, 68, 68, 68, 85, 102, 30583)
>>>

This result cannot be used with namedtuple; you still have to parse it by index. It would work if you could write something like '>I(32c)(I)(7B)(B)(B)H'. This feature has been requested (Extend struct.unpack to produce nested tuples) since 2003, but nothing has been done since.
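
For byte runs there is a partial workaround within struct itself: the s format code returns a whole run as one bytes item, so '32s' and '7s' give exactly one tuple element per header field. The sketch below assumes that keeping the text and the 7-byte value as raw bytes is acceptable; the names HEADER and Header and the shortened field names are made up for illustration.

```python
from collections import namedtuple
from struct import Struct

# '>' means big-endian with no padding; 32s and 7s each yield one bytes item.
HEADER = Struct(">I32sI7sBBH")

Header = namedtuple(
    "Header",
    "sequence_number ascii_text timestamp funky_numbers "
    "some_flags other_flags payload_length",
)

data = (b"\x11" * 4 + b"\x22" * 32 + b"\x33" * 4 + b"\x44" * 7
        + b"\x55" + b"\x66" + b"\x77\x77")

header = Header._make(HEADER.unpack(data))

print(HEADER.size)            # 51
print(header.ascii_text)      # 32 bytes of b'"'
print(header.funky_numbers)   # b'DDDDDDD'
print(header.payload_length)  # 30583
```

The 7-byte value still needs a separate decoding step afterwards, which is the inconvenience the asker described in the comments.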

Thanks for the detailed explanation! You also managed to answer my unanswered question of "how do I handle 2 fields of 4 bits each?". So the ctypes method is progressing well. I got stuck with struct trying to figure out how to handle the 4 bits case.
@monkut I believe struct doesn't support this; all it supports are the basic data types. You have to use bitwise operations and do it manually.
Thanks again, I implemented it and it was roughly a 60% improvement over my previous sloppy code. The only problem is when I went to try to read from tar files...apparently ExFileObject (file objects returned by tarinfo) don't support .readinto(b), doh!
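
For file-like objects that offer read() but not readinto(), one possible workaround is to read the right number of bytes and build the structure with from_buffer_copy(). This is only a sketch: the simplified Header layout, the ReadOnlyStream stand-in and the read_struct helper are all hypothetical.

```python
import io
from ctypes import BigEndianStructure, c_uint, c_ushort, sizeof

class Header(BigEndianStructure):
    # Hypothetical, simplified header standing in for BinaryHeader.
    _fields_ = [
        ("sequence_number", c_uint),
        ("flags", c_ushort),
        ("payload_length", c_ushort),
    ]

class ReadOnlyStream:
    """Stand-in for a file-like object that has read() but no readinto()."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def read(self, size=-1):
        return self._inner.read(size)

def read_struct(fileobj, struct_type):
    """Read sizeof(struct_type) bytes and return a new structure instance."""
    data = fileobj.read(sizeof(struct_type))
    if len(data) != sizeof(struct_type):
        raise EOFError("not enough data for %s" % struct_type.__name__)
    # from_buffer_copy() copies the bytes, so a read-only bytes object is fine.
    return struct_type.from_buffer_copy(data)

stream = ReadOnlyStream(bytes.fromhex("00000001" "0002" "0003"))
header = read_struct(stream, Header)
print(header.sequence_number, header.flags, header.payload_length)  # 1 2 3
```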
