Are there any nice Python solutions like Ruby's BinData for reading user-defined binary file/stream formats? If not, then what's the preferred way to do this in Python, beyond just using the struct module?

I have a binary file that stores "records" of events. The records are dynamic in size, so I must read the first few bytes of each record to determine the record length and record type. Different record types will have different byte layouts. For instance, a record of type "warning" might contain three 4-byte ints followed by a 128-byte value, while a record of type "info" might just have five 4-byte ints.

It would be nice to define the different record types and their structures in such a way that I could simply pass a binary blob to something and have it handle the rest (object generation, etc.). In short, you'd be defining templates/maps that describe how to interpret the binary data.

  • Ever looked at the struct module? Commented May 25, 2011 at 23:16
  • Yes, but at first glance I'm not aware of a way to specify custom structures like BinData. Commented May 25, 2011 at 23:23
  • What do you need to do that the struct module cannot do? Commented May 25, 2011 at 23:36
  • What do you mean by "custom structures"? You need to be more specific than "like [Ruby's] binData". You're unnecessarily limiting the number of people who might be able to help you by providing very vague requirements. The set of people able to answer your question is those who have a lot of experience reading binary data in both Ruby and Python. That's a very small population. Even the set of developers with good Ruby AND Python experience is fairly small, never mind dealing with raw binary data (an increasingly rare thing in today's world). Commented May 26, 2011 at 1:53
  • Thanks for that Nicholas. Reading it now, I realize I posted this question in haste and didn't take the time to explain what I'm actually trying to do. I've updated my question above. Commented May 26, 2011 at 3:29

4 Answers

Maybe you are looking for Construct, a pure-Python 2 & 3 binary parsing library?

Python's struct module works like this:

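import struct

record_header = struct.Struct("<cb")   # 1-byte type code + 1 signed byte of extra data
warning = struct.Struct("<iii128s")    # three 4-byte ints + a 128-byte string
info = struct.Struct("<iiiii")         # five 4-byte ints

# "events.bin" is a placeholder file name; open in binary mode
with open("events.bin", "rb") as stream:
    while True:
        header_text = stream.read(record_header.size)
        # end of file reached
        if not header_text:
            break
        packet_type, extra_data = record_header.unpack(header_text)
        # 'c' unpacks to a 1-byte bytes object in Python 3, so compare with b'w', not 'w'
        if packet_type == b'w':
            warning_data = warning.unpack(stream.read(warning.size))
        elif packet_type == b'i':
            info_data = info.unpack(stream.read(info.size))
        else:
            # without a length field we can't skip an unknown record safely
            raise ValueError("unknown record type: %r" % packet_type)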

See the documentation for details: http://docs.python.org/library/struct.html
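The question also says each record's header carries its length, and asks for something closer to declarative templates. Below is a stdlib-only sketch of that idea (this is plain struct plus namedtuple, not BinData or Construct). It assumes a hypothetical 8-byte header made of two little-endian unsigned 4-byte ints (type code, body length); the type codes 1 and 2, the field names and the helper names are invented for illustration. A dict maps each type code to a Struct and a namedtuple, and the length field lets the reader skip record types it doesn't know.

```python
import io
import struct
from collections import namedtuple

# Assumed header: record type code, then body length in bytes (two little-endian uint32s).
HEADER = struct.Struct("<II")

# Hypothetical record templates; type codes and field names are invented for this example.
WarningRecord = namedtuple("WarningRecord", "a b c payload")
InfoRecord = namedtuple("InfoRecord", "a b c d e")
TEMPLATES = {
    1: (struct.Struct("<iii128s"), WarningRecord),  # three 4-byte ints + a 128-byte value
    2: (struct.Struct("<5i"), InfoRecord),          # five 4-byte ints
}


def read_records(stream):
    """Yield one namedtuple per known record, skipping unknown types via the length field."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < HEADER.size:
            raise EOFError("truncated record header")
        rtype, length = HEADER.unpack(header)
        body = stream.read(length)
        if len(body) < length:
            raise EOFError("truncated record body")
        template = TEMPLATES.get(rtype)
        if template is None:
            continue  # unknown type: its body was already consumed, so move on
        layout, cls = template
        # unpack() raises struct.error if the body length doesn't match the template
        yield cls._make(layout.unpack(body))


# Demo: an in-memory stream holding an info record, an unknown record and a warning record.
def make_record(rtype, body):
    return HEADER.pack(rtype, len(body)) + body


data = (make_record(2, struct.pack("<5i", 1, 2, 3, 4, 5))
        + make_record(99, b"xyz")
        + make_record(1, struct.pack("<iii128s", 7, 8, 9, b"disk full")))
for record in read_records(io.BytesIO(data)):
    print(type(record).__name__, record.a)  # prints "InfoRecord 1", then "WarningRecord 7"
```

Adding a record type then means adding one entry to TEMPLATES rather than another elif branch. Note that the 128s field comes back null-padded to 128 bytes.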

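The struct module would probably work, but you might also use the Python bindings for Google's Protocol Buffers, provided you control how the file is written: protobuf reads only data in its own serialization format, not arbitrary binary layouts.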

Here is an example of how to read a C-style struct from a file in Python. Given this C definition:

typedef struct {
    ID             chunkname;
    long           chunksize;

    /* Note: there may be additional fields here, depending upon your data. */

} Chunk;

How do you read this struct from a file in Python? Here is one way:

import struct

class Chunk:
    def __init__(self, file, align=True, bigendian=True, inclheader=False):
        self.closed = False
        self.align = align      # whether to align to word (2-byte) boundaries
        # '>' = big-endian, '<' = little-endian; both prefixes also select standard sizes
        strflag = '>' if bigendian else '<'
        self.file = file
        self.chunkname = file.read(4)
        if len(self.chunkname) < 4:
            # fewer than 4 bytes left: we hit the end of the file
            raise EOFError
        try:
            # 'L' is an unsigned long; with a '<' or '>' prefix its standard size is 4 bytes
            # http://docs.python.org/2/library/struct.html#format-characters
            self.chunksize = struct.unpack(strflag + 'L', file.read(4))[0]
        except struct.error:
            # fewer than 4 bytes for the size field: also end of file
            raise EOFError
        if inclheader:
            self.chunksize = self.chunksize - 8  # subtract the 8-byte header
        self.size_read = 0
        try:
            self.offset = self.file.tell()
        except (AttributeError, OSError):
            # non-seekable streams (e.g. pipes) may not support tell()
            self.seekable = False
        else:
            self.seekable = True

So you need to understand how C types map to the format characters used by struct.unpack(): http://docs.python.org/2/library/struct.html#format-characters
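Because chunkname (a 4-byte ID) and chunksize (a 4-byte unsigned long) sit back to back, the same 8-byte header could also be read with one unpack call using the format ">4sL", where 4s is a 4-byte string. With a '>' or '<' prefix, struct uses standard sizes and no alignment padding, so 'L' is 4 bytes regardless of the platform's C long. A small sketch, assuming a big-endian header; the helper name read_chunk_header and the b"FORM" demo data are hypothetical.

```python
import io
import struct

# 4-byte ID + 4-byte unsigned long, big-endian, standard sizes, no padding
CHUNK_HEADER = struct.Struct(">4sL")


def read_chunk_header(stream):
    """Return (chunkname, chunksize), or raise EOFError if the header is incomplete."""
    data = stream.read(CHUNK_HEADER.size)
    if len(data) < CHUNK_HEADER.size:
        raise EOFError
    return CHUNK_HEADER.unpack(data)


# Demo with an in-memory stream: header followed by some body bytes
stream = io.BytesIO(b"FORM" + struct.pack(">L", 1234) + b"...")
name, size = read_chunk_header(stream)
print(name, size)  # b'FORM' 1234
```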
