How to read in binary data after ascii header in Python

Question

I have some imaging data that's stored in a file that contains an ascii text header, ending with a null character, followed by the binary data. The ascii headers vary in length, and I'm wondering what's the best way to open the file, read the header and find the null character, and then load the binary data (in Python).

Thanks for the help,
James

Have you looked at the struct module (docs.python.org/library/struct.html) yet?
– S.Lott, Commented Feb 5, 2011 at 0:14
In addition to the struct module, if you have large blocks of homogeneous data (i.e. all the same type: 32-bit floats, 16-bit uints, etc.), have a look at the array module: docs.python.org/library/array.html. Alternatively, if by chance you're going to be using numpy, numpy.fromfile is very useful for this sort of thing.
– Joe Kington, Commented Feb 5, 2011 at 0:32
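
As a rough illustration of the struct and array suggestions in the comments above, here is a Python 3 sketch. The field layout, the values, and the little-endian byte order are all made up for the example; they are not taken from the asker's file format.

```python
import array
import io
import struct
import sys

# Hypothetical payload: two little-endian uint16 fields, then three float32 values.
payload = struct.pack("<2H", 640, 480) + struct.pack("<3f", 0.5, 1.5, 2.5)
stream = io.BytesIO(payload)  # stands in for a file positioned after the header

# struct suits a handful of fixed-size fields.
width, height = struct.unpack("<2H", stream.read(struct.calcsize("<2H")))

# array suits a large homogeneous block. It always uses native byte order,
# so swap the bytes if the platform does not match the (assumed) file order.
pixels = array.array("f")
pixels.frombytes(stream.read(3 * pixels.itemsize))
if sys.byteorder != "little":
    pixels.byteswap()
```

With a real file, the same calls would be applied to the open file object once the header has been consumed.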

S.Lott · Accepted Answer · 2011-02-05 01:10:01Z

2

Probably ought to start with something like this.

with open('some file', 'rb') as input:
    # Read one byte at a time until the null terminator (or end of file).
    aByte = input.read(1)
    while aByte and ord(aByte) != 0:
        aByte = input.read(1)
    # At this point, what's left is the binary data.

Python version numbers matter a lot for this kind of thing. The issue is the result of the read function. In Python 3, a file opened in 'rb' mode returns bytes objects, and indexing one (aByte[0]) gives an integer directly. In Python 2, read returns a str. ord(aByte) works on a one-character result in both cases.
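
For Python 3, the same idea can be sketched as a small function that also keeps the header text and fails loudly on a truncated file. This is only a sketch: the name split_header and the max_header cap are hypothetical, and the demo uses an in-memory stream rather than a real file.

```python
import io

def split_header(stream, max_header=65536):
    """Read an ASCII header up to the first null byte.

    Returns the header as str and leaves the stream positioned at the
    first byte of the binary data. max_header is a hypothetical safety cap.
    """
    header = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            # read() returns b"" at end of file, so the terminator is missing.
            raise EOFError("no null terminator found before end of file")
        if byte == b"\x00":
            break
        header += byte
        if len(header) > max_header:
            raise ValueError("header longer than max_header bytes")
    return header.decode("ascii")

# Demonstration on an in-memory stream; a file opened with 'rb' works the same way.
demo = io.BytesIO(b"width=2 height=1\x00\x01\x02\x03\x04")
header = split_header(demo)
payload = demo.read()
```

Raising on a missing terminator addresses the truncated-file case raised in the comments, at the cost of rejecting files that contain only a header.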

edited Feb 5, 2011 at 1:10

answered Feb 5, 2011 at 0:58

S.Lott


4 Comments

Joe Kington Over a year ago

This will result in an infinite loop in files that only contain a non-null terminated string! (i.e. if the string's not null-terminated, reading beyond the end of the file will return an empty string).

S.Lott Over a year ago

@Joe Kington: "an ascii text header, ending with a null character" While your comment is true, it's at odds with the stated file format.

Joe Kington Over a year ago

True! I just wanted to point that out... Corrupted and/or truncated files have bitten me in the past when doing similar things. Either way, your answer is the simplest way to read the specified format.

James Over a year ago

Thanks a lot; using while aByte and ord(aByte) != 0: aByte= input.read(1) and then numpy.fromfile did the trick.

Gerrat · Answer · 2011-02-05 00:25:51Z

1

Does something like this work:

with open('some_file', 'rb') as f:
    # The file is opened in binary mode, so split on a bytes separator.
    binary_data = f.read().split(b'\0', 1)[1]
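
A variation on the same whole-file approach, sketched for Python 3: bytes.partition returns the header, the separator, and the remainder in one call, which makes a missing null byte easy to detect. The function name split_at_null and the sample bytes are hypothetical, and this assumes the file is small enough to read into memory at once.

```python
def split_at_null(raw):
    """Split raw bytes at the first null byte into (header text, binary payload)."""
    header, sep, payload = raw.partition(b"\x00")
    if not sep:
        # partition returns an empty separator when no null byte is present.
        raise ValueError("no null terminator found")
    return header.decode("ascii"), payload

# Later null bytes belong to the payload and are left untouched.
header, payload = split_at_null(b"HDR v1\x00\x10\x20\x00\x30")
```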

answered Feb 5, 2011 at 0:25

Gerrat


Joe Kington · Answer · 2011-02-05 01:36:23Z

Other people have already answered your direct question, but I thought I'd add this.

When working with binary data, I often find it useful to subclass file and add various convenience methods for reading/writing packed binary data.

It's overkill for simple things, but if you find yourself parsing lots of binary file formats, it's worth the extra effort to avoid repeating yourself.

If nothing else, hopefully it serves as a useful example of how to use struct. On a side note, this is pulled from older code, and is very much python 2.x. Python 3.x handles this (particularly strings vs. bytes) significantly differently.

import struct
import array

class BinaryFile(file):
    """
    Automatically packs or unpacks binary data according to a format
    when reading or writing.
    """
    def __init__(self, *args, **kwargs):
        """Initialization is the same as a normal file object."""
        # Do not pass self explicitly; the bound super() call supplies it.
        super(BinaryFile, self).__init__(*args, **kwargs)

    def read_binary(self,fmt):
        """
        Read and unpack a binary value from the file based
        on string fmt (see the struct module for details).
        This will strip any trailing null characters if a string format is
        specified. 
        """
        size = struct.calcsize(fmt)
        data = self.read(size)
        # Reading beyond the end of the file just returns ''
        if len(data) != size:
            raise EOFError('End of file reached')
        data = struct.unpack(fmt, data)

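        # Strip trailing nulls from string fields. Rebuild the tuple, since
        # rebinding a loop variable would leave "data" unchanged.
        data = tuple(item.rstrip('\x00') if isinstance(item, str) else item
                     for item in data)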

        # Unpack the tuple if it only has one value
        if len(data) == 1: 
            data = data[0]

        return data

    def write_binary(self, fmt, dat):
        """Pack and write data to the file according to string fmt."""
        # Try expanding input arguments (struct.pack won't take a tuple)
        try: 
            dat = struct.pack(fmt, *dat) 
        except (TypeError, struct.error): 
            # If it's not a sequence (TypeError), or if it's a 
            # string (struct.error), don't expand.
            dat = struct.pack(fmt, dat) 
        self.write(dat)

    def read_header(self, header):
        """
        Reads a defined structure "header" consisting of a sequence of (name,
        format) strings from the file. Returns a dict with keys of the given
        names and values unpacked according to the given format for each item in
        "header".
        """
        header_values = {}
        for key, format in header:
            header_values[key] = self.read_binary(format)
        return header_values

    def read_nullstring(self):
        """
        Reads a null-terminated string from the file. This is not implemented
        in an efficient manner for long strings!
        """
        output_string = ''
        char = self.read(1)
        while char != '\x00':
            output_string += char
            char = self.read(1)
            if len(char) == 0:
                break
        return output_string

    def read_array(self, type, number):
        """
        Read data from the file and return an array.array of the given
        "type" with "number" elements
        """
        size = struct.calcsize(type)
        data = self.read(size * number)
        return array.array(type, data)
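
Since the file builtin no longer exists in Python 3, a rough equivalent can be sketched there as plain functions that accept any binary stream. This is not a port of the full class, only the read side, and it returns bytes rather than str; the demo data and field names are made up.

```python
import io
import struct

def read_binary(stream, fmt):
    """Read and unpack one struct format from a binary stream."""
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("End of file reached")
    # Strip trailing nulls from bytes fields; leave numbers untouched.
    values = tuple(
        v.rstrip(b"\x00") if isinstance(v, bytes) else v
        for v in struct.unpack(fmt, data)
    )
    return values[0] if len(values) == 1 else values

def read_nullstring(stream):
    """Read bytes up to (not including) the next null byte or end of file."""
    chunks = bytearray()
    while True:
        char = stream.read(1)
        if not char or char == b"\x00":
            break
        chunks += char
    return bytes(chunks)

def read_header(stream, header):
    """Read a sequence of (name, format) pairs into a dict."""
    return {key: read_binary(stream, fmt) for key, fmt in header}

# Hypothetical data: a null-terminated name, two uint16 fields, a padded 4-byte tag.
demo = io.BytesIO(b"name\x00" + struct.pack("<HH4s", 3, 7, b"ab"))
name = read_nullstring(demo)
fields = read_header(demo, [("rows", "<H"), ("cols", "<H"), ("tag", "4s")])
```

Taking the stream as an argument avoids subclassing and lets the same helpers work on files, sockets wrapped with makefile, or io.BytesIO objects in tests.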
