3

I'm trying to read a binary file (which represents a matrix in Matlab) in Python. But I am having trouble reading the file and converting the bytes to the correct values.

The binary file consists of a sequence of 4-byte numbers. The first two numbers are the number of rows and columns respectively. My friend gave me a Matlab function he wrote that does this using fwrite. I would like to do something like this:

f = open(filename, 'rb')
rows = f.read(4)
cols = f.read(4)
m = [[0 for c in cols] for r in rows]
r = c = 0
while True:
    if c == cols:
        r += 1
        c = 0
    num = f.read(4)
    if num:
        m[r][c] = num
        c += 1
    else:
        break

But whenever I use f.read(4), I get something like '\x00\x00\x00\x04' (this specific example should represent a 4), and I can't figure out how to convert it into the correct number (using int, hex, or anything like that doesn't work). I stumbled upon struct.unpack, but that didn't seem to help very much.

Here is an example matrix and the corresponding binary file (as it appears when I read the entire file using the Python function f.read() without any size parameter) that the Matlab function created for it:

4     4     2     4
2     2     2     1
3     3     2     4
2     2     6     2

'\x00\x00\x00\x04\x00\x00\x00\x04@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\xc0\x00\x00@\x80\x00\x00?\x80\x00\x00@\x80\x00\x00@\x00\x00\x00'

So the first 4 bytes and the 5th-8th bytes should both be 4, as the matrix is 4x4, and then the values should be 4,4,2,4,2,2,2,1, etc.

Thanks guys!

1
  • The struct module is your friend. It might take you a little bit to get used to, but it is a very powerful tool. Commented Jul 1, 2010 at 22:46

2 Answers 2

7
rows = f.read(4)
cols = f.read(4)

Both names are now bound to 4-byte strings. To turn them into integers instead:

import struct

rowsandcols = f.read(8)
rows, cols = struct.unpack('=ii', rowsandcols)

See the docs for struct.unpack.
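A note on the byte-order prefix: '=' means the machine's native byte order, while '>' (or '!') forces big-endian and '<' forces little-endian. The header bytes shown in the question look big-endian, so a minimal sketch comparing the prefixes on those eight bytes might look like this (the variable names are only illustrative):

```python
import struct

# The first eight bytes of the sample file from the question.
header = b'\x00\x00\x00\x04\x00\x00\x00\x04'

# '>' and '!' both mean big-endian (most significant byte first).
big = struct.unpack('>ii', header)
network = struct.unpack('!ii', header)

# '<' means little-endian; the same bytes decode to a very different number.
little = struct.unpack('<ii', header)

print(big)      # (4, 4)
print(network)  # (4, 4)
print(little)   # (67108864, 67108864)
```

On a little-endian machine, '=' behaves like '<' here, which would explain a result of 67108864 instead of 4.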


5 Comments

It didn't work for me =/ >>> import struct >>> f = open('Z:\summer reu 2010\m.dat','rb') >>> rowsandcols = f.read(8) >>> rows, cols = struct.unpack('=ii',rowsandcols) >>> rows 67108864 >>> cols 67108864 rows and cols should both be 4
gahh i can't format my comment. Here is a screenshot: i47.tinypic.com/14ub18n.jpg
Considering the data is described as being big-endian and that most popular CPUs today are little-endian, perhaps it should be ! or > instead of = ?
yes that worked Nas. Can someone please explain what all these different formats actually mean? What is big-endian/small-endian and native/standard?
I looked "endianness" up on wikipedia, sorry to bother you all. Thank you very much for the help! =)
2

I looked a bit more into your problem; I had never used struct before, so it was a good learning activity. It turns out there are a couple of twists. First, the matrix entries are not stored as 4-byte integers but as 4-byte floats in big-endian form (only the two header values, rows and cols, are 4-byte integers). Second, if your example is correct, the matrix was not stored by rows, as one would expect, but by columns. That is, it was output like so (pseudocode):

for j in cols:
  for i in rows:
    write Aij to file
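For illustration, the same write order can be sketched in Python. This is not the original Matlab function (which is not shown in the thread); write_matrix is a hypothetical helper, and the layout it assumes (two big-endian 4-byte ints followed by big-endian 4-byte floats, column by column) is inferred from the sample bytes in the question:

```python
import struct

def write_matrix(matrix):
    """Serialize a list of rows in the assumed layout: header, then columns."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    # Header: rows and cols as big-endian 4-byte integers.
    chunks = [struct.pack('>ii', rows, cols)]
    # Body: outer loop over columns, inner loop over rows (column-major).
    for j in range(cols):
        for i in range(rows):
            chunks.append(struct.pack('>f', float(matrix[i][j])))
    return b''.join(chunks)

example = [[4, 4, 2, 4],
           [2, 2, 2, 1],
           [3, 3, 2, 4],
           [2, 2, 6, 2]]
data = write_matrix(example)
print(len(data))    # 72 bytes: 8 for the header plus 16 floats of 4 bytes
print(data[8:24])   # the first column: 4.0, 2.0, 3.0, 2.0
```

The first column comes out as 4, 2, 3, 2, which matches the order of the floats in the question's byte string.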

So I had to transpose the result after reading. Here is the code that you need given the example:

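import struct

def readMatrix(f):
    # Header: two big-endian 4-byte integers (rows, cols)
    rows, cols = struct.unpack('>ii', f.read(8))
    # Each chunk of `rows` big-endian 4-byte floats is one column
    m = [list(struct.unpack('>%df' % rows, f.read(4 * rows)))
         for c in range(cols)]
    # Transpose the list of columns into rows; list() makes this
    # return a list on Python 3 as well as Python 2
    return list(zip(*m))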

And here we test it:

>>> from io import BytesIO
>>> f = BytesIO(b'\x00\x00\x00\x04\x00\x00\x00\x04@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\xc0\x00\x00@\x80\x00\x00?\x80\x00\x00@\x80\x00\x00@\x00\x00\x00')
>>> mat = readMatrix(f)
>>> for row in mat:
...     print(row)
...
(4.0, 4.0, 2.0, 4.0)
(2.0, 2.0, 2.0, 1.0)
(3.0, 3.0, 2.0, 4.0)
(2.0, 2.0, 6.0, 2.0)
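As a variation, here is a minimal Python 3 sketch that unpacks all the floats in one call, checks that the file is not truncated, and builds the rows with index arithmetic instead of a transpose. The name read_matrix_flat and the ValueError check are assumptions added for illustration, not part of the answer above, and the layout is the same one inferred from the question's sample:

```python
import struct

def read_matrix_flat(data):
    """Decode bytes laid out as: rows, cols (>i each), then >f values by column."""
    rows, cols = struct.unpack('>ii', data[:8])
    count = rows * cols
    body = data[8:8 + 4 * count]
    if len(body) != 4 * count:
        raise ValueError('expected %d data bytes, got %d' % (4 * count, len(body)))
    values = struct.unpack('>%df' % count, body)
    # Element (r, c) sits at offset c * rows + r in column-major order.
    return [[values[c * rows + r] for c in range(cols)] for r in range(rows)]

# Rebuild the question's sample: header, then the four columns in order.
sample = struct.pack('>ii', 4, 4) + struct.pack(
    '>16f', 4, 2, 3, 2, 4, 2, 3, 2, 2, 2, 2, 6, 4, 1, 4, 2)
for row in read_matrix_flat(sample):
    print(row)
```

For a file on disk, the bytes could be obtained with open(filename, 'rb').read() and passed to the function directly.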

2 Comments

Your answer was better, my apologies. However, I don't know if it was just my machine, but I had to use "!" instead of ">" for struct.unpack
@Daniel: hm, that's weird if '!' and '>' give you different results; it seems to me they should be the same. The documentation says: "The form '!' [network order = big-endian] is available for those poor souls who claim they can't remember whether network byte order is big-endian ['>'] or little-endian ['<']." But if it works, don't touch it - it ain't broken :)
