1

I have the following problem. I wrote some double-type data to a binary file using C, and now I want to read it using Python. When I used the following Python code:

with open("test.dat","rb") as dfile:
    data = dfile.read()

It gave me

b'\x00\x00\x00\x00\x00\x00\xf8?\x00\x00\x00\x00\x00\x00\x04@\x00\x00\x00\x00\x00\x00\n@\x00\x00\x00\x00\x00\x00\x11@'

So I tried to decode it using data.decode(), which gave me a decoding error. I assumed this was because I used the wrong encoding, but I tried ascii and utf-8 and neither worked. Therefore, my question is two-fold:

  1. How can I read a binary file without knowing the encoding type?

  2. Since I did not give an encoding type when writing the binary file in C, does C encode the data at all? If so, what encoding would that be?

FYI, the code I used to write the binary file in the first place is:

#include <stdio.h>

int main(){
  double buffer[4]= {1.5, 2.5, 3.25, 4.25};
  FILE *ptr;

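  ptr = fopen("test.dat", "wb");
  if (ptr == NULL) { perror("fopen"); return 1; }
  /* Writes the raw in-memory bytes of the four doubles; no text encoding is applied. */
  fwrite(buffer,sizeof(buffer),1,ptr);
  fclose(ptr);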
  printf("%ld\n",sizeof(buffer));

  return 0;
}
  • "Decoding" in this sense means converting binary data into text. However, the data you saved in your C program represents double-precision floating-point numbers, not text. Commented Sep 3, 2018 at 19:19
  • You need to convert the C types into Python types. Use struct.unpack from the standard library: docs.python.org/3/library/struct.html Commented Sep 3, 2018 at 19:20
  • Aside: in C, the buffer size should be printed with printf("%zu\n",sizeof(buffer)); because sizeof yields a size_t. Commented Sep 3, 2018 at 19:24
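To make the point about text versus numbers concrete, here is a minimal sketch that inspects the exact bytes printed in the question. It assumes the file was written on a little-endian machine with 8-byte IEEE 754 doubles, which is what those bytes are consistent with. The variable names are only illustrative.

```python
import struct

# The exact bytes shown in the question (4 doubles, 8 bytes each).
raw = (b'\x00\x00\x00\x00\x00\x00\xf8?\x00\x00\x00\x00\x00\x00\x04@'
       b'\x00\x00\x00\x00\x00\x00\n@\x00\x00\x00\x00\x00\x00\x11@')

print(len(raw))  # 32 bytes = 4 * sizeof(double)

# Look at the first double only.
first = raw[:8]
print(first.hex())  # 000000000000f83f

# Read the same 8 bytes as a little-endian integer to see the bit pattern.
bits = int.from_bytes(first, "little")
print(hex(bits))  # 0x3ff8000000000000, the IEEE 754 pattern for 1.5

# "<d" means: little-endian, one 8-byte double.
value = struct.unpack("<d", first)[0]
print(value)  # 1.5
```

Characters such as ? and @ in the printed bytes are just how Python displays the byte values 0x3f and 0x40; they are not text that needs decoding.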

3 Answers

3

You need to convert the C types into Python types. Use struct.unpack from the standard library (docs.python.org/3/library/struct.html).

The format string in this case is dddd (equivalently 4d), meaning 4 doubles in the machine's native byte order. The difficulty comes when moving C types between different compilers and machines.

import struct

with open('test.dat', 'rb') as dfile:
    data = dfile.read()

result = struct.unpack("dddd", data)
print(result)

Gives a tuple:

(1.5, 2.5, 3.25, 4.25)
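If the number of doubles is not known in advance, or the file may come from a machine with a different byte order, the format string can be built from the data length with an explicit byte-order prefix. The sketch below is one way to do that. unpack_doubles is a hypothetical helper name, and the sample bytes are generated in memory rather than read from test.dat.

```python
import struct

def unpack_doubles(data, byte_order="<"):
    """Unpack a bytes object holding only 8-byte doubles.

    byte_order is a struct prefix: "<" little-endian, ">" big-endian.
    (Hypothetical helper, not part of the struct module.)
    """
    item_size = struct.calcsize(byte_order + "d")  # 8 with "<" or ">"
    if len(data) % item_size != 0:
        raise ValueError("data length is not a multiple of the double size")
    count = len(data) // item_size
    # For 32 bytes this builds the format "<4d".
    return struct.unpack(f"{byte_order}{count}d", data)

# Stand-in for the file contents: the same four values, little-endian.
sample = struct.pack("<4d", 1.5, 2.5, 3.25, 4.25)
print(unpack_doubles(sample))  # (1.5, 2.5, 3.25, 4.25)
```

With the file from the question, the same call would be unpack_doubles(dfile.read()) inside the with block, assuming the file was written on a little-endian machine.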

2 Comments

This assumes the file was written with the same endianness as the machine reading it (endianness is generally a property of the hardware). The documentation shows how to specify endianness in the format string: docs.python.org/3/library/struct.html#struct-format-strings, section 7.1.2.1, "Byte Order, Size, and Alignment".
That's why I said in my post, "The difficulty comes when moving C types between different compilers and machines." As you know, sizeof(int), for example, is not fixed by the language.
1

You can use Python's standard array module:

from array import array

u = array('d')

with open('test.dat', 'rb') as f:
    data = f.read()
    u.frombytes(data)
    print(u)
    print(u.tolist())

Output:

array('d', [1.5, 2.5, 3.25, 4.25])
[1.5, 2.5, 3.25, 4.25]
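array('d') always interprets bytes in the native byte order of the machine running Python. If the file might have been written with the other byte order, array.byteswap() can correct it after loading. A minimal sketch, where doubles_from_bytes is a hypothetical helper and the sample bytes are built in memory for illustration:

```python
import struct
import sys
from array import array

def doubles_from_bytes(data, file_byte_order="little"):
    """Load doubles from bytes, swapping if the file's byte order
    differs from this machine's. (Hypothetical helper.)"""
    values = array("d")
    values.frombytes(data)  # raises ValueError if len(data) is not a multiple of the item size
    if file_byte_order != sys.byteorder:
        values.byteswap()   # reverse the bytes of every item in place
    return values

# Stand-in data: the same four values written big-endian.
big_endian_bytes = struct.pack(">4d", 1.5, 2.5, 3.25, 4.25)
print(doubles_from_bytes(big_endian_bytes, "big").tolist())
```

This should print the same list as above regardless of the byte order of the machine it runs on.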


0

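If you are open to using numpy, use np.fromfile:

import numpy as np

with open("test.dat","rb") as dfile:
    # float64 is the default dtype and matches a C double; stated explicitly for clarity
    data = np.fromfile(dfile, dtype=np.float64)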

You may find numpy arrays easier to manipulate than plain Python types because of the huge ecosystem of code that has grown up around them.
