Data Encoding in C and Python

Question

I've written some double data to a binary file using C, and now I want to read it using Python. When I used the following Python code:

with open("test.dat","rb") as dfile:
    data = dfile.read()

It gave me

b'\x00\x00\x00\x00\x00\x00\xf8?\x00\x00\x00\x00\x00\x00\x04@\x00\x00\x00\x00\x00\x00\n@\x00\x00\x00\x00\x00\x00\x11@'

So I tried to decode it using data.decode(), but that gave me a decoding error. I suppose it was because I used the wrong encoding type, but I tried ascii and utf-8 and neither worked. Therefore my question is two-fold:

How can I read a binary file without knowing the encoding type?
Since I did not specify an encoding type when writing the binary file in C, does C encode the data at all? If yes, what kind of encoding would that be?

For reference, the code I used to write the binary file in the first place is:

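#include <stdio.h>

int main(void) {
  double buffer[4] = {1.5, 2.5, 3.25, 4.25};
  FILE *ptr = fopen("test.dat", "wb");

  if (ptr == NULL) {
    perror("fopen");
    return 1;
  }
  /* Copies the raw in-memory bytes of the doubles; no text encoding is applied. */
  fwrite(buffer, sizeof(buffer), 1, ptr);
  fclose(ptr);
  printf("%zu\n", sizeof(buffer)); /* sizeof yields size_t, so %zu, not %ld */

  return 0;
}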

"Decoding" in this sense is to convert binary data into text. However, the data you saved in your C program represents double-precision floating-point numbers, not text. – Code-Apprentice, commented Sep 3, 2018 at 19:19
You need to convert the C types into Python types. Use struct.unpack in the standard library: docs.python.org/3/library/struct.html – cdarke, commented Sep 3, 2018 at 19:20
Aside: in C the buffer size should be printed with printf("%zu\n", sizeof(buffer)); – Weather Vane, commented Sep 3, 2018 at 19:24
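As the comments say, C applied no text encoding at all: fwrite copied the in-memory bytes of each double unchanged. A minimal sketch, assuming IEEE 754 doubles stored little-endian (as on x86), checks this against the bytes printed in the question:

```python
import struct

# The bytes printed in the question, copied verbatim from dfile.read().
data = (b'\x00\x00\x00\x00\x00\x00\xf8?\x00\x00\x00\x00\x00\x00\x04@'
        b'\x00\x00\x00\x00\x00\x00\n@\x00\x00\x00\x00\x00\x00\x11@')

# Treating the bytes as text fails: 0xf8 can never appear in valid UTF-8.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 decode failed:", exc.reason)

# Each double occupies 8 bytes (IEEE 754 binary64), stored least significant byte first.
for offset in range(0, len(data), 8):
    chunk = data[offset:offset + 8]
    (value,) = struct.unpack("<d", chunk)  # "<" = little-endian, standard 8-byte double
    # chunk[::-1].hex() shows the bit pattern most significant byte first.
    print(chunk[::-1].hex(), "->", value)
```

The printed patterns (3ff8000000000000 for 1.5, and so on) are the standard binary64 representations, so the answer to the second question is that the file holds raw IEEE 754 values rather than encoded text.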

cdarke · Accepted Answer · 2018-09-03 19:27:50Z

3

You need to convert the C types into Python types. Use struct.unpack from the standard library (docs.python.org/3/library/struct.html).

The format string in this case is "dddd" (equivalently "4d"), meaning four doubles. The difficulty comes when moving C types between different compilers and machines, where type sizes, byte order, and alignment can differ.

import struct

with open('test.dat', 'rb') as dfile:
    data = dfile.read()

result = struct.unpack("dddd", data)
print(result)

Gives a tuple:

(1.5, 2.5, 3.25, 4.25)
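The format string above hard-codes four values. A minimal sketch that works for any number of doubles derives the count from the file size instead; read_doubles is a hypothetical helper name, and the demo writes its own temporary file rather than relying on test.dat:

```python
import os
import struct
import tempfile

def read_doubles(path):
    """Return every C double stored in a raw binary file, as a tuple of floats."""
    with open(path, "rb") as dfile:
        data = dfile.read()
    itemsize = struct.calcsize("d")  # native double size, 8 on IEEE 754 platforms
    if len(data) % itemsize:
        raise ValueError(f"{path}: {len(data)} bytes is not a multiple of {itemsize}")
    count = len(data) // itemsize
    # f"{count}d" is equivalent to repeating "d" count times, e.g. "4d" == "dddd".
    return struct.unpack(f"{count}d", data)

if __name__ == "__main__":
    # Stand-in for the C program: write four doubles in native format.
    fd, path = tempfile.mkstemp(suffix=".dat")
    with os.fdopen(fd, "wb") as f:
        f.write(struct.pack("4d", 1.5, 2.5, 3.25, 4.25))
    print(read_doubles(path))
    os.remove(path)
```

The size check turns a truncated or mismatched file into an explicit error instead of a confusing struct.error.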

answered Sep 3, 2018 at 19:27

cdarke

2 Comments

cdarke Over a year ago

This assumes the same endianness (which tends to be a feature of the hardware). If you look at the documentation, you will find that the byte order can be specified: docs.python.org/3/library/struct.html#struct-format-strings, section 7.1.2.1 "Byte Order, Size, and Alignment".

cdarke Over a year ago

That's why I said in my post: "The difficulty comes when moving C types between different compilers and machines." As you know, sizeof(int), for example, is not fixed by the language standard.
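Both points in these comments can be seen directly with struct. In this minimal sketch, the prefixes "<" and ">" pin down both the byte order and the size of each field, while the default native mode ("@") follows the current platform's C compiler, including alignment padding. The big-endian file is simulated for illustration, and long is used instead of int because its size varies more across common platforms:

```python
import struct

values = (1.5, 2.5, 3.25, 4.25)

# The same four doubles as a little-endian and a big-endian machine would write them.
little = struct.pack("<4d", *values)
big = struct.pack(">4d", *values)

# With an explicit prefix, unpacking gives the same numbers on any host.
assert struct.unpack("<4d", little) == values
assert struct.unpack(">4d", big) == values

# The wrong byte order raises no error; it silently produces different numbers.
print("wrong order:", struct.unpack("<4d", big))

# Native mode ("@", the default) uses this platform's C sizes and alignment...
print("native char+double:", struct.calcsize("@bd"))    # includes padding, 16 on typical 64-bit platforms
print("native long:", struct.calcsize("@l"))            # platform-dependent, like sizeof(long)
# ...while standard modes use fixed sizes and no padding.
print("standard char+double:", struct.calcsize("<bd"))  # always 1 + 8 = 9
print("standard long:", struct.calcsize("<l"))          # always 4
```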

Lev Zakharov · Accepted Answer · 2018-09-03 19:37:45Z

1

You can use Python's standard array module:

from array import array

u = array('d')

with open('test.dat', 'rb') as f:
    data = f.read()
    u.frombytes(data)
    print(u)
    print(u.tolist())

Output:

array('d', [1.5, 2.5, 3.25, 4.25])
[1.5, 2.5, 3.25, 4.25]
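A variation on the same idea, assuming the writer's byte order is known: array.fromfile reads a given number of items straight from the file, and array.byteswap fixes the values if the file came from a machine with the opposite endianness. read_double_array and its byteorder parameter are hypothetical names used only for this sketch:

```python
import os
import sys
import tempfile
from array import array

def read_double_array(path, byteorder=sys.byteorder):
    """Read raw C doubles written by a machine with the given byte order."""
    u = array("d")  # native C double; u.itemsize is normally 8
    count = os.path.getsize(path) // u.itemsize
    with open(path, "rb") as f:
        u.fromfile(f, count)  # raises EOFError if fewer than count items are present
    if byteorder != sys.byteorder:
        u.byteswap()  # reverse the bytes of every item in place
    return u.tolist()

if __name__ == "__main__":
    # Simulate a file written on a big-endian machine.
    src = array("d", [1.5, 2.5, 3.25, 4.25])
    if sys.byteorder == "little":
        src.byteswap()
    fd, path = tempfile.mkstemp(suffix=".dat")
    with os.fdopen(fd, "wb") as f:
        src.tofile(f)
    print(read_double_array(path, byteorder="big"))
    os.remove(path)
```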

edited Sep 3, 2018 at 19:37

answered Sep 3, 2018 at 19:32

Lev Zakharov

Mad Physicist · Accepted Answer · 2018-09-03 19:35:35Z

0

If you are open to using numpy, use np.fromfile:

import numpy as np
with open("test.dat", "rb") as dfile:
    data = np.fromfile(dfile, dtype=np.float64)  # float64 = C double (the default dtype)

You may find NumPy arrays easier to manipulate than plain Python types because of the huge ecosystem of code that has grown up around them.
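np.fromfile with the default dtype assumes the file uses this machine's native byte order. A minimal sketch, assuming NumPy is installed: a dtype string such as "<f8" (little-endian 8-byte float) or ">f8" (big-endian) states the layout explicitly. np.frombuffer is used here only so the example runs on the bytes from the question without needing the file:

```python
import numpy as np

# The bytes printed in the question.
raw = (b'\x00\x00\x00\x00\x00\x00\xf8?\x00\x00\x00\x00\x00\x00\x04@'
       b'\x00\x00\x00\x00\x00\x00\n@\x00\x00\x00\x00\x00\x00\x11@')

# "<f8" = little-endian IEEE 754 double, matching what an x86 machine writes.
values = np.frombuffer(raw, dtype="<f8")
print(values.tolist())

# The same dtype string works with np.fromfile, e.g. np.fromfile("test.dat", dtype="<f8").
# astype copies into native float64, which frombuffer's read-only view is not guaranteed to be.
native = values.astype(np.float64)
```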

answered Sep 3, 2018 at 19:35

Mad Physicist
