
I have some performance trouble putting data from a byte array into the internal data structure. The data contains several nested arrays and can be extracted with the attached code. In C, reading from a stream takes about one second, but in Python it takes almost one minute. I guess indexing and calling int.from_bytes was not the best idea. Does anybody have a proposal to improve the performance?

...
ycnt = int.from_bytes(bytedat[idx:idx + 4], 'little')
idx += 4
while ycnt > 0:
    ky = int.from_bytes(bytedat[idx:idx + 4], 'little')
    idx += 4
    dv = DataObject()
    xvec.update({ky: dv})
    dv.x = int.from_bytes(bytedat[idx:idx + 4], 'little')
    idx += 4
    dv.y = int.from_bytes(bytedat[idx:idx + 4], 'little')
    idx += 4
    cntv = int.from_bytes(bytedat[idx:idx + 4], 'little')
    idx += 4
    while cntv > 0:
        dv.data_values.append(int.from_bytes(bytedat[idx:idx + 4], 'little', signed=True))
        idx += 4
        cntv -= 1
    dv.score = struct.unpack('d', bytedat[idx:idx + 8])[0]
    idx += 8
    ycnt -= 1
...
what about using struct.unpack ? – Commented Feb 2, 2022 at 12:20

2 Answers


First, a factor of 60 between Python and C is normal for low-level code like this. This is not where Python shines, because it is not compiled down to machine code.

Micro-Optimizations

The most obvious one is to reduce your integer math by using struct.unpack() properly. See the format string documentation. Something like this:

ky, x, y, cntv = struct.unpack('<4i', bytedat[idx:idx + 4*4])

(unpack into temporaries, then create dv and set dv.x = x and dv.y = y; this also lets you advance idx by 16 in one step)
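A sketch of the whole loop rewritten this way, using struct.unpack_from, which reads directly at an offset and avoids the slicing (bytedat, xvec and the DataObject fields are taken from the question; make_obj stands in for the DataObject constructor, and the payload layout is assumed to match the original code):

```python
import struct

def parse(bytedat, idx, xvec, make_obj):
    # make_obj is a stand-in for the question's DataObject constructor
    (ycnt,) = struct.unpack_from('<i', bytedat, idx)
    idx += 4
    for _ in range(ycnt):
        # one call instead of four slices + four int.from_bytes calls
        ky, x, y, cntv = struct.unpack_from('<4i', bytedat, idx)
        idx += 16
        dv = make_obj()
        dv.x, dv.y = x, y
        xvec[ky] = dv
        # one unpack call for the whole int array instead of a per-item loop
        dv.data_values = list(struct.unpack_from('<%di' % cntv, bytedat, idx))
        idx += 4 * cntv
        (dv.score,) = struct.unpack_from('<d', bytedat, idx)
        idx += 8
    return idx
```

The per-record unpack replaces eight slice-and-convert operations with two calls, which is where most of the interpreter overhead goes.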

The second one is to load your int arrays (if they are large) "in batch" instead of the (interpreted!) while cntv > 0 loop. I would use a numpy array:

numpy.frombuffer(bytedat[idx:idx + 4*cntv], dtype='int32')
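For example (the explicit '<i4' dtype pins little-endian 32-bit ints regardless of the host platform; the payload here is made up for illustration):

```python
import struct
import numpy as np

bytedat = struct.pack('<4i', 1, -2, 3, 40000)  # sample little-endian payload
arr = np.frombuffer(bytedat, dtype='<i4')      # interprets the bytes directly, no Python loop
print(arr.tolist())                            # [1, -2, 3, 40000]
```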

Why not a list? A Python list contains (generic) Python objects. It requires extra memory and a pointer indirection for each item. Libraries cannot use optimized C code (for example, to calculate the sum) because each item first has to be dereferenced and then checked for its type.

A numpy array, on the other hand, is basically a wrapper that manages the memory of a C array. Loading it will probably boil down to a memcpy(), or it may even just reference the bytes memory you passed in.
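A quick check confirms the referencing behaviour: when built from an immutable bytes object, the resulting array is read-only, because it shares memory with the buffer instead of copying it:

```python
import numpy as np

raw = bytes(range(8))                  # immutable buffer, 8 bytes
view = np.frombuffer(raw, dtype='<i4') # two little-endian int32 values
print(view.flags['WRITEABLE'])         # False: the array shares memory with raw
```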

And thirdly, instead of xvec.update({ky: dv}) you can simply write xvec[ky] = dv. This avoids the creation of a temporary dict object.

Compiling your Python-Code

There are ways to compile Python (partially) down to machine code (PyPy, Numba, Cython). It's a bit involved, but your original byte-indexing code would then run at C speed.

However, you are filling a Python list and a dict in the inner loop. This will never reach C-like speed, because it has to deal with Python objects and reference counting even when it is compiled down to C.

Different file format

The easiest way is to use a data format handled by a fast, specialized library (like numpy, HDF5 via h5py, Pillow, maybe even pandas).

The pickle module may also help, but only if you control the writing side, the data is trusted, and you mainly care about loading speed.
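A minimal sketch of that approach (an in-memory buffer stands in for a file here; remember that pickle.load must only be used on trusted input):

```python
import io
import pickle

# nested structure resembling the question's data, values made up
xvec = {7: {'x': 10, 'y': 20, 'data_values': [1, -2, 3], 'score': 1.5}}

buf = io.BytesIO()
pickle.dump(xvec, buf, protocol=pickle.HIGHEST_PROTOCOL)

buf.seek(0)
loaded = pickle.load(buf)  # the whole nested structure round-trips in one call
print(loaded == xvec)      # True
```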




I do something similar, but big-endian.

I find that

(byte1 << 8) | byte2

is faster than int.from_bytes() and struct.unpack().
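For a 16-bit big-endian value the three approaches agree; a quick equivalence check (a sketch, not a benchmark):

```python
import struct

raw = b'\x12\x34'            # 0x1234 big-endian
byte1, byte2 = raw[0], raw[1]

v_shift = (byte1 << 8) | byte2         # manual shift-and-or
v_from_bytes = int.from_bytes(raw, 'big')
(v_unpack,) = struct.unpack('>H', raw)

print(v_shift, v_from_bytes, v_unpack)  # 4660 4660 4660
```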

I also find pypy3 to be at least 4x faster than python3 for this sort of stuff.

