
I have some performance trouble putting data from a byte array into the internal data structure. The data contains several nested arrays and can be extracted with the attached code. In C, reading from a stream takes about one second, but in Python it takes almost one minute. I guess indexing and calling int.from_bytes was not the best idea. Does anybody have a proposal to improve the performance?

...
ycnt = int.from_bytes(bytedat[idx:idx + 4], 'little')
idx += 4
while ycnt > 0:
    ky = int.from_bytes(bytedat[idx:idx + 4], 'little')
    idx += 4
    dv = DataObject()
    xvec.update({ky: dv})
    dv.x = int.from_bytes(bytedat[idx:idx + 4], 'little')
    idx += 4
    dv.y = int.from_bytes(bytedat[idx:idx + 4], 'little')
    idx += 4
    cntv = int.from_bytes(bytedat[idx:idx + 4], 'little')
    idx += 4
    while cntv > 0:
        dv.data_values.append(int.from_bytes(bytedat[idx:idx + 4], 'little', signed=True))
        idx += 4
        cntv -= 1
    dv.score = struct.unpack('d', bytedat[idx:idx + 8])[0]
    idx += 8
    ycnt -= 1
...
what about using struct.unpack ? – Commented Feb 2, 2022 at 12:20

2 Answers


First, a factor of 60 between Python and C is normal for low-level code like this. This is not where Python shines, because it is not compiled down to machine code.

Micro-Optimizations

The most obvious one is to reduce your integer math by using struct.unpack() properly. See the format string documentation. Something like this:

ky, x, y, cntv = struct.unpack('<4i', bytedat[idx:idx + 4*4])

(unpack into temporaries, then create dv and set dv.x = x and dv.y = y; this also lets you advance idx by 16 in one step)
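A sketch of the whole loop rewritten this way, using struct.unpack_from, which reads directly at an offset and avoids the slicing (bytedat, xvec and the DataObject fields are taken from the question; make_obj stands in for the DataObject constructor, and the payload layout is assumed to match the original code):

```python
import struct

def parse(bytedat, idx, xvec, make_obj):
    # make_obj is a stand-in for the question's DataObject constructor
    (ycnt,) = struct.unpack_from('<i', bytedat, idx)
    idx += 4
    for _ in range(ycnt):
        # one call instead of four slices + four int.from_bytes calls
        ky, x, y, cntv = struct.unpack_from('<4i', bytedat, idx)
        idx += 16
        dv = make_obj()
        dv.x, dv.y = x, y
        xvec[ky] = dv
        # one unpack call for the whole int array instead of a per-item loop
        dv.data_values = list(struct.unpack_from('<%di' % cntv, bytedat, idx))
        idx += 4 * cntv
        (dv.score,) = struct.unpack_from('<d', bytedat, idx)
        idx += 8
    return idx
```

The per-record unpack replaces eight slice-and-convert operations with two calls, which is where most of the interpreter overhead goes.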

The second one is to load your int arrays (if they are large) "in batch" instead of the (interpreted!) while cntv > 0 loop. I would use a numpy array:

numpy.frombuffer(bytedat[idx:idx + 4*cntv], dtype='int32')
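For example (the explicit '<i4' dtype pins little-endian 32-bit ints regardless of the host platform; the payload here is made up for illustration):

```python
import struct
import numpy as np

bytedat = struct.pack('<4i', 1, -2, 3, 40000)  # sample little-endian payload
arr = np.frombuffer(bytedat, dtype='<i4')      # interprets the bytes directly, no Python loop
print(arr.tolist())                            # [1, -2, 3, 40000]
```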

Why not a list? A Python list contains (generic) Python objects. It requires extra memory and a pointer indirection for each item. Libraries cannot use optimized C code (for example, to calculate the sum) because each item first has to be dereferenced and then checked for its type.

A numpy array, on the other hand, is basically a wrapper that manages the memory of a C array. Loading it will probably boil down to a memcpy(), or it may even just reference the bytes memory you passed in.
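A quick check confirms the referencing behaviour: when built from an immutable bytes object, the resulting array is read-only, because it shares memory with the buffer instead of copying it:

```python
import numpy as np

raw = bytes(range(8))                  # immutable buffer, 8 bytes
view = np.frombuffer(raw, dtype='<i4') # two little-endian int32 values
print(view.flags['WRITEABLE'])         # False: the array shares memory with raw
```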

And thirdly, instead of xvec.update({ky: dv}) you can simply write xvec[ky] = dv. This avoids the creation of a temporary dict object.

Compiling your Python-Code

There are ways to compile Python (partially) down to machine code (PyPy, Numba, Cython). It's a bit involved, but your original byte-indexing code would then run at C speed.

However, you are filling a Python list and a dict in the inner loop. This will never reach C-like speed, because it has to deal with Python objects and reference counting even when it is compiled down to C.

Different file format

The easiest way is to use a data format handled by a fast, specialized library (like numpy, HDF5 via h5py, Pillow, maybe even pandas).

The pickle module may also help, but only if you control the writing side, the data is trusted, and you mainly care about loading speed.
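A minimal sketch of that approach (an in-memory buffer stands in for a file here; remember that pickle.load must only be used on trusted input):

```python
import io
import pickle

# nested structure resembling the question's data, values made up
xvec = {7: {'x': 10, 'y': 20, 'data_values': [1, -2, 3], 'score': 1.5}}

buf = io.BytesIO()
pickle.dump(xvec, buf, protocol=pickle.HIGHEST_PROTOCOL)

buf.seek(0)
loaded = pickle.load(buf)  # the whole nested structure round-trips in one call
print(loaded == xvec)      # True
```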




I do something similar, but big-endian.

I find that

(byte1 << 8) | byte2

is faster than int.from_bytes() and struct.unpack().
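For a 16-bit big-endian value the three approaches agree; a quick equivalence check (a sketch, not a benchmark):

```python
import struct

raw = b'\x12\x34'            # 0x1234 big-endian
byte1, byte2 = raw[0], raw[1]

v_shift = (byte1 << 8) | byte2         # manual shift-and-or
v_from_bytes = int.from_bytes(raw, 'big')
(v_unpack,) = struct.unpack('>H', raw)

print(v_shift, v_from_bytes, v_unpack)  # 4660 4660 4660
```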

I also find pypy3 to be at least 4x faster than python3 for this sort of stuff.

