Python readinto: How to convert from an array.array to a custom ctype structure

Question

I have created an array of integers, and I would like them to be interpreted according to a structure definition I have written:

from ctypes import *
import array  # the code below refers to array.array, so import the module itself

class MyStruct(Structure):
    _fields_ = [("init", c_uint),
                ("state", c_char),
                ("constant", c_int),
                ("address", c_uint),
                ("size", c_uint),
                ("sizeMax", c_uint),
                ("start", c_uint),
                ("end", c_uint),
                ("timestamp", c_uint),
                ("location", c_uint),
                ("nStrings", c_uint),
                ("nStringsMax", c_uint),
                ("maxWords", c_uint),
                ("sizeFree", c_uint),
                ("stringSizeMax", c_uint),
                ("stringSizeFree", c_uint),
                ("recordCount", c_uint),
                ("categories", c_uint),
                ("events", c_uint),
                ("wraps", c_uint),
                ("consumed", c_uint),
                ("resolution", c_uint),
                ("previousStamp", c_uint),
                ("maxTimeStamp", c_uint),
                ("threshold", c_uint),
                ("notification", c_uint),
                ("version", c_ubyte)]

# arr = array.array('I', [1])
# How can I do this?
# mystr = MyStruct(arr) magic
# (mystr.init == 1) == True

I can do the following:

mystr = MyStruct()
rest = array.array('I')
with open('myfile.bin', 'rb') as binaryFile:
    binaryFile.readinto(mystr)
    rest.fromstring(binaryFile.read())

# Now create another struct with rest
rest.readinto(mystr) # Does not work

How can I avoid going through a file to convert an array of ints to a struct when the data is already held in an array.array('I')? I am not sure what the Structure constructor accepts or how readinto works.

ShadowRanger · Accepted Answer · 2016-08-31 16:04:51Z

Solution #1: Star unpacking for one-line initialization

Star-unpacking will work, but only if all the fields in your structure are integer types. In Python 2.x, c_char cannot be initialized from an int (it works fine in 3.5). If you change the type of state to c_byte, then you can just do:

mystr = MyStruct(*myarr)

This doesn't benefit from any array-specific magic (the values are briefly converted to Python ints during unpacking, so peak memory usage isn't reduced), so you'd only bother with an array if initializing the array is easier than reading directly into the structure for whatever reason.
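
As a minimal sketch of the star-unpacking approach (written for Python 3; SmallStruct is a hypothetical three-field stand-in for MyStruct, with state already changed to c_byte):

```python
from array import array
from ctypes import Structure, c_byte, c_int, c_uint


class SmallStruct(Structure):
    # Hypothetical cut-down version of MyStruct. "state" is c_byte so it
    # accepts a plain int on both Python 2 and Python 3.
    _fields_ = [("init", c_uint),
                ("state", c_byte),
                ("constant", c_int)]


myarr = array('I', [1, 65, 3])

# Star unpacking passes each array element as one positional argument;
# ctypes assigns them to the fields in declaration order.
small = SmallStruct(*myarr)

print(small.init, small.state, small.constant)  # 1 65 3
```

The array must not hold more elements than the structure has fields, otherwise the constructor raises TypeError.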

If you go the star unpacking route, reading .state will now give you int values instead of length-1 str values. If you want to initialize with an int but read back a one-character str, you can use a protected name wrapped in a property:

class MyStruct(Structure):
    _fields_ = [...
                ("_state", c_byte),  # "Protected" name int-like; constructor expects int
                ...]

    @property
    def state(self):
        return chr(self._state)

    @state.setter
    def state(self, x):
        if isinstance(x, basestring):
            x = ord(x)
        self._state = x
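
The snippet above is Python 2 (basestring does not exist in Python 3). A rough Python 3 equivalent might look like the sketch below. StateStruct and its three fields are hypothetical, the property returns bytes rather than str, and c_ubyte is used instead of c_byte so that the stored value always fits in a single byte:

```python
from ctypes import Structure, c_int, c_ubyte, c_uint


class StateStruct(Structure):
    # Hypothetical cut-down structure; "_state" is the raw integer field.
    _fields_ = [("init", c_uint),
                ("_state", c_ubyte),
                ("constant", c_int)]

    @property
    def state(self):
        # Present the stored integer as a one-byte bytes object.
        return bytes([self._state])

    @state.setter
    def state(self, x):
        # Accept a length-1 bytes/str as well as a plain int.
        if isinstance(x, (bytes, str)):
            x = ord(x)
        self._state = x


s = StateStruct(1, 65, 3)   # the constructor still takes an int
first = s.state             # read back as bytes
s.state = b'B'              # the setter converts to an int
```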

A similar technique could be used without properties by defining your own __init__ that converts the state argument passed:

class MyStruct(Structure):
    _fields_ = [("init", c_uint),
                ("state", c_char),
                ...]

    def __init__(self, init=0, state=b'\0', *args, **kwargs):
        if not isinstance(state, basestring):
            state = chr(state)
        super(MyStruct, self).__init__(init, state, *args, **kwargs)

Solution #2: Direct `memcpy`-like solutions to reduce temporaries

You can use some array-specific magic to avoid the temporary Python-level ints, though (and avoid the need to change state to c_byte), without real file objects, by using a fake (in-memory) file-like object:

import io

mystr = MyStruct()  # Default initialize

# Use BytesIO to gain the ability to write the raw bytes to the struct
# because BytesIO's readinto isn't finicky about exact buffer formats
io.BytesIO(myarr.tostring()).readinto(mystr)

# In Python 3, where array implements the buffer protocol, you can simplify to:
io.BytesIO(myarr).readinto(mystr)
# This still performs two memcpys (one occurs internally in BytesIO), but
# it's faster by avoiding a Python level method call

This only works because your non-c_int width attributes are followed by c_int width attributes (so they're padded out to four bytes anyway); if you had two c_ubyte/c_char/etc. types back to back, then you'd have problems (because one value of the array would initialize two fields in the struct, which does not appear to be what you want).
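
To make the padding argument concrete, here is a small Python 3 sketch. PaddedStruct and PackedPair are hypothetical structures, and it assumes a typical platform where c_uint is four bytes with four-byte alignment:

```python
import io
from array import array
from ctypes import Structure, c_char, c_int, c_uint, sizeof


class PaddedStruct(Structure):
    # "state" occupies 1 byte and is followed by 3 bytes of padding so
    # that "constant" is 4-byte aligned: one array item per field.
    _fields_ = [("init", c_uint),
                ("state", c_char),
                ("constant", c_int)]


class PackedPair(Structure):
    # Two 1-byte fields back to back share a single 4-byte slot, so one
    # array item would spill across both of them.
    _fields_ = [("a", c_char),
                ("b", c_char),
                ("n", c_uint)]


myarr = array('I', [1, ord('A'), 3])

padded = PaddedStruct()
copied = io.BytesIO(myarr).readinto(padded)  # number of bytes copied

print(sizeof(PaddedStruct), len(myarr) * myarr.itemsize, copied)
print(sizeof(PackedPair))  # smaller than 3 array items
print(padded.init, padded.state, padded.constant)
```

On a little-endian machine padded.state comes out as b'A', because the low byte of the second array item is the first byte of its four-byte slot.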

If you were using Python 3, you could benefit from array specific magic to avoid the cost of both unpacking and the two step memcpy of the BytesIO technique (from array -> bytes -> struct). It works in Py3 because Py3's array type supports the buffer protocol (it didn't in Py2), and because Py3's memoryview features a cast method that lets you change the format of the memoryview to make it directly compatible with array:

mystr = MyStruct()  # Default initialize

# Make a view on mystr's underlying memory that behaves like a C array of
# unsigned ints in native format (matching array's type code)
# then perform a "memcpy" like operation using empty slice assignment
# to avoid creating any Python level values.
memoryview(mystr).cast('B').cast('I')[:] = myarr

Like the BytesIO solution, this only works because your fields all happen to pad to four bytes in size.
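
A self-contained Python 3 sketch of the same slice assignment follows. PaddedStruct is a hypothetical cut-down structure, and the explicit size check is an addition for illustration, since the assignment raises if the two buffers differ in length:

```python
from array import array
from ctypes import Structure, c_char, c_int, c_uint, sizeof


class PaddedStruct(Structure):
    # Hypothetical structure where every field pads out to four bytes.
    _fields_ = [("init", c_uint),
                ("state", c_char),
                ("constant", c_int)]


myarr = array('I', [1, ord('A'), 3])
padded = PaddedStruct()

# The slice assignment requires matching sizes, so check up front.
if sizeof(padded) != len(myarr) * myarr.itemsize:
    raise ValueError("array and structure sizes differ")

# Cast the struct's memory to raw bytes first, then to unsigned ints, so
# the view's format matches array('I') and the copy is a plain memcpy-like
# operation with no intermediate Python ints.
view = memoryview(padded).cast('B').cast('I')
view[:] = myarr

print(padded.init, padded.constant)  # 1 3
```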

Performance

Performance-wise, star unpacking wins for small numbers of fields, but for large numbers of fields (your case has a couple dozen), direct memcpy based approaches win out; in tests for a 23 field class, the BytesIO solution won over star unpacking on my Python 2.7 install by a factor of 2.5x (star unpacking was 2.5 microseconds, BytesIO was 1 microsecond).

The memoryview solution scales similarly to the BytesIO solution, though as of 3.5, it's slightly slower than the BytesIO approach (likely a result of the need to construct several temporary memoryviews to perform the necessary casting operations and/or the memoryview slice assignment code being general purpose for many possible formats, so it's not simple memcpy in implementation). memoryview might scale better for much larger copies (if the losses are due to the fixed cast overhead), but it's rare that you'd have a struct large enough to matter; it would only be in more general purpose copying scenarios (to and from ctypes arrays or the like) that memoryview would potentially win.
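
If you want to reproduce this kind of comparison yourself, a rough timeit harness along the following lines should do (Python 3). UintStruct, its generated field names, and the repeat counts are assumptions for illustration and not the original test code; absolute numbers will differ by machine and Python version:

```python
import io
import timeit
from array import array
from ctypes import Structure, c_uint


class UintStruct(Structure):
    # Hypothetical 23-field structure made up entirely of c_uint fields.
    _fields_ = [("f%d" % i, c_uint) for i in range(23)]


myarr = array('I', range(23))


def star_unpack():
    return UintStruct(*myarr)


def bytesio_copy():
    s = UintStruct()
    io.BytesIO(myarr).readinto(s)
    return s


def memoryview_copy():
    s = UintStruct()
    memoryview(s).cast('B').cast('I')[:] = myarr
    return s


# Best of three runs of 2000 calls each, reported per call.
timings = {}
for func in (star_unpack, bytesio_copy, memoryview_copy):
    best = min(timeit.repeat(func, number=2000, repeat=3))
    timings[func.__name__] = best / 2000

for name, seconds in timings.items():
    print("%-16s %.0f ns per call" % (name, seconds * 1e9))
```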

Side-note: Normally, for file-like objects, I'd use them with a with statement; in this specific case of BytesIO I omitted it because no real file handles are involved (so even on non-CPython interpreters, it's not a problem if the memory is cleaned up lazily), and using the with statement involves significant overhead; on Py 3.5, changing io.BytesIO(myarr).readinto(mystr) to with io.BytesIO(myarr) as bio: bio.readinto(mystr) roughly doubles the time per initialization in my test case of a 23 field class (314 ns -> 541 ns; trivial cost for real I/O, but not for in-memory fake I/O).

YFP · Answer · 2016-08-26 08:45:02Z

Does this have to be an array? Could you use a list instead? You can unpack a list into a call using the * operator:

mystr = MyStruct(*arr)

or unpack a dict keyed by field name using the ** operator:

mystr = MyStruct(**arr)
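
For the dict form, the keys have to be the structure's field names. A minimal Python 3 sketch, where SmallStruct and the values dict are hypothetical:

```python
from ctypes import Structure, c_byte, c_uint


class SmallStruct(Structure):
    # Hypothetical cut-down structure with integer-only fields.
    _fields_ = [("init", c_uint),
                ("state", c_byte),
                ("size", c_uint)]


values = {"init": 1, "state": 65, "size": 16}

# Each key is matched to the field of the same name.
from_dict = SmallStruct(**values)

# Fields that are not mentioned keep their zero default.
partial = SmallStruct(init=7)

print(from_dict.init, from_dict.state, from_dict.size)  # 1 65 16
print(partial.init, partial.state, partial.size)        # 7 0 0
```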

6 Comments

Har Over a year ago

I could use array.tolist(); I'll try that. Nope, it doesn't work. I get: TypeError: one character string expected

YFP Over a year ago

Is there a reason to use array in your code? My proposed solution meant re-engineering the code to use a native data structure like a list or dictionary.

Har Over a year ago

Well... an array is the same as a list if I call .tolist(). What you are describing is unpacking a list or a dictionary into a generic function, which is not what I am trying to do. I need an array because I am representing binary data that I want to convert to a ctypes struct.

Har Over a year ago

I think the issue is that the data is coming from a file, and I am not sure how that relates to what you have described versus the readinto call.

ShadowRanger Over a year ago

@Har: That error you get doesn't make sense given the definitions from your example code. Put something closer to your real code in the question.
