First, is your program using too much memory? If the answer is "no" or "I'm not sure", then ignore this and carry on until you know you really do have a problem.
Using the same buffer for different arrays
You can do all of what you want using "views" that are available within numpy. Views are just different ways of looking at the same data. For instance,
import numpy as np
ints32 = np.array([0, 0, 0, 0], dtype="<i4") # dtype string means little endian 4 byte ints
assert len(ints32) == 4
ints16 = ints32.view(dtype="<i2")
assert len(ints16) == 8 # a 16-bit int needs half the space of a 32-bit int, so twice as many fit in the same buffer
ints32[0] = 0x11223344
assert ints16[0] == 0x3344
print(ints16) # prints [13124 4386 0 0 0 0 0 0]
# This shows that ints16 is backed by the same memory as ints32
You can also use an external buffer if you wish:
buffer = bytearray(8)
floats32 = np.frombuffer(buffer, dtype="<f4")
floats32[0] = 1
print(buffer) # shows buffer has been modified
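Note that frombuffer inherits the mutability of the underlying buffer. As a small sketch, an immutable bytes object yields a read-only array:

```python
import numpy as np

ro = np.frombuffer(b"\x00" * 8, dtype="<f4")  # immutable bytes buffer
assert not ro.flags.writeable  # the resulting view is read-only

try:
    ro[0] = 1.0
except ValueError:
    pass  # NumPy refuses to write through an immutable buffer
```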
You need to be careful as you may end up with alignment errors:
buf = np.zeros(3, dtype=np.int8) # 3 byte buffer
arr = buf.view(dtype=np.int16) # Error! Needs a buffer with multiples of 2 bytes
two_byte_slice = buf[:2]
arr = two_byte_slice.view(dtype=np.int16) # Succeeds
arr[0] = 1
assert buf[0] == 1 # two_byte_slice and arr share buf's memory; no copies were made
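If the buffer's length isn't under your control, one option (a sketch, not the only approach) is to trim it to the largest multiple of the target itemsize before taking the view:

```python
import numpy as np

buf = np.zeros(7, dtype=np.int8)  # odd-sized buffer
itemsize = np.dtype(np.int16).itemsize  # 2 bytes
usable = (len(buf) // itemsize) * itemsize  # 6 of the 7 bytes are usable
arr = buf[:usable].view(np.int16)  # never raises, whatever len(buf) is
assert len(arr) == 3
```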
Sharing the same buffer with different processes, or C libraries
Sharing buffers with C libraries or other processes carries certain risks. These risks are usually mitigated by copying the buffer immediately and only ever using the copy. However, managed carefully, direct sharing can still be safe.
For sharing a buffer with a C library, you must make sure:
- That the C library doesn't hold on to a pointer to the input buffer after the buffer has been released by Python. This is satisfied if the library does not keep a reference to the buffer after the function returns, or if you keep a global reference to the owning object so it is never released.
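As a minimal sketch of the C-library case, libc's memset stands in here for "a C function that writes into your buffer" (ctypes.CDLL(None) loads the current process's symbols on POSIX systems); the pointer comes from arr.ctypes, and arr must stay referenced for as long as the C side may use the pointer:

```python
import ctypes
import numpy as np

libc = ctypes.CDLL(None)  # POSIX: handle to the current process's C symbols
libc.memset.restype = ctypes.c_void_p
libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]

arr = np.zeros(8, dtype=np.uint8)
ptr = arr.ctypes.data_as(ctypes.c_void_p)  # raw pointer into arr's buffer

libc.memset(ptr, 0xFF, arr.nbytes)  # C writes into the same memory NumPy sees

assert (arr == 0xFF).all()  # the NumPy array was mutated in place
# Keep `arr` referenced for as long as the C side may use `ptr`.
```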
Sharing the data with another process is more complicated, but it too can be made safe. You must ensure that:
- Any spawned process that intends to outlive its parent copies the data out of the buffer rather than using the buffer directly.
- If two or more processes share a buffer and run concurrently, they are well behaved: a lock is assigned to guard access to the buffer, and every process acquires it before reading or writing.
See the following example, which shares a buffer with a child process and uses a lock to synchronise access (strictly speaking the lock isn't necessary here, as the parent waits for the child to complete before continuing).
import numpy as np
import ctypes
from multiprocessing import Array, Process

def main():
    buf = Array(ctypes.c_int8, 10)  # 10 byte buffer, with an associated lock
    with buf:  # acquire the lock
        ctypes_arr = buf.get_obj()
        arr = np.frombuffer(ctypes_arr, dtype=np.int16)  # int16 array, with size 5
        total = arr.sum()
        del arr, ctypes_arr  # before releasing the lock, drop local references to the buffer
    print("total before:", total)  # 0

    p = Process(target=subprocess_target, args=(buf,))
    p.start()
    p.join()

    with buf:
        # interpret the first 8 bytes as two 4 byte ints
        view = memoryview(buf.get_obj())[:8]
        arr = np.frombuffer(view, dtype=np.int32)
        total = arr.sum()
        del arr, view
    print("total after:", total)  # 262146

    raw_bytes = list(buf.get_obj())
    assert raw_bytes == [0, 0, 1, 0, 2, 0, 3, 0, 4, 0]

def subprocess_target(buf):
    """Sets the elements of buf to [0, 1, ..., n-2, n-1] as int16s"""
    with buf:
        arr = np.frombuffer(buf.get_obj(), dtype=np.int16)
        arr[:] = range(len(arr))
        del arr

if __name__ == "__main__":
    main()