First, is your program using too much memory? If the answer is "no" or "I'm not sure", then ignore this and carry on until you know you really do have a problem.
Using the same buffer for different arrays
You can do all of what you want using "views" that are available within numpy. Views are just different ways of looking at the same data. For instance,
import numpy as np
ints32 = np.array([0, 0, 0, 0], dtype="<i4") # dtype string means little endian 4 byte ints
assert len(ints32) == 4
ints16 = ints32.view(dtype="<i2")
assert len(ints16) == 8 # a 16-bit int needs half the space of a 32-bit int, so twice as many fit in the same buffer
ints32[0] = 0x11223344
assert ints16[0] == 0x3344
print(ints16) # prints [13124 4386 0 0 0 0 0 0]
# This shows that ints16 is backed by the same memory as ints32
You can also use an external buffer if you wish:
buffer = bytearray(8)
floats32 = np.frombuffer(buffer, dtype="<f4")
floats32[0] = 1
print(buffer) # shows buffer has been modified
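Note that frombuffer inherits the mutability of the underlying buffer. As a small sketch, an immutable bytes object yields a read-only array:

```python
import numpy as np

ro = np.frombuffer(b"\x00" * 8, dtype="<f4")  # immutable bytes buffer
assert not ro.flags.writeable  # the resulting view is read-only

try:
    ro[0] = 1.0
except ValueError:
    pass  # NumPy refuses to write through an immutable buffer
```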
You need to be careful as you may end up with alignment errors:
buf = np.zeros(3, dtype=np.int8) # 3 byte buffer
arr = buf.view(dtype=np.int16) # Error! Needs a buffer with multiples of 2 bytes
two_byte_slice = buf[:2]
arr = two_byte_slice.view(dtype=np.int16) # Succeeds
arr[0] = 1
assert buf[0] == 1 # two_byte_slice and arr share buf's memory; no copies were made
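If the buffer's length isn't under your control, one option (a sketch, not the only approach) is to trim it to the largest multiple of the target itemsize before taking the view:

```python
import numpy as np

buf = np.zeros(7, dtype=np.int8)  # odd-sized buffer
itemsize = np.dtype(np.int16).itemsize  # 2 bytes
usable = (len(buf) // itemsize) * itemsize  # 6 of the 7 bytes are usable
arr = buf[:usable].view(np.int16)  # never raises, whatever len(buf) is
assert len(arr) == 3
```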
Sharing the same buffer with different processes, or C libraries
Sharing buffers with C libraries or other processes carries certain risks. These risks are usually mitigated by copying the buffer immediately and only ever using the copy. However, managed carefully, direct sharing can still be safe.
For sharing a buffer with a C library, you must make sure:
- That the C library doesn't hold on to a pointer to the input buffer after the buffer has been released by Python. This is satisfied if the library does not keep a reference to the buffer after the function returns, or if you keep a global reference to the owning object so it is never released.
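As a minimal sketch of the C-library case, libc's memset stands in here for "a C function that writes into your buffer" (ctypes.CDLL(None) loads the current process's symbols on POSIX systems); the pointer comes from arr.ctypes, and arr must stay referenced for as long as the C side may use the pointer:

```python
import ctypes
import numpy as np

libc = ctypes.CDLL(None)  # POSIX: handle to the current process's C symbols
libc.memset.restype = ctypes.c_void_p
libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]

arr = np.zeros(8, dtype=np.uint8)
ptr = arr.ctypes.data_as(ctypes.c_void_p)  # raw pointer into arr's buffer

libc.memset(ptr, 0xFF, arr.nbytes)  # C writes into the same memory NumPy sees

assert (arr == 0xFF).all()  # the NumPy array was mutated in place
# Keep `arr` referenced for as long as the C side may use `ptr`.
```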
Sharing the data with another process is more complicated, but it too can be made safe. You must ensure that:
- Any spawned process that intends to outlive its parent copies the data out of the buffer rather than using the buffer directly.
- If two or more processes share a buffer and run concurrently, they are well behaved: a lock is assigned to guard access to the buffer, and every process acquires it before reading or writing.
See the following example, which shares a buffer with a child process and uses a lock to synchronise access (strictly speaking the lock isn't necessary here, as the parent waits for the child to complete before continuing).
import numpy as np
import ctypes
from multiprocessing import Array, Process

def main():
    buf = Array(ctypes.c_int8, 10)  # 10 byte buffer, with an associated lock
    with buf:  # acquire the lock
        ctypes_arr = buf.get_obj()
        arr = np.frombuffer(ctypes_arr, dtype=np.int16)  # int16 array, with size 5
        total = arr.sum()
        del arr, ctypes_arr  # before releasing the lock, drop local references to the buffer
    print("total before:", total)  # 0

    p = Process(target=subprocess_target, args=(buf,))
    p.start()
    p.join()

    with buf:
        # interpret the first 8 bytes as two 4 byte ints
        view = memoryview(buf.get_obj())[:8]
        arr = np.frombuffer(view, dtype=np.int32)
        total = arr.sum()
        del arr, view
    print("total after:", total)  # 262146

    raw_bytes = list(buf.get_obj())
    assert raw_bytes == [0, 0, 1, 0, 2, 0, 3, 0, 4, 0]

def subprocess_target(buf):
    """Sets the elements of buf to [0, 1, ..., n-2, n-1] as int16s"""
    with buf:
        arr = np.frombuffer(buf.get_obj(), dtype=np.int16)
        arr[:] = range(len(arr))
        del arr

if __name__ == "__main__":
    main()