I would like your help with the following problem. I want to build 3 numpy arrays according to this scheme:
+---+
| a |
arr1 : +---+ +---+
| b | <- | b |
+---+ | |
+---+ : arr
+---+ | |
| c | <- | c |
arr2 : +---+ +---+
| d |
+---+
So arr shares its values with the other arrays.
arr1 and arr2 are classic numpy arrays, defined for example by:
arr1 = np.array([1,2]); arr2 = np.array([3,4])
Then, by what means can I build arr so as to have the following behavior?
arr1[1] = 7 ### will give arr1 = [1,7], arr2 = [3,4], arr = [7,3]
arr[1] = 13 ### will give arr1 = [1,7], arr2 = [13,4], arr = [7,13]
My test
After several hours on the web, ctypes seems to be the way to mimic C-pointer behavior in Python. I have tried this:
import numpy as np
import ctypes
arr1 = np.array([1.0,0.2]).astype(np.float32)
arr2 = np.array([3.0,4.0]).astype(np.float32)
arr1_ptr = arr1.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
arr1bis = np.ctypeslib.as_array((ctypes.c_float * arr1.size).from_address(ctypes.addressof(arr1_ptr.contents)))
print(arr1bis) #### --> [ 1. 0.2]
arr1[0] = 7.0
print(arr1bis) #### --> [ 7. 0.2]
arr1bis[1] = 13.0
print(arr1) #### --> [ 7. 13.]
So here I manage to have 2 pointers to the same array. But building one array from several memory locations seems much more difficult. If someone has an idea...
Thanks for your help.
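For my simple example at least, the desired sharing can be obtained with plain numpy views instead of ctypes, assuming all three arrays can be laid out as overlapping slices of a single backing buffer (the names backing, arr1, arr2, arr below are my own):

```python
import numpy as np

# One backing buffer holding all the values in order.
backing = np.array([1, 2, 3, 4])

arr1 = backing[0:2]   # view on [1, 2]
arr2 = backing[2:4]   # view on [3, 4]
arr  = backing[1:3]   # overlapping view: [arr1[1], arr2[0]]

arr1[1] = 7           # writes through to backing, so arr sees it too
arr[1] = 13           # writes through to backing, so arr2 sees it too
# arr1 -> [1, 7], arr2 -> [13, 4], arr -> [7, 13]
```

This only works when the shared cells are adjacent in memory, which is exactly the limitation discussed below for inconsistent structures.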
EDIT : My test 2 - A start of solution
Thanks a lot for your answers. The idea of a big global array is useful because it is cheaper than the lazy solution of storing the data twice and updating the copy at each modification. Moreover, it is very simple to implement. The following performance test code
import numpy as np
from time import time

def benchmark(d_size, s_len):
    print("Interaction domain size: ", d_size, " state length: ", s_len)
    arr_combined = np.arange(s_len * d_size**2).reshape((d_size, d_size, s_len))
    list_of_state = [np.ndarray.view(arr_combined[i, j, :]) for i in range(d_size) for j in range(d_size)]
    interaction_between_state = np.ndarray.view(arr_combined[:, :, 0])
    t0 = time()
    for state in list_of_state: state[0] += 1
    interaction_between_state *= -1
    for state in list_of_state: state[0] += 1
    print("Exec time with np.ndarray.view: ", time() - t0)

    list_of_state2 = [np.arange(s_len) + s_len * (j + d_size * i) for i in range(d_size) for j in range(d_size)]
    interaction_between_state2 = np.array([state[0] for state in list_of_state2]).reshape((d_size, d_size))
    t0 = time()
    for state in list_of_state2: state[0] += 1
    ### Update interaction_between_state2
    for i in range(d_size):
        for j in range(d_size):
            interaction_between_state2[i, j] = list_of_state2[j + d_size * i][0]
    interaction_between_state2 *= -1
    ### Update list_of_state2
    for i in range(d_size):
        for j in range(d_size):
            list_of_state2[j + d_size * i][0] = interaction_between_state2[i, j]
    for state in list_of_state2: state[0] += 1
    print("Exec time with array update: ", time() - t0)

benchmark(100, 5)
benchmark(1000, 5)
benchmark(100, 50)
gives
Interaction domain size:  100  state length:  5
Exec time with np.ndarray.view:  0.00500103759765625
Exec time with array update:  0.011001110076904297
Interaction domain size:  1000  state length:  5
Exec time with np.ndarray.view:  0.5620560646057129
Exec time with array update:  1.1111111640930176
Interaction domain size:  100  state length:  50
Exec time with np.ndarray.view:  0.0060007572174072266
Exec time with array update:  0.01200103759765625
So, even though the views given by np.ndarray.view produce a non-memory-contiguous data representation, the performance is better than with the data-duplication solution.
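The non-contiguity and the absence of copying can both be checked directly, since numpy exposes this information through the flags attribute and np.shares_memory (a small check, using the same slicing as the benchmark above):

```python
import numpy as np

d_size, s_len = 3, 5
arr_combined = np.arange(s_len * d_size**2).reshape((d_size, d_size, s_len))
view = arr_combined[:, :, 0]                 # one value per state, strided

print(view.flags['C_CONTIGUOUS'])            # False: each step skips s_len - 1 values
print(np.shares_memory(view, arr_combined))  # True: no copy was made
```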
BUT
If the structures are not consistent (in terms of dimension), how can they be put correctly into a global array? For example, if I go back to the first simple example with arr, arr1 and arr2, and I replace arr2 by
arr2 = np.array([3,4,5])
To build the global array, I can think of 2 solutions:
- Write the global array in a single line:
global_arr = np.array([1,2,3,4,5])
This loses some structure if arr1 and arr2 are two- or three-dimensional, and it forces one to know in advance the index correspondence between the global array and the local views. Moreover, any subarray will be memory-contiguous, so some additional tests would be required to know the real performance gain in such a case.
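A short sketch of this single-line layout, showing that the local views still write through to the global array (names are mine):

```python
import numpy as np

global_arr = np.array([1, 2, 3, 4, 5])
arr1 = global_arr[:2]   # view on [1, 2]
arr2 = global_arr[2:]   # view on [3, 4, 5]

arr1[1] = 7             # writes through to global_arr
arr2[0] = 13
print(global_arr)       # [ 1  7 13  4  5]
```

Note that if arr1 were 2D, a reshaped slice such as global_arr[:4].reshape(2, 2) would still be a contiguous view, so some structure can be recovered when the index correspondence is known.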
- Keep the structure, and put np.nan in cells that are considered void:
global_arr = np.array( [ [1,2,np.nan],
                         [3,4,5    ] ] )
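One caveat worth noting for this second option: np.nan only exists for floating-point dtypes, so the NaN padding silently promotes the global array (and therefore all views of it) to float64 even if the original data was integer. A small sketch:

```python
import numpy as np

global_arr = np.array([[1, 2, np.nan],
                       [3, 4, 5]])
print(global_arr.dtype)     # float64, forced by the NaN padding

arr1 = global_arr[0, :2]    # view on the valid part of row 0
arr2 = global_arr[1, :]     # view on row 1 (no padding needed)

arr1[1] = 7                 # writes through to global_arr
print(global_arr[0])
```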
What are your opinions on these 2 solutions? Is there a cleverer way to build this global array?