I would like your help with the following problem. I want to build 3 numpy arrays according to this scheme:
+---+
| a |
arr1 : +---+ +---+
| b | <- | b |
+---+ | |
+---+ : arr
+---+ | |
| c | <- | c |
arr2 : +---+ +---+
| d |
+---+
So arr shares its values with the other arrays.
arr1 and arr2 are classic numpy arrays, defined for example by:
arr1 = np.array([1,2]); arr2 = np.array([3,4])
Then, by what means can I build arr so as to have the following behavior?
arr1[1] = 7 ### will give arr1 = [1,7], arr2 = [3,4], arr = [7,3]
arr[1] = 13 ### will give arr1 = [1,7], arr2 = [13,4], arr = [7,13]
My test
After several hours on the web, ctypes seems to be the way to mimic C-pointer behavior in Python. I have tried this:
import numpy as np
import ctypes
arr1 = np.array([1.0,0.2]).astype(np.float32)
arr2 = np.array([3.0,4.0]).astype(np.float32)
arr1_ptr = arr1.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
arr1bis = np.ctypeslib.as_array((ctypes.c_float * arr1.size).from_address(ctypes.addressof(arr1_ptr.contents)))
print(arr1bis) #### --> [ 1. 0.2]
arr1[0] = 7.0
print(arr1bis) #### --> [ 7. 0.2]
arr1bis[1] = 13.0
print(arr1) #### --> [ 7. 13.]
So here I manage to have 2 pointers to the same array. But building one array from several memory locations seems much more difficult. If someone has an idea...
Thanks for your help.
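For my simple example at least, the desired sharing can be obtained with plain numpy views instead of ctypes, assuming all three arrays can be laid out as overlapping slices of a single backing buffer (the names backing, arr1, arr2, arr below are my own):

```python
import numpy as np

# One backing buffer holding all the values in order.
backing = np.array([1, 2, 3, 4])

arr1 = backing[0:2]   # view on [1, 2]
arr2 = backing[2:4]   # view on [3, 4]
arr  = backing[1:3]   # overlapping view: [arr1[1], arr2[0]]

arr1[1] = 7           # writes through to backing, so arr sees it too
arr[1] = 13           # writes through to backing, so arr2 sees it too
# arr1 -> [1, 7], arr2 -> [13, 4], arr -> [7, 13]
```

This only works when the shared cells are adjacent in memory, which is exactly the limitation discussed below for inconsistent structures.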
EDIT : My test 2 - A start of solution
Thanks a lot for your answers. The idea of a big global array is useful because it is cheaper than the lazy solution of storing the data twice and updating the copy at each modification. Moreover, it is very simple to implement. The following performance test code
import numpy as np
from time import time

def benchmark(d_size, s_len):
    print("Interaction domain size: ", d_size, " state length: ", s_len)
    arr_combined = np.arange(s_len * d_size**2).reshape((d_size, d_size, s_len))
    list_of_state = [np.ndarray.view(arr_combined[i, j, :]) for i in range(d_size) for j in range(d_size)]
    interaction_between_state = np.ndarray.view(arr_combined[:, :, 0])
    t0 = time()
    for state in list_of_state: state[0] += 1
    interaction_between_state *= -1
    for state in list_of_state: state[0] += 1
    print("Exec time with np.ndarray.view: ", time() - t0)

    list_of_state2 = [np.arange(s_len) + s_len * (j + d_size * i) for i in range(d_size) for j in range(d_size)]
    interaction_between_state2 = np.array([state[0] for state in list_of_state2]).reshape((d_size, d_size))
    t0 = time()
    for state in list_of_state2: state[0] += 1
    ### Update interaction_between_state2
    for i in range(d_size):
        for j in range(d_size):
            interaction_between_state2[i, j] = list_of_state2[j + d_size * i][0]
    interaction_between_state2 *= -1
    ### Update list_of_state2
    for i in range(d_size):
        for j in range(d_size):
            list_of_state2[j + d_size * i][0] = interaction_between_state2[i, j]
    for state in list_of_state2: state[0] += 1
    print("Exec time with array update: ", time() - t0)

benchmark(100, 5)
benchmark(1000, 5)
benchmark(100, 50)
gives
Interaction domain size:  100  state length:  5
Exec time with np.ndarray.view:  0.00500103759765625
Exec time with array update:  0.011001110076904297
Interaction domain size:  1000  state length:  5
Exec time with np.ndarray.view:  0.5620560646057129
Exec time with array update:  1.1111111640930176
Interaction domain size:  100  state length:  50
Exec time with np.ndarray.view:  0.0060007572174072266
Exec time with array update:  0.01200103759765625
So, even though the views given by np.ndarray.view produce a non-memory-contiguous data representation, the performance is better than with the data-duplication solution.
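The non-contiguity and the absence of copying can both be checked directly, since numpy exposes this information through the flags attribute and np.shares_memory (a small check, using the same slicing as the benchmark above):

```python
import numpy as np

d_size, s_len = 3, 5
arr_combined = np.arange(s_len * d_size**2).reshape((d_size, d_size, s_len))
view = arr_combined[:, :, 0]                 # one value per state, strided

print(view.flags['C_CONTIGUOUS'])            # False: each step skips s_len - 1 values
print(np.shares_memory(view, arr_combined))  # True: no copy was made
```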
BUT
If the structures are not consistent (in terms of dimension), how can they be put correctly into a global array? For example, if I go back to the first simple example with arr, arr1 and arr2, and I replace arr2 by
arr2 = np.array([3,4,5])
To build the global array, I can think of 2 solutions:
- Write the global array in a single line:
global_arr = np.array([1,2,3,4,5])
This loses some structure if arr1 and arr2 are two- or three-dimensional, and it forces one to know in advance the index correspondence between the global array and the local views. Moreover, any subarray will be memory-contiguous, so some additional tests would be required to know the real performance gain in such a case.
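A short sketch of this single-line layout, showing that the local views still write through to the global array (names are mine):

```python
import numpy as np

global_arr = np.array([1, 2, 3, 4, 5])
arr1 = global_arr[:2]   # view on [1, 2]
arr2 = global_arr[2:]   # view on [3, 4, 5]

arr1[1] = 7             # writes through to global_arr
arr2[0] = 13
print(global_arr)       # [ 1  7 13  4  5]
```

Note that if arr1 were 2D, a reshaped slice such as global_arr[:4].reshape(2, 2) would still be a contiguous view, so some structure can be recovered when the index correspondence is known.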
- Keep the structure, and put np.nan in cells that are considered void:
global_arr = np.array( [ [1,2,np.nan],
                         [3,4,5    ] ] )
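One caveat worth noting for this second option: np.nan only exists for floating-point dtypes, so the NaN padding silently promotes the global array (and therefore all views of it) to float64 even if the original data was integer. A small sketch:

```python
import numpy as np

global_arr = np.array([[1, 2, np.nan],
                       [3, 4, 5]])
print(global_arr.dtype)     # float64, forced by the NaN padding

arr1 = global_arr[0, :2]    # view on the valid part of row 0
arr2 = global_arr[1, :]     # view on row 1 (no padding needed)

arr1[1] = 7                 # writes through to global_arr
print(global_arr[0])
```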
What are your opinions on these 2 solutions? Is there a cleverer way to build this global array?