I am working on a problem in which I need to compare two 3D NumPy arrays, return the unmatched values together with their index positions, and later recreate the same array from them. Currently, the only approach I can think of is to loop over the arrays, both when collecting the values during the comparison and when recreating the array. The problem is scale: there will be hundreds of arrays, and looping affects the latency of the overall application. I would be thankful if anyone could help me make better use of NumPy's comparison features with minimal or no loops. Dummy code is below:

import numpy as np

def compare_array(final_array_list):
    base_array = None
    diffs = []
    for i, array in enumerate(final_array_list):
        if i == 0:
            base_array = array
        else:
            index = np.where(base_array != array)

            # index looks like (array([0, 1]), array([1, 1]), array([2, 2]))
            # To access all unmatched values I need to loop. Need to avoid the loop here.
            diffs.append(index)

    # Desired result: [base_array, [unmatched values (8, 10) and their index
    # (array([0, 1]), array([1, 1]), array([2, 2]))], ...]
    # Placeholder: for now only the indices are collected.
    return [base_array, *diffs]

# similarly, recreate array_1 back
def recreate_array(array_list):
    # Need to avoid looping while recreating the arrays.
    # Desired result: a list of arrays, i.e. [base_array, array_1]
    ...

# creating dummy arrays
base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = b = np.array([[[1, 2, 3], [3, 4, 8]], [[5, 6, 7], [7, 8, 10]]])
final_array_list = [base_array, array_1]  # ... plus further arrays

# compare base_array with the other arrays and get the unmatched values (like 8, 10 in array_1) and their index
diff_array = compare_array(final_array_list)

# recreate array_1 from the base array after receiving the unmatched values and their index
recreate_array(diff_array)
  • Can you provide an explicit example? Mixing pseudocode and real code is bad practice, by the way. Commented Oct 8, 2020 at 16:31

2 Answers

I think this may be what you're looking for:

import numpy as np

base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = b = np.array([[[1, 2, 3], [3, 4, 8]], [[5, 6, 7], [7, 8, 10]]])

match_mask = (base_array == array_1)
idx_unmatched = np.argwhere(~match_mask)

# idx_unmatched: 
#  array([[0, 1, 2],
#         [1, 1, 2]])

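# values of base_array at the unmatched positions:
values_unmatched = base_array[tuple(idx_unmatched.T)]

# values_unmatched: 
#  array([5, 9])
# (indexing array_1 the same way, array_1[tuple(idx_unmatched.T)],
#  gives array([8, 10]), the values the question asks for)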
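
To cover the "recreate" half of the question as well, here is a minimal sketch of a full round trip for one pair of arrays. The helper names `diff_against_base` and `apply_diff` are made up for illustration, and the sketch assumes both arrays have the same shape and dtype. It uses `np.nonzero`, which returns the index as a tuple that can be passed straight back into indexing, so no Python loop is needed in either direction.

```python
import numpy as np

def diff_against_base(base, other):
    """Return (index, values): where `other` differs from `base`, and its values there."""
    index = np.nonzero(base != other)  # tuple of index arrays, one per axis
    values = other[index]              # fancy indexing pulls all values at once
    return index, values

def apply_diff(base, index, values):
    """Rebuild the other array from `base` plus the recorded differences."""
    rebuilt = base.copy()              # leave the base array untouched
    rebuilt[index] = values
    return rebuilt

base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = np.array([[[1, 2, 3], [3, 4, 8]], [[5, 6, 7], [7, 8, 10]]])

index, values = diff_against_base(base_array, array_1)
rebuilt = apply_diff(base_array, index, values)

print(index)   # (array([0, 1]), array([1, 1]), array([2, 2]))
print(values)  # [ 8 10]
print(np.array_equal(rebuilt, array_1))  # True
```

Copying the base before assigning keeps it reusable for every other array that is diffed against it.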

I'm not sure I understand what you mean by "recreate them" (completely recreate them? why not use the arrays themselves?).

I can help, though, by noting that there are plenty of NumPy functions that are vectorized, and as a general rule of thumb, do not use for loops unless G-d himself tells you to :)

For example:

  • If a and b are np.arrays of the same shape (whatever their number of dimensions), a == b returns a boolean NumPy array of that shape: True where the elements are equal at that position, and False otherwise.

  • np.where(c) treats c as a boolean np.array and returns the indices at which c is True, as a tuple with one index array per dimension.

To clarify: here I create two arrays, where b differs from a only in the positions set to -1. Note what a == b is at the end.

>>> a = np.random.randint(low=0, high=10, size=(4, 4))
>>> b = np.copy(a)
>>> b[2, 3] = -1
>>> b[0, 1] = -1
>>> b[1, 1] = -1
>>> a
array([[9, 9, 3, 4],
       [8, 4, 6, 7],
       [8, 4, 5, 5],
       [1, 7, 2, 5]])
>>> b
array([[ 9, -1,  3,  4],
       [ 8, -1,  6,  7],
       [ 8,  4,  5, -1],
       [ 1,  7,  2,  5]])
>>> a == b
array([[ True, False,  True,  True],
       [ True, False,  True,  True],
       [ True,  True,  True, False],
       [ True,  True,  True,  True]])

Now the function np.where, whose output is a bit tricky but easy to use. For a 2D array it returns two arrays of the same length: the first holds the row indices and the second the column indices of the places where the given array is True.

>>> np.where(a == b)
(array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3], dtype=int64), array([0, 2, 3, 0, 2, 3, 0, 1, 2, 0, 1, 2, 3], dtype=int64))

Now you can "fix" the b array to match a by replacing the values of b, at the indices where it differs from a, with a's values:

>>> b[np.where(a != b)]
array([-1, -1, -1])
>>> b[np.where(a != b)] = a[np.where(a != b)]
>>> np.all(a == b)
True
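
The same patching idea can be applied to a whole batch of arrays at once, which may help with the "hundreds of arrays" case. The following is only a sketch under some assumptions: every array has the same shape as the base, all of them fit in memory together, and the sample data in `others` is invented for illustration. Stacking the arrays lets the base broadcast against all of them in a single comparison.

```python
import numpy as np

base = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
others = [
    np.array([[[1, 2, 3], [3, 4, 8]], [[5, 6, 7], [7, 8, 10]]]),
    np.array([[[0, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]]),
]

stacked = np.stack(others)   # shape (n_arrays, 2, 2, 3)
mask = stacked != base       # base broadcasts across the first axis
index = np.nonzero(mask)     # index[0] tells which array each difference belongs to
values = stacked[mask]       # differing values, in the same order as index

# Rebuild every array from the base plus the recorded differences.
rebuilt = np.broadcast_to(base, stacked.shape).copy()
rebuilt[index] = values

print(index[0])                         # [0 0 1]
print(values)                           # [ 8 10  0]
print(np.array_equal(rebuilt, stacked)) # True
```

`np.broadcast_to` returns a read-only view, so the `.copy()` is needed before assigning into it.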

1 Comment

I have thousands of streaming arrays that are quite similar in nature. Transporting such data over the network stresses bandwidth, so I only send the difference values, and when another service receives them, it recreates the full array for further processing.
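
For the scenario described in this comment, one possible way to keep the transmitted diff small is to send a single flat index array instead of one index array per axis. This is a hedged sketch, not a tested protocol: `encode_diff` and `decode_diff` are hypothetical names, and it assumes the receiver already holds an identical copy of the base array with the same shape and dtype. How the two small arrays are serialized for the network is left out.

```python
import numpy as np

def encode_diff(base, arr):
    """Sender side: flat positions where `arr` differs from `base`, plus the new values."""
    flat_index = np.flatnonzero(base != arr)  # indices into the flattened (C-order) array
    values = arr.ravel()[flat_index]
    return flat_index, values

def decode_diff(base, flat_index, values):
    """Receiver side: rebuild the full array from the shared base and the diff."""
    out = base.copy()
    np.put(out, flat_index, values)           # np.put writes at flat positions in place
    return out

base = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
arr = np.array([[[1, 2, 3], [3, 4, 8]], [[5, 6, 7], [7, 8, 10]]])

flat_index, values = encode_diff(base, arr)
restored = decode_diff(base, flat_index, values)

print(flat_index)  # [ 5 11]
print(values)      # [ 8 10]
print(np.array_equal(restored, arr))  # True
```

If the per-axis positions are needed on the receiving side, `np.unravel_index(flat_index, base.shape)` converts the flat indices back.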
