Compare two 3D NumPy arrays, return unmatched values with their indices, and later recreate them without loops

Question

I am working on a problem where one requirement is to compare two 3D NumPy arrays, return the unmatched values together with their index positions, and later recreate the same array from them. Currently, the only approach I can think of is to loop over the arrays, both to collect the values during the comparison and to recreate the array later. The problem is scale: there will be hundreds of arrays, and looping affects the latency of the overall application. I would be thankful if anyone could help me make better use of NumPy's comparison features with minimal or no loops. Dummy code is below:

import numpy as np

def compare_array(final_array_list):
    # The first array in the list is the base that all others are compared to
    base_array = final_array_list[0]
    diffs = []
    for array in final_array_list[1:]:
        index = np.where(base_array != array)
        # index looks like (array([0, 1]), array([1, 1]), array([2, 2]))
        # To access all unmatched values I currently need to loop over index.
        # Need to avoid that loop here.
        diffs.append(index)
    # Desired return value:
    # [base_array, [unmatched values (8, 10) and their index (array([0, 1]), array([1, 1]), array([2, 2]))], ...]
    return [base_array, diffs]

# Similarly, recreate array_1 back
def recreate_array(array_list):
    # Need to avoid looping while recreating the arrays.
    # Should return a list of arrays, i.e. [base_array, array_1]
    pass  # not implemented yet

# Creating dummy arrays
base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = b = np.array([[[1, 2, 3], [3, 4, 8]], [[5, 6, 7], [7, 8, 10]]])
final_array_list = [base_array, array_1]  # ...... followed by more arrays

# Compare base_array with the other arrays and get the unmatched values (like 8, 10 in array_1) and their index
difff_array = compare_array(final_array_list)

# Recreate array_1 from the base array after receiving the unmatched values and their index
recreate_array(difff_array)

Can you provide an explicit example? Mixing pseudocode and real code is bad practice, by the way.
– anon01, Commented Oct 8, 2020 at 16:31

anon01 · Accepted Answer · 2020-10-08 20:31:38Z

I think this may be what you're looking for:

import numpy as np

base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = np.array([[[1, 2, 3], [3, 4, 8]], [[5, 6, 7], [7, 8, 10]]])

match_mask = (base_array == array_1)
idx_unmatched = np.argwhere(~match_mask)

# idx_unmatched:
#  array([[0, 1, 2],
#         [1, 1, 2]])

# values associated with idx_unmatched (taken from base_array):
values_unmatched = base_array[tuple(idx_unmatched.T)]

# values_unmatched:
#  array([5, 9])
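
The snippet above reads the mismatched values from base_array (5 and 9). For the round trip described in the question, it is probably the values from array_1 (8 and 10) that need to travel with the indices. Below is a minimal sketch of that idea, assuming both arrays share the same shape and dtype; the helper names `diff_against_base` and `apply_diff` are made up for illustration and are not part of NumPy.

```python
import numpy as np

def diff_against_base(base, other):
    """Return (indices, values) for the positions where `other` differs from `base`."""
    idx = np.nonzero(base != other)  # tuple of index arrays, one per axis
    return idx, other[idx]           # values are read from `other`, not from `base`

def apply_diff(base, idx, values):
    """Rebuild the other array from `base` plus the recorded differences."""
    rebuilt = base.copy()            # leave the base array untouched
    rebuilt[idx] = values            # fancy-index assignment, no Python loop
    return rebuilt

base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = np.array([[[1, 2, 3], [3, 4, 8]], [[5, 6, 7], [7, 8, 10]]])

idx, values = diff_against_base(base_array, array_1)
restored = apply_diff(base_array, idx, values)
```

Here `idx` has the same form as the `np.where` output in the question, so it can be used directly for both reading and writing.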

edited Oct 8, 2020 at 20:31

answered Oct 8, 2020 at 16:43

anon01


yonatansc97 · Accepted Answer · 2020-10-08 16:43:39Z

I'm not sure I understand what you mean by "recreate them" (completely recreate them? why not use the arrays themselves?).

I can help, though, by noting that there are plenty of NumPy functions that are vectorized, and as a general rule of thumb, do not use for loops unless G-d himself tells you to :)

For example:

If a and b are NumPy arrays of the same shape (with any number of dimensions), a == b returns a boolean array of that shape: True where the elements are equal at that coordinate, and False otherwise.
The function np.where(c), called with a single argument, treats c as a boolean array and returns the indices at which c is True.

To clarify: here I create two arrays, where b differs from a only at the positions set to -1. Note what a == b evaluates to at the end.

>>> a = np.random.randint(low=0, high=10, size=(4, 4))
>>> b = np.copy(a)
>>> b[2, 3] = -1
>>> b[0, 1] = -1
>>> b[1, 1] = -1
>>> a
array([[9, 9, 3, 4],
       [8, 4, 6, 7],
       [8, 4, 5, 5],
       [1, 7, 2, 5]])
>>> b
array([[ 9, -1,  3,  4],
       [ 8, -1,  6,  7],
       [ 8,  4,  5, -1],
       [ 1,  7,  2,  5]])
>>> a == b
array([[ True, False,  True,  True],
       [ True, False,  True,  True],
       [ True,  True,  True, False],
       [ True,  True,  True,  True]])

Now for np.where, whose output is a bit tricky to read but easy to use. For a 2D input it returns two arrays of the same length: the first holds the row indices and the second the column indices of the positions where the given array is True.

>>> np.where(a == b)
(array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3], dtype=int64), array([0, 2, 3, 0, 2, 3, 0, 1, 2, 0, 1, 2, 3], dtype=int64))
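
If the per-axis index arrays are hard to read, they can be turned into one coordinate row per position. The following is a small sketch using the same a as above, looking at the mismatches (a != b) instead of the matches; `np.column_stack` and `np.argwhere` are standard NumPy functions, while the variable names are only illustrative.

```python
import numpy as np

a = np.array([[9, 9, 3, 4],
              [8, 4, 6, 7],
              [8, 4, 5, 5],
              [1, 7, 2, 5]])
b = a.copy()
b[[0, 1, 2], [1, 1, 3]] = -1   # same three changes as in the session above

rows, cols = np.where(a != b)              # one index array per axis
coords = np.column_stack((rows, cols))     # one (row, col) pair per mismatch
same_coords = np.argwhere(a != b)          # builds the same pairs in a single call
```

The tuple form returned by np.where is the one to keep for indexing (b[rows, cols]); the row-per-position form is mostly convenient for printing or serializing.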

Now you can "fix" the b array to match a by replacing the values of b, at the indices where it differs from a, with a's values:

>>> b[np.where(a != b)]
array([-1, -1, -1])
>>> b[np.where(a != b)] = a[np.where(a != b)]
>>> np.all(a == b)
True
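
The session above evaluates a != b three times. A possible variation, sketched here with a small deterministic array instead of random data, is to compute the boolean mask once and index with it directly; boolean-mask indexing selects the same elements as np.where would.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)
b = a.copy()
b[2, 3] = -1
b[0, 1] = -1

mask = a != b                 # computed once, reused for reading and writing
n_diff = int(mask.sum())      # number of mismatched positions
changed = b[mask]             # boolean indexing returns a copy of the old values
b[mask] = a[mask]             # overwrite only the mismatched positions
```

After the assignment b equals a again, and `changed` still holds the values that were replaced.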

I have thousands of streaming arrays that are quite similar in nature, and transporting such data over the network stresses bandwidth. So I am only sending the difference values, and when another service receives them it recreates the full array for further processing.
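
For that use case, one way to avoid a Python loop over the arrays themselves is to stack them and compare the whole stack against the base through broadcasting. This is only a sketch under some assumptions: all arrays have the same shape and dtype as the base, and a whole batch fits in memory at once. The names `encode_batch`, `decode_batch`, and `array_2` are hypothetical and chosen for illustration.

```python
import numpy as np

def encode_batch(base, arrays):
    """Return (idx, values) describing where each array differs from base."""
    stacked = np.stack(arrays)       # shape: (n_arrays, *base.shape)
    mask = stacked != base           # base broadcasts across the first axis
    idx = np.nonzero(mask)           # idx[0] is the array number, the rest are positions
    return idx, stacked[idx]

def decode_batch(base, n_arrays, idx, values):
    """Rebuild all arrays from base plus the recorded differences."""
    out = np.repeat(base[np.newaxis, ...], n_arrays, axis=0)  # n writable copies of base
    out[idx] = values
    return out

base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = np.array([[[1, 2, 3], [3, 4, 8]], [[5, 6, 7], [7, 8, 10]]])
array_2 = base_array.copy()
array_2[0, 0, 0] = 42                # a second, made-up array for the demo

idx, values = encode_batch(base_array, [array_1, array_2])
rebuilt = decode_batch(base_array, 2, idx, values)
```

Only `idx` and `values` (plus the base array, sent once) would need to cross the network; `rebuilt[0]` and `rebuilt[1]` then correspond to array_1 and array_2.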
