5

All Numpy-experts, this is probably pretty straight forward for you guys. This question should exists, but I did not found something exact solving it. Something similar was Comparing two matrices row-wise by occurrence in NumPy and Numpy compare array to multiple scalars at once but not exactly there.

I need to compute numpy.array_equal for a multidimensional array but I'm pretty sure I don't need to use double for-loops. However, if I would compute using double for-loops, it would look as following:

M = numpy.array(
     [
         [
            [1,2,3],
            [1,3,4]
         ],
         [
            [3,4,5],
            [1,2,3]
         ],
         [
            [1,2,3],
            [1,3,4]
         ]
     ])

result = np.zeros((M.shape[0], M.shape[0]))
for i in range(M.shape[0]):
    for j in range(M.shape[0]):
        result[i,j] = numpy.array_equal(M[i], M[j])

I should end up with a M.shape[0]^2 large truth table, where at least the diagonal is true.

1 Answer 1

2

Leverage broadcasting after extending - the input to two 4D versions such that we can compare the pairiwise-array-blocks against each other along the first axis while keeping the last two axes aligned -

result = (M[:,None] == M).all((2,3))

We can extend this to generic n-dim array case using last two axes as input to .all() -

(M[:,None] == M).all((-2,-1))

Leverage views to have a more memory efficient and hence performant one -

# https://stackoverflow.com/a/44999009/ @Divakar
def view1D(a): # a is array
    a = np.ascontiguousarray(a)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel()

M1D = view1D(M.reshape(M.shape[0],-1))
result = M1D[:,None] == M1D

Timings on large array -

In [48]: np.random.seed(0)
    ...: M = np.random.randint(0,10,(100,100,100))

In [49]: %timeit (M[:,None] == M).all((-2,-1))
10 loops, best of 3: 92.2 ms per loop

In [50]: %%timeit 
    ...: M1D = view1D(M.reshape(M.shape[0],-1))
    ...: M1D[:,None] == M1D
1000 loops, best of 3: 627 µs per loop

Original one -

In [54]: %%timeit
    ...: result = np.zeros((M.shape[0], M.shape[0]))
    ...: for i in range(M.shape[0]):
    ...:     for j in range(M.shape[0]):
    ...:         result[i,j] = numpy.array_equal(M[i], M[j])
10 loops, best of 3: 125 ms per loop

The conclusion would be - Remove loops, but watch out for memory usage. If possible, find other ways that keeps it vectorized and memory efficient.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.