5

I have n matrices of the same size and want to see how many cells are equal to each other across all matrices. Code:

import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[5,6,7], [4,2,6], [7, 8, 9]])
c = np.array([[2,3,4],[4,5,6],[1,2,5]])

#Intuition is below but is wrong
a == b == c

How do I get Python to return a value of 2 (cells 2,1 and 2,3 match in all 3 matrices) or an array of [[False, False, False], [True, False, True], [False, False, False]]?

3 Answers

4

You can do:

(a == b) & (b==c)

[[False False False]
 [ True False  True]
 [False False False]]

For n items in, say, a list like x=[a, b, c, a, b, c], one could do:

r = x[0] == x[1]
for temp in x[2:]:
    r &= x[0]==temp

The result is now in r.
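
As a minimal sketch, the loop above can be wrapped in a function that also produces the count asked for in the question. The name all_equal_mask is hypothetical, and the sketch assumes a list of at least one array, all of the same shape:

```python
import numpy as np

def all_equal_mask(arrays):
    """Hypothetical helper: True where every array matches the first one."""
    first = arrays[0]
    # Start from all-True and AND in each comparison against the first array.
    mask = np.ones(first.shape, dtype=bool)
    for other in arrays[1:]:
        mask &= (first == other)
    return mask

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

mask = all_equal_mask([a, b, c])
count = int(np.count_nonzero(mask))  # number of cells equal in all arrays
```

For the a, b, c from the question, count is 2 and mask is True only at the first and third cells of the middle row.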

If the structure is already in a 3D numpy array, one could also use:

np.amax(x,axis=2)==np.amin(x,axis=2)

The idea behind this line: it would be ideal to have an equality function with an axis argument, but there isn't one. Instead, the line relies on the fact that if amin == amax along an axis, then all elements along that axis are equal.
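
A small sketch of the min == max idea on the question's data, assuming the three arrays are first stacked along a third axis with np.dstack:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

# Stack into one array of shape (3, 3, 3); axis 2 indexes the input arrays.
x = np.dstack((a, b, c))

# A cell is equal across all inputs exactly when its min and max coincide.
mask = np.amax(x, axis=2) == np.amin(x, axis=2)
```

The resulting mask has the same shape as each input matrix.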


If the different arrays to be compared aren't already in a 3D numpy array (or won't be in the future), looping the list is a fast and easy approach. Although I generally agree with avoiding Python loops for Numpy arrays, this seems like a case where it's easier and faster (see below) to use a Python loop since the loop is only along a single axis and it's easy to accumulate the comparisons in place. Here's a timing test:

def f0(x):
    r = x[0] == x[1]
    for y in x[2:]:
        r &= x[0]==y

def f1(x):  # from @Divakar
    r = ~np.any(np.diff(np.dstack(x),axis=2),axis=2)

def f2(x):
    x = np.dstack(x)
    r = np.amax(x,axis=2)==np.amin(x,axis=2)

# speed test
from timeit import timeit

for n, size, reps in ((1000, 3, 1000), (10, 1000, 100)):
    x = [np.ones((size, size)) for i in range(n)]
    print("%d %d %d" % (n, size, reps))
    # time each approach on the same list of arrays
    for name in ("f0", "f1", "f2"):
        t = timeit("%s(x)" % name, "from __main__ import x, f0, f1, f2", number=reps)
        print("%s:  %s" % (name, t))
    print("")

1000 3 1000
f0:  1.14673900604  # loop
f1:  3.93413209915  # diff
f2:  3.93126702309  # min max

10 1000 100
f0:  2.42633581161  # loop
f1:  27.1066679955  # diff
f2:  25.9518558979  # min max

If the arrays are already in a single 3D numpy array (e.g., from using x = np.dstack(x) in the above), then adapting the function definitions above accordingly gives:

def g0(x):
    r = x[:,:,0] == x[:,:,1]
    # compare slice 0 against each remaining slice (indices 2 .. n-1)
    for iy in range(2, x.shape[2]):
        r &= x[:,:,0]==x[:,:,iy]

def g1(x):   # from @Divakar
    r = ~np.any(np.diff(x,axis=2),axis=2)

def g2(x):
    r = np.amax(x,axis=2)==np.amin(x,axis=2)

which yields:

1000 3 1000
g0:  3.9761030674      # loop
g1:  0.0599548816681   # diff
g2:  0.0313589572906   # min max

10 1000 100
g0:  10.7617051601     # loop
g1:  10.881870985      # diff
g2:  9.66712999344     # min max

Note also that for a list of large arrays f0 takes about 2.4 s, while for a pre-built 3D array g0, g1, and g2 all take about 10 s. So if the input arrays are large, then the fastest approach, by about 4x, is to store them separately in a list. I find this a bit surprising and guess that it might be due to cache effects (or bad code?), but I'm not sure anyone really cares, so I'll stop here.

2 Comments

Is there a generalized answer? If I have more than 3 matrices, the combinations and steps necessary to arrive at an answer will explode.
@user1842834: I should also note that the "combinations" don't "explode". There are no combinations and explosions and this is just O(n). Think of it as a single element array, say, the number 5. Then you have a list of numbers and you go along the list asking whether each of the elements is equal to 5. It's just that but for a matrix. There's no need to compare every element to every other.
4

Concatenate along the third axis with np.dstack and take differences along that axis with np.diff, so that identical values show up as zeros. Then check for cells where all the differences are zero with ~np.any. Thus, you would have a one-liner solution like so -

~np.any(np.diff(np.dstack((a,b,c)),axis=2),axis=2)

Sample run -

In [39]: a
Out[39]: 
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [40]: b
Out[40]: 
array([[5, 6, 7],
       [4, 2, 6],
       [7, 8, 9]])

In [41]: c
Out[41]: 
array([[2, 3, 4],
       [4, 5, 6],
       [1, 2, 5]])

In [42]: ~np.any(np.diff(np.dstack((a,b,c)),axis=2),axis=2)
Out[42]: 
array([[False, False, False],
       [ True, False,  True],
       [False, False, False]], dtype=bool)
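
For readers who want to see the intermediate shapes, the one-liner can be unpacked roughly as follows. The variable names stacked, diffs, and any_change are illustrative only:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

stacked = np.dstack((a, b, c))        # shape (3, 3, 3): one layer per input array
diffs = np.diff(stacked, axis=2)      # shape (3, 3, 2): b - a and c - b per cell
any_change = np.any(diffs, axis=2)    # True where some neighbouring pair differs
mask = ~any_change                    # True where a == b == c for that cell
```

With n input arrays, diffs has n - 1 layers, so a cell is flagged as equal only if every consecutive pair agrees.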

1

Try this:

z1 = a == b
z2 = a == c
z  = np.logical_and(z1,z2)
print("count: %d" % np.sum(z))

You can do this in a single statement:

count = np.sum( np.logical_and(a == b, a == c) )
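
One possible way to extend this to any number of arrays is np.logical_and.reduce, which ANDs a sequence of boolean arrays together. This is a swapped-in variant rather than the code above, and the list name arrays is an assumption for illustration:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

arrays = [a, b, c]  # assumed: a list of at least two same-shaped arrays

# Compare every array to the first one, then AND all comparisons together.
mask = np.logical_and.reduce([arrays[0] == y for y in arrays[1:]])
count = int(np.sum(mask))
```

Here count is 2 for the arrays from the question.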
