0

I have a bottleneck in my code which I am struggling with.

Take an array A, of size (N x M), only containing 1s and 0s. I need an algorithm which takes all combinations of two rows of A and counts the overlaps between them.

More specifically, I need a faster alternative to the following algorithm:

for i in range(A.shape[0]):
  for j in range(A.shape[0]):
    a=b=c=d=0
    for k in range(A.shape[1]):
     if A[i][k]==1 and A[j][k]==1:
       a+=1
     if A[i][k]==0 and A[j][k]==0:
       b+=1
     if A[i][k]==1 and A[j][k]==0:
       c+=1
     if A[i][k]==0 and A[j][k]==1:
       d+=1
    print(a,b,c,d)

Thanks for any replies!

4
  • As written this would be faster if A was a list of lists (e.g. A.tolist()). A[j][k] indexing is expensive with arrays (compared to a list); try to avoid doing that repeatedly. Alternatively, try for row in A: ...` instead of the range and repeated A[i]. etc. Commented May 10, 2020 at 15:31
  • Could you give an example of what do you mean by "overlap"? Commented May 10, 2020 at 15:35
  • The code explains exactly what I mean. It is hard to do so in words. Take two rows of A say [1,0,0,1] and [0,1,0,1]. Then glue them along axis zero to give [[1,0,0,1], [0,1,0,1]]. I then want to calculate the number of different columns, i.e. the number of (1,1), (0,0), (1,0) and (0,1). Commented May 10, 2020 at 15:38
  • Speed in numpy usually comes from 'vectorizing', by which we mean, moving python level iteration into compiled numpy methods. That is using whole-array building blocks where possible. In this case, I'd focus on the k loop. I haven't tried to figure out what it's doing, so can't help directly. But try to think of ways of testing two rows, without that that iteration. things like row1 == row2 and boolean tests like rows1 | row2 etc. Commented May 10, 2020 at 15:47

3 Answers 3

2

Since a, b, c, d are in the loop I assume you want them per each i, j. I am going to make a matrix for them with element in [i, j] be the corresponding value for a, b, c, d in your loop i, j, without ANY loops. For example a[i,j] is your value of a in the loop i,j:

A_c = 1-A
a = np.dot(A, A.T)
b = np.dot(A_c, A.T)
c = np.dot(A, A_c.T)
d = np.dot(A_c, A_c.T)

If you care even more about speed, you can factorize and shorten/reuse some of the calculations in the equations above.

Sign up to request clarification or add additional context in comments.

2 Comments

Very elegant solution. Thank you!
@Randomgenerator You are welcome. I was going to add a faster/more efficient calculations but Paul's answer below also includes more of what I meant by shortening the solution. It is not the only unique solution, you can also do other factorizations (with probably similar efficiencies).
0

While the above answer is absolutely correct, I'd like to follow up on a bit more technical sort of answer - mostly because I was doing something very similar to the problem in your question last week, and learned some cool stuff along the way.

First of all, yes, matrix multiplications and vectorization is the right way to go. However, these can get a bit expensive when the matrices become large. Let me show a small benchmark for N=100 and M=100:

N,M = 100,100
A = np.random.randint(2,size=(N,M))

def type1():
    A_c = 1-A
    a = np.dot(A, A.T)
    b = np.dot(A_c, A.T)
    c = np.dot(A, A_c.T)
    d = np.dot(A_c, A_c.T)
    return a,b,c,d

%timeit -n 100 type1()

>>>3.76 ms ± 48.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

One easy speedup can be done by the fact that a+b+c+d = M. We don't actually need to find d; we can thus reduce one expensive dot product here!

def type2():
    A_c = 1-A
    a = np.dot(A, A.T)
    b = np.dot(A_c, A.T)
    c = np.dot(A, A_c.T)
    return a,b,c,M-(a+b+c)

%timeit -n 100 type2()
>>>2.81 ms ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

That shaved off almost a millisecond, but we can do even better. Numpy arrays come in two orders: C-Contiguous and F-Contiguous. You can check this by printing A.flags; A is a C-Contiguous array by default. However, its transpose A.T is represented as an F-Contiguous array, and when we pass them to dot, an internal copy is created for A.T since the ordering doesn't match.

One way to bypass this is by going over to scipy and hooking up our program with BLAS (https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms), particularly, the general matrix multiplication gemm routine.

from scipy.linalg import blas as B

def type3():
    A_c = 1-A
    a = B.dgemm(alpha=1.0, a=A, b=A, trans_b=True)
    b = B.dgemm(alpha=1.0, a=A_c, b=A, trans_b=True)
    c = B.dgemm(alpha=1.0, a=A, b=A_c, trans_b=True)
    return a,b,c,M-(a+b+c)

%timeit -n 100 type3()
>>>449 µs ± 27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

And the time has gone down directly from milliseconds to microseconds, which is pretty awesome.

Comments

0

Just for the sport of it here is a method that is three times faster than the current fastest. We take advantage of matrix computations being faster on float, in particular float32. Further we are doing only one matrix multiplication, inferring the other numbers by much cheaper methods:

def pp():
    A1 = np.count_nonzero(A,1)
    Af = A.astype('f4')
    a = [email protected]
    b = A1-a
    c = b.T
    d = M-a-b-c
    return a,b,c,d


[*map(np.array_equal,pp(),type3())]
# [True, True, True, True]

timeit(pp,number=1000)
# 0.14910832402529195
timeit(type3,number=1000)
# 0.4432948770117946

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.