While the above answer is absolutely correct, I'd like to follow up with a somewhat more technical one, mostly because I was working on something very similar to the problem in your question last week and learned some cool stuff along the way.
First of all, yes, matrix multiplication and vectorization are the right way to go. However, they can get expensive when the matrices become large. Here is a small benchmark for N=100 and M=100:
import numpy as np

N, M = 100, 100
A = np.random.randint(2, size=(N, M))  # random 0/1 matrix

def type1():
    A_c = 1 - A  # complement: swaps 0s and 1s
    a = np.dot(A, A.T)
    b = np.dot(A_c, A.T)
    c = np.dot(A, A_c.T)
    d = np.dot(A_c, A_c.T)
    return a, b, c, d
%timeit -n 100 type1()
>>>3.76 ms ± 48.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
One easy speedup comes from the fact that a+b+c+d = M for every pair of rows. We don't actually need a dot product to find d; it can be recovered by subtraction, which saves one expensive dot product here!
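The identity is easy to confirm numerically. Here is a minimal sketch, assuming a small random 0/1 matrix (the sizes and the seed are picked just for illustration and are not from the benchmark):

```python
import numpy as np

# Hypothetical small sizes, chosen only to keep the check quick.
N, M = 6, 10
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(N, M))
A_c = 1 - A

a = A @ A.T      # columns where row i and row j are both 1
b = A_c @ A.T    # columns where row i is 0 and row j is 1
c = A @ A_c.T    # columns where row i is 1 and row j is 0
d = A_c @ A_c.T  # columns where both rows are 0

# Every column falls into exactly one of the four cases, so the counts sum to M.
identity_holds = bool(np.all(a + b + c + d == M))
print(identity_holds)
```

Since the four cases cover every column exactly once, d can be taken as M-(a+b+c), which is what the next version does: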
def type2():
    A_c = 1 - A
    a = np.dot(A, A.T)
    b = np.dot(A_c, A.T)
    c = np.dot(A, A_c.T)
    return a, b, c, M - (a + b + c)
%timeit -n 100 type2()
>>>2.81 ms ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
That shaved off almost a millisecond, but we can do even better. NumPy arrays come in two memory orders: C-contiguous and F-contiguous. You can check this by printing A.flags; A is a C-contiguous array by default. However, its transpose A.T is represented as an F-contiguous array, and when we pass both to dot, an internal copy may be created for A.T since the ordering doesn't match (whether this happens depends on the NumPy version and the dtype).
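To see the two layouts for yourself, here is a small sketch (the 4x3 shape is arbitrary) that inspects the flags of an array and its transpose:

```python
import numpy as np

A = np.random.randint(2, size=(4, 3))

# A freshly created 2-D array is stored row by row (C order).
print(A.flags['C_CONTIGUOUS'], A.flags['F_CONTIGUOUS'])    # True False

# The transpose is only a view on the same memory, so it reads as column-major.
At = A.T
print(At.flags['C_CONTIGUOUS'], At.flags['F_CONTIGUOUS'])  # False True
shares_memory = np.shares_memory(A, At)

# An explicit C-ordered copy of the transpose, for comparison.
At_c = np.ascontiguousarray(At)
```

The transpose itself costs nothing; a copy only appears once something needs the data in the other order.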
One way to bypass this is to go over to scipy and call BLAS (https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) directly, in particular the general matrix multiplication routine gemm, which takes a flag for transposing an operand.
from scipy.linalg import blas as B

def type3():
    A_c = 1 - A
    a = B.dgemm(alpha=1.0, a=A, b=A, trans_b=True)
    b = B.dgemm(alpha=1.0, a=A_c, b=A, trans_b=True)
    c = B.dgemm(alpha=1.0, a=A, b=A_c, trans_b=True)
    return a, b, c, M - (a + b + c)
%timeit -n 100 type3()
>>>449 µs ± 27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
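And the time has gone down from milliseconds to microseconds, which is pretty awesome. Two things to keep in mind: dgemm works in double precision, so the results come back as float64 arrays rather than integers, and since A is an integer array, np.dot does not hand it to BLAS at all, which likely accounts for part of the gap as well.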
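Before trusting the faster version, it is worth checking that it returns the same counts. A minimal sketch, assuming scipy is installed and using small made-up sizes:

```python
import numpy as np
from scipy.linalg import blas as B

N, M = 5, 8  # hypothetical small sizes, just for the check
A = np.random.randint(2, size=(N, M))

ref = np.dot(A, A.T)                              # integer result from NumPy
out = B.dgemm(alpha=1.0, a=A, b=A, trans_b=True)  # double-precision result from BLAS

print(ref.dtype, out.dtype)

# The counts are small whole numbers, so they are exact in float64.
same_values = np.array_equal(ref, out)

# Cast back if integer counts are needed downstream.
counts = out.astype(np.int64)
```

If integer output matters, the cast at the end is cheap compared with the multiplication itself.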
Your loop would probably be faster if A was a list of lists (e.g. A.tolist()). A[j][k] indexing is expensive with arrays (compared to a list); try to avoid doing that repeatedly. Alternatively, try for row in A: ... instead of the range and repeated A[i].

Speed in numpy usually comes from 'vectorizing', by which we mean moving Python-level iteration into compiled numpy methods, that is, using whole-array building blocks where possible. In this case, I'd focus on the k loop. I haven't tried to figure out what it's doing, so I can't help directly. But try to think of ways of testing two rows without that iteration, with things like row1 == row2 and boolean tests like row1 | row2.
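As a rough illustration of comparing two rows without a k loop, here is a sketch with two made-up boolean rows; the variable names are only for this example and are not taken from the question's code:

```python
import numpy as np

# Two hypothetical rows of 0/1 data, stored as booleans.
row1 = np.array([1, 0, 1, 1, 0], dtype=bool)
row2 = np.array([1, 1, 0, 1, 0], dtype=bool)

# Each count is one whole-array expression instead of a Python loop over k.
both = int(np.sum(row1 & row2))        # 1 in both rows
only2 = int(np.sum(~row1 & row2))      # 0 in row1, 1 in row2
only1 = int(np.sum(row1 & ~row2))      # 1 in row1, 0 in row2
neither = int(np.sum(~(row1 | row2)))  # 0 in both rows
matches = int(np.sum(row1 == row2))    # positions where the rows agree

print(both, only2, only1, neither, matches)  # 2 1 1 1 3
```

The same four counts for all pairs of rows at once are what the dot products in the other answer compute.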