
Essentially I'm having to do a lot of matrix-vector products, where the 3x3 matrices are stored in the last two axes of an (n x n x 3 x 3) numpy array and the vectors in the last axis of a corresponding (n x n x 3) array.

Currently I'm looping quite inelegantly over the first two indices and calculating each matrix-vector product separately:

import numpy as np

length = 1000
x = np.random.rand(length, length, 3)
A = np.random.rand(length, length, 3, 3)
result = np.zeros((length, length, 3))

for i in range(length):
    for j in range(length):
        result[i, j, :] = A[i, j, :, :].dot(x[i, j, :])

This, however, is quite slow. Is there a more efficient way to avoid this i,j for loop? I'm still trying to learn the numpy methods. ;)

Also, in my real-life case, the matrices are very sparse (>0.999 sparsity), so bonus points for anyone who can give me some hints on how to incorporate that.

Thank you very much!

  • With that sparsity you might gain some speed (and better memory use) by making (3*n,3*n) scipy sparse matrices with the block format (with dense 3x3 blocks). Commented Nov 9, 2021 at 17:29
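The block format suggested in the comment above is `scipy.sparse`'s BSR format. Since each output block is `A[i,j] @ x[i,j]`, the whole batch is one block-*diagonal* product, so only the nonzero 3x3 blocks need to be stored. A minimal sketch of that idea (the grid size and block density here are made up for illustration):

```python
import numpy as np
from scipy.sparse import bsr_matrix

np.random.seed(0)
n, m = 50, 3

# Hypothetical sparse field: an (n, n) grid of 3x3 blocks, most exactly zero.
A = np.zeros((n, n, m, m))
mask = np.random.rand(n, n) < 0.01           # ~1% of blocks are nonzero
A[mask] = np.random.randn(mask.sum(), m, m)
x = np.random.rand(n, n, m)

# Build a block-diagonal BSR matrix holding only the nonzero 3x3 blocks.
# Block row k contains at most one block, at block column k.
rows = np.flatnonzero(mask)                              # block col == block row
indptr = np.concatenate(([0], np.cumsum(mask.ravel())))  # blocks per block row
A_bsr = bsr_matrix((A[mask], rows, indptr),
                   shape=(n * n * m, n * n * m), blocksize=(m, m))

# One sparse matrix-vector product replaces the whole double loop.
result = (A_bsr @ x.reshape(-1)).reshape(n, n, m)
print(np.allclose(result, np.einsum('ijkl,ijl->ijk', A, x)))  # True
```

With >0.999 block sparsity this stores and multiplies roughly a thousandth of the data the dense batched product touches.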

3 Answers


You could use np.einsum to perform the matrix/vector dot products over both i and j:

>>> np.einsum('ijkl,ijl->ijk', A, x)

>>> np.allclose(np.einsum('ijkl,ijl->ijk', A, x), result)
True
Sign up to request clarification or add additional context in comments.

2 Comments

Thank you for that! I used einsum in another part of the code and somehow didn't think to use it here.
I upvoted, your method is the fastest one proposed

You can use broadcasting over the dot operator (@) by temporarily introducing an extra dimension in x:

result = (A @ x[:,:,:,None]).squeeze(axis=-1)

4 Comments

Thanks for the answer. :) Will this also need additional memory space or is this extra dimension (somewhat) fictional?
@NegativeJacobian It is entirely fictional and free - it simply reinterprets the data as it sits in memory differently.
Sticking with the slice notation, you can do (A @ x[:,:,:,None])[..., 0] ;)
Since my matrices are very sparse, even the fastest suggested way is significantly slower than my "naive" implementation when i restrict j to only cover the nonzero entries (e.g. the "neighbors" j actually connected to i). I guess I won't get around sparse matrix structures, but maybe I should open a separate question on that.
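As the comment thread above notes, the extra axis introduced by `None` (i.e. `np.newaxis`) is free: basic indexing returns a view, not a copy. A quick check (with made-up small sizes) confirms this:

```python
import numpy as np

x = np.random.rand(4, 4, 3)
A = np.random.rand(4, 4, 3, 3)

xcol = x[:, :, :, None]           # shape (4, 4, 3, 1): a view, not a copy
print(np.shares_memory(x, xcol))  # True -> no extra memory allocated

# The batched product, then squeeze the trailing axis back out:
result = (A @ xcol).squeeze(axis=-1)
print(result.shape)               # (4, 4, 3)
```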

There are already some answers to this question, but I have a different method that a friend of mine taught me, and I want to share it.

When you have a matrix-vector (or matrix-matrix) product, supposing A is NxM and x has length M, you can compute the product in numpy by reshaping x, taking the elementwise product and then summing over the last dimension:

n=10
m=3
A = np.random.randn(n,m)
x = np.random.randn(m)

result1 = A.dot(x)
result2 = (A*x.reshape(1,-1)).sum(-1)
print(np.allclose(result1,result2))
#throws True

Using that, you can solve your problem for the larger arrays:

import numpy as np

def method_1(A,x,n,m): #OP
    result=np.zeros((n,n,m))
    for i in range(0,n):
        for j in range(0,n):
            result[i,j,:]=A[i,j,:,:].dot(x[i,j,:])
    return result

def method_2(A,x,n,m): #Me 
    return (A* x.reshape(n,n,1,m)).sum(-1)

def method_3(A,x,n,m): #orlp
    return (A @ x[:,:,:,None]).squeeze(axis=-1)

def method_4(A,x,n,m): #ivan
    return np.einsum('ijkl,ijl->ijk', A, x)

n= 7
m = 3
A = np.random.randn(n,n,m,m)
x = np.random.randn(n,n,m)

r1 = method_1(A,x,n,m)
r2 = method_2(A,x,n,m)
r3 = method_3(A,x,n,m)
r4 = method_4(A,x,n,m)

r12 = np.allclose(r1,r2)
r13 = np.allclose(r1,r3)
r14 = np.allclose(r1,r4)

print(" same result?")
print(f"op-me: {r12}, op-orlp: {r13}, op-ivan: {r14}")

#same result?
#op-me: True, op-orlp: True, op-ivan: True

Finally, I compared the speed of all these methods:

from time import time
import matplotlib.pyplot as plt
nList = [10,100,1000,10000]

times_1 =[]
times_2 =[]
times_3 =[]
times_4 =[]
for n in nList:
    print(n)
    A = np.random.randn(n,n,m,m)
    x = np.random.randn(n,n,m)
    t1 = time()
    r1 = method_1(A,x,n,m)
    t2 = time()
    r2 = method_2(A,x,n,m)
    t3 = time()
    r3 = method_3(A,x,n,m)
    t4 = time()
    r4 = method_4(A,x,n,m)
    t5 = time()

    times_1.append(t2-t1)
    times_2.append(t3-t2)
    times_3.append(t4-t3)
    times_4.append(t5-t4)

#plot times
plt.figure()
plt.loglog(nList,times_1,'o-',label="op")
plt.loglog(nList,times_2,'o-',label="me")
plt.loglog(nList,times_3,'o-',label="orlp")
plt.loglog(nList,times_4,'o-',label="ivan")
plt.legend()

The results show that the method proposed by Ivan (einsum) is the fastest one. [plot: log-log timing comparison of the four methods]

Comments
