I am trying to take the dot product of a sparse matrix and its transpose. I'm using the scipy.sparse library and finding that the results are incorrect. See below:

import numpy as np
import scipy.sparse 

#Define the dense matrix
matrix_dense = np.zeros([100000,10])
for i in range(10):
    i_0 = i*10000
    i_1 = (i+1)*10000
    matrix_dense[i_0:i_1,i] = 1

#Define the sparse matrix
cols = []
for i in range(10):
    cols+=[i]*10000

dtype = np.uint8 
rows = range(len(cols)) 
data_csc = np.ones(len(cols), dtype=dtype)
matrix_sparse = scipy.sparse.csc_matrix((data_csc, (rows, cols)), shape=(len(cols), 10), dtype=dtype)

#Check that the two matrices are identical
assert np.abs(matrix_sparse.todense() - matrix_dense).max() == 0 

#Dot product of the dense matrix
dense_product = np.dot(matrix_dense.T,matrix_dense)

#Dot product of the sparse matrix
sparse_product = (matrix_sparse.T)*(matrix_sparse)

The correct answer (given by dense_product) should be a diagonal matrix, where the diagonal terms equal 10,000.

print(dense_product)
[[ 10000.      0.      0.      0.      0.      0.      0.      0.      0.
   0.]
 [     0.  10000.      0.      0.      0.      0.      0.      0.      0.
   0.]
 [     0.      0.  10000.      0.      0.      0.      0.      0.      0.
   0.]
 [     0.      0.      0.  10000.      0.      0.      0.      0.      0.
   0.]
 [     0.      0.      0.      0.  10000.      0.      0.      0.      0.
   0.]
 [     0.      0.      0.      0.      0.  10000.      0.      0.      0.
   0.]
 [     0.      0.      0.      0.      0.      0.  10000.      0.      0.
   0.]
 [     0.      0.      0.      0.      0.      0.      0.  10000.      0.
   0.]
 [     0.      0.      0.      0.      0.      0.      0.      0.  10000.
   0.]
 [     0.      0.      0.      0.      0.      0.      0.      0.      0.
   10000.]]

However, no matter how I compute the sparse product, the result is incorrect:

print(sparse_product.todense())
[[16  0  0  0  0  0  0  0  0  0]
 [ 0 16  0  0  0  0  0  0  0  0]
 [ 0  0 16  0  0  0  0  0  0  0]
 [ 0  0  0 16  0  0  0  0  0  0]
 [ 0  0  0  0 16  0  0  0  0  0]
 [ 0  0  0  0  0 16  0  0  0  0]
 [ 0  0  0  0  0  0 16  0  0  0]
 [ 0  0  0  0  0  0  0 16  0  0]
 [ 0  0  0  0  0  0  0  0 16  0]
 [ 0  0  0  0  0  0  0  0  0 16]]

I've tried different ways of performing the sparse dot product and get exactly the same answer:

sparse_product_1 = np.dot(matrix_sparse.T,matrix_sparse)
sparse_product_2 = (matrix_sparse.T).dot(matrix_sparse)
sparse_product_3 = scipy.sparse.csr_matrix.dot(matrix_sparse.T, matrix_sparse)

Any idea what's going on?

Why are you using dtype = uint8? (Commented Aug 22, 2017 at 19:49)

1 Answer

It looks like the problem is your data type. uint8 has a maximum value of 255 (it can hold 256 distinct values), so presumably each sum of 10,000 ones is overflowing and wrapping around modulo 256, and 10000 % 256 gives you 16.

Here's an example of what is happening:

# Cast explicitly: recent NumPy releases reject np.array(10000, dtype=np.uint8)
x = np.array(10000).astype(np.uint8)
x
array(16, dtype=uint8)
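
The same wraparound can be reproduced without any sparse matrix by summing 10,000 ones with a uint8 accumulator. This is only a rough sketch of the arithmetic; the variable names (`ones`, `wrapped`, `exact`) are made up for illustration, and forcing `dtype=np.uint8` in `sum` is my way of imitating what the sparse product appears to do internally:

```python
import numpy as np

# Largest value a uint8 can hold
uint8_max = np.iinfo(np.uint8).max

# 10,000 ones, like the nonzero entries of one column in the question
ones = np.ones(10000, dtype=np.uint8)

# Forcing the accumulator to stay uint8 makes the running sum wrap modulo 256
wrapped = ones.sum(dtype=np.uint8)

# A wider accumulator keeps the exact count
exact = ones.sum(dtype=np.int64)

print(uint8_max, int(wrapped), int(exact), 10000 % 256)
```

The wrapped sum matches the 16 on the diagonal of the sparse product in the question.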

Changing your dtype to np.int64 works for me:

dtype = np.int64
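
If you would rather keep the stored matrix as uint8 to save memory, another option (my assumption, not something the question requires) is to upcast only just before multiplying. A minimal sketch, rebuilding the matrix from the question with the hypothetical names `matrix_u8` and `matrix_i64`:

```python
import numpy as np
import scipy.sparse

# Same layout as the question: 100000 rows, 10 columns, a single 1 per row
n_per_col = 10000
cols = np.repeat(np.arange(10), n_per_col)
rows = np.arange(cols.size)
data = np.ones(cols.size, dtype=np.uint8)

# Compact uint8 storage
matrix_u8 = scipy.sparse.csc_matrix((data, (rows, cols)), shape=(cols.size, 10))

# Upcast before multiplying so the accumulated sums fit in the result dtype
matrix_i64 = matrix_u8.astype(np.int64)
sparse_product = (matrix_i64.T @ matrix_i64).toarray()

print(sparse_product.diagonal())
```

Any dtype wide enough for the largest expected sum would do here; int64 is just the one used above.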

1 Comment

Doh - of course. I copied the code snippet on how to define the sparse matrix and clearly did not pay attention to the data type. Thank you for the quick response!!
