Converting large matrices to sparse matrices in Python

Question

I need to do the following:

import numpy as np

b = np.random.randn(50001, 2)
cof = np.corrcoef(b)
c = cof >= 0.3  # threshold the correlation matrix
np.dot(c, np.ones([50001, 1]))

This throws a segmentation fault.

If I instead try to convert c to a sparse matrix, for instance:

asp = scipy.sparse.csc_matrix(c)

I get a segmentation fault as well.

The conversion works if the matrix size is small.

Any advice?

Which line is it segfaulting on? (Is it the one with numpy.corrcoef?)
– huon, Commented Jun 14, 2012 at 4:17
The problem is that the correlation matrix of your 50001x2 input will be 50001x50001. As float64 that is over 20 GB of RAM, which I suspect you don't have.
– talonmies, Commented Jun 14, 2012 at 8:58
I'm getting the correlation matrix back. It gives me a segmentation fault when I try to do a dot product with a vector of ones. This line should give me the degree of each coordinate: np.dot(c, np.ones([50001,1]), dtype= float), but instead I get a segfault. I have a server with 48 cores and 260 GB of RAM. However, I'm using just one core to run this, and I'm not sure how much RAM it can consume on one core.
– Ranjit s, Commented Jun 14, 2012 at 15:25
Is that 260 GB shared or distributed? If distributed, 20 GB is probably not available on a single core. Also, are you sure that your sysadmin has allowed you to use that much memory? There may be some memory limits in place (which can result in a segfault).
– pv., Commented Jun 14, 2012 at 16:02
Guessing here, but as a workaround, maybe it works if you just use np.sum(c, 1) (or np.sum(c,1)[:,None] to keep the shape)? Since this matrix is probably not sparse, converting it to a sparse format is unlikely to help. My guess is that whatever np.dot does in the background is causing the problem.
– seberg, Commented Aug 15, 2012 at 14:54
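
The np.sum suggestion in the last comment can be checked on a small matrix. The sketch below is only an illustration: the 6 x 4 size, the seed, and the variable names are assumptions chosen so that it runs instantly, and it says nothing about whether the full-size case fits in memory.

```python
import numpy as np

rng = np.random.default_rng(0)
small = rng.standard_normal((6, 4))     # 6 variables, 4 observations each (toy size)
cof = np.corrcoef(small)                # 6 x 6 correlation matrix
c = cof >= 0.3                          # boolean mask, as in the question

deg_dot = np.dot(c, np.ones((6, 1)))    # the approach from the question
deg_sum = np.sum(c, axis=1)[:, None]    # the suggested alternative, same (6, 1) shape

print(np.array_equal(deg_dot, deg_sum))
```

Both expressions count the entries per row that meet the threshold. The np.sum form returns integers and does not need the extra vector of ones.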

Pierre GM · Accepted Answer · 2012-09-22 14:22:09Z

2

Are you trying to compute the correlation between two samples of a 50001-dimensional space, or between 50001 samples of a 2D space?

In your current situation, numpy.corrcoef treats each row as a variable, so your 50001 x 2 input implicitly creates a 50001 x 50001 correlation matrix (which causes the segfault). Doing the following won't blow up the memory:

b = numpy.random.randn(2,50001)
cof = numpy.corrcoef(b)
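
To make the size difference concrete, here is a rough sketch of the arithmetic and of the resulting shapes. It assumes float64 output, and it uses 500 rows as a stand-in for 50001 so that it runs quickly; the variable names are illustrative only.

```python
import numpy as np

n = 50001
# Dense n x n float64 result: 8 bytes per entry.
bytes_needed = n * n * np.dtype(np.float64).itemsize
gib_needed = bytes_needed / 2**30
print(f"{bytes_needed} bytes, about {gib_needed:.1f} GiB")

# Small stand-in sizes; the shape rule is the same at full size.
rows_as_variables = np.corrcoef(np.random.randn(500, 2))   # 500 variables -> 500 x 500
rows_as_samples = np.corrcoef(np.random.randn(2, 500))     # 2 variables   -> 2 x 2

# Keeping the (500, 2) layout and passing rowvar=False also gives 2 x 2.
columns_as_variables = np.corrcoef(np.random.randn(500, 2), rowvar=False)

print(rows_as_variables.shape, rows_as_samples.shape, columns_as_variables.shape)
```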

Hopefully this is what you need. The other way yields a really poor estimate of your correlation matrix, since each coefficient would be computed from only two observations, and a segfault.
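
If the 50001 rows really are the variables (with more than two observations each), the per-row counts asked for in the question can be obtained without ever holding the full matrix. The following is a minimal sketch of one way to do that, not something taken from the question or the comments: the function name `degrees_above_threshold`, the `chunk_rows` parameter, and the toy data are all hypothetical. It normalizes each row so that a plain matrix product yields correlations, then processes a block of rows at a time.

```python
import numpy as np

def degrees_above_threshold(data, threshold=0.3, chunk_rows=1024):
    """Count, for each row-variable, how many correlations are >= threshold.

    Only a (chunk_rows x n) slab of the correlation matrix exists at any
    time, instead of the full n x n matrix. The diagonal is included in
    the count, as it is in the question's code.
    """
    data = np.asarray(data, dtype=np.float64)
    centered = data - data.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    unit = centered / norms            # zero-variance rows would produce NaN here
    n = unit.shape[0]
    degrees = np.empty(n, dtype=np.int64)
    for start in range(0, n, chunk_rows):
        stop = min(start + chunk_rows, n)
        block = unit[start:stop] @ unit.T          # correlations for these rows
        degrees[start:stop] = (block >= threshold).sum(axis=1)
    return degrees

# Toy check against the dense computation (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 30))
chunked = degrees_above_threshold(data, threshold=0.3, chunk_rows=64)
dense = (np.corrcoef(data) >= 0.3).sum(axis=1)
print(np.array_equal(chunked, dense))
```

With n = 50001 and the default chunk_rows of 1024, each float64 slab is roughly 0.4 GB, so chunk_rows can be lowered or raised to fit the available memory.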

edited Sep 22, 2012 at 14:22

Pierre GM

answered Sep 17, 2012 at 13:00

recursix
