I need to do the following:

import numpy as np

b = np.random.randn(50001, 2)

cof = np.corrcoef(b)

c = cof >= 0.3  # threshold the correlation matrix

degree = np.dot(c, np.ones([50001, 1]))

It throws a segmentation fault.

Also if I try to use a sparse matrix, for instance:

asp = scipy.sparse.csc_matrix(c)

I get a segmentation fault as well.

The conversion works if the matrix size is small.

Any advice?

  • Which line is it segfaulting on? (Is it the one with numpy.corrcoef?) Commented Jun 14, 2012 at 4:17
  • The problem is that the correlation matrix of your 50001x2 input will be 50001x50001. At 8 bytes per entry that is about 20 GB of RAM, which I suspect you don't have. Commented Jun 14, 2012 at 8:58
  • I'm getting the correlation matrix back. The segmentation fault happens when I try to take a dot product with a vector of ones. This line should give me the degree of each coordinate: np.dot(c, np.ones([50001,1]), dtype= float), but instead I get a segfault. I have a server with 48 cores and 260 GB of RAM. However, I'm using just one core to run this, and I'm not sure how much RAM a single core can use. Commented Jun 14, 2012 at 15:25
  • Is that 260 GB shared or distributed? If distributed, 20 GB is probably not available to a single core. Also, are you sure your sysadmin has allowed you to use that much memory? There may be memory limits in place, which can result in a segfault. Commented Jun 14, 2012 at 16:02
  • Guessing here, but as a workaround, maybe it works if you just use np.sum(c, 1) (or np.sum(c,1)[:,None] to keep the shape)? Since this matrix is probably not sparse, converting it to a sparse format should not help. I'm only guessing that whatever np.dot does in the background is causing the problem. Commented Aug 15, 2012 at 14:54
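The memory estimate and the np.sum suggestion from the comments above can be checked on a small matrix. This is only a minimal sketch, assuming float64 entries (8 bytes each); the helper corr_matrix_bytes and the size n_small are made up for illustration and are not part of the original question.

```python
import numpy as np

def corr_matrix_bytes(n, itemsize=8):
    """Bytes needed for a dense n x n matrix (float64 by default)."""
    return n * n * itemsize

# Size of the full 50001 x 50001 correlation matrix from the question.
full_bytes = corr_matrix_bytes(50001)
print(full_bytes / 1e9, "GB")

# On a small thresholded matrix, a row sum gives the same per-row counts
# as the dot product with a column of ones.
n_small = 200  # hypothetical size, small enough to fit in memory
rng = np.random.default_rng(0)
cof_small = np.corrcoef(rng.standard_normal((n_small, 10)))
c_small = cof_small >= 0.3

deg_dot = np.dot(c_small, np.ones([n_small, 1]))  # float result, shape (n_small, 1)
deg_sum = np.sum(c_small, 1)[:, None]             # integer result, same shape
```

Both results hold the same counts. Whether np.sum avoids the crash at full size is not established here, since the full matrix still has to fit in memory first.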

1 Answer

Are you trying to compute the correlation between two samples of a 50001-dimensional space, or between 50001 samples of a 2D space?

In your current situation, numpy.corrcoef treats each row as a variable, so you are implicitly creating a 50001 x 50001 covariance matrix (which causes the segfault). Doing the following won't blow up the memory:

import numpy

b = numpy.random.randn(2, 50001)
cof = numpy.corrcoef(b)

Hopefully this is what you need. (The other way yields a really poor approximation of your covariance matrix, and a segfault.)
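To see how the orientation of the input changes the size of the result, here is a small sketch using a reduced, hypothetical sample count n so that both calls fit in memory. The variable names are illustrative only; rowvar is an existing keyword argument of numpy.corrcoef.

```python
import numpy as np

n = 1000  # hypothetical sample count, much smaller than 50001
rng = np.random.default_rng(0)
samples_by_vars = rng.standard_normal((n, 2))

# Rows are variables by default, so an (n, 2) input means n variables
# with 2 observations each, and the result is n x n.
big = np.corrcoef(samples_by_vars)

# Transposed input: 2 variables with n observations each, result is 2 x 2.
small = np.corrcoef(samples_by_vars.T)

# rowvar=False treats columns as variables without transposing first.
small_alt = np.corrcoef(samples_by_vars, rowvar=False)

print(big.shape, small.shape, small_alt.shape)
```

With only two observations per variable, every entry of the large matrix comes out as +1 or -1 (up to rounding), which is why that orientation gives such a poor estimate.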
