import numpy
from scipy.spatial.distance import pdist
X = numpy.zeros(50000,25)
C = pdist(X, 'euclidian')

I want to find:

NumPy then raises an error: Array is too big.

I think the problem is the size of C: pdist cannot create a (50000, 50000) array. I don't know why NumPy imposes this restriction, since I can run the same code in MATLAB. How can I run this code using an array?

I also found possible duplicates, but the matrices in those questions are much larger:

Is it possible to create a 1million x 1 million matrix using numpy?
Very large matrices using Python and NumPy

  • Have you done the math on how much memory you're trying to allocate? Commented Jul 23, 2013 at 9:11
  • Are you using a 64-bit version of Python and NumPy? A 50k x 50k array will use about 20 GB of memory (NumPy uses double-precision floating point by default). Commented Jul 23, 2013 at 9:12
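
The arithmetic behind the comments above can be checked with a short sketch. It assumes float64 entries (NumPy's default) and also shows the size of the condensed result that pdist returns, which holds n(n-1)/2 distances rather than a full square matrix. The variable names are illustrative only.

```python
import numpy

n = 50000
itemsize = numpy.dtype(numpy.float64).itemsize  # 8 bytes per entry

# Full square (n, n) distance matrix.
full_bytes = n * n * itemsize

# Condensed form returned by pdist: one entry per unordered pair of rows.
condensed_bytes = n * (n - 1) // 2 * itemsize

print("full matrix:      %.1f GB" % (full_bytes / 1e9))
print("condensed vector: %.1f GB" % (condensed_bytes / 1e9))
```

Either way the result is far larger than what a 32-bit process can address, and it still needs roughly 10 GB of free RAM on a 64-bit one.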

1 Answer

First, there are a couple of typos in your code. It should be:

X = numpy.zeros((50000,25)) # it's a tuple going in
C = pdist(X, 'euclidean') # euclidean with an e

Of course, that does not matter for the question.

The Euclidean pdist is equivalent to calling numpy.linalg.norm (http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html) on the difference between each pair of rows. It's a very general function. If it does not work in your case because of memory constraints, you can always write something yourself. Two rows of length 25 do not take much memory, and this computes one pairwise distance:

numpy.sqrt(numpy.sum(numpy.square(X[0] - X[1]))) # Euclidean distance between rows 0 and 1

And then you only need to loop through the whole thing.
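
A minimal sketch of that loop, assuming you can process the distances one row at a time instead of holding them all at once. The generator name `iter_row_distances`, the small random demo array, and the running maximum are all hypothetical stand-ins for whatever you actually need to compute.

```python
import numpy

def iter_row_distances(X):
    """Yield (i, d), where d[k] is the Euclidean distance
    between X[i] and X[i + 1 + k]."""
    n = X.shape[0]
    for i in range(n - 1):
        diff = X[i + 1:] - X[i]  # differences to all later rows
        yield i, numpy.sqrt(numpy.sum(numpy.square(diff), axis=1))

# Small demo array; the real X would have shape (50000, 25).
rng = numpy.random.default_rng(0)
X_small = rng.random((200, 25))

# Example reduction: track the largest pairwise distance
# without ever storing all of the distances.
largest = 0.0
for i, d in iter_row_distances(X_small):
    largest = max(largest, float(d.max()))

print(largest)
```

Each step only allocates a temporary of at most (n - 1) x 25 values, so peak memory stays small. The chunks come out in the same pair order as pdist's condensed output, so they could also be written one after another into a disk-backed array such as a numpy.memmap if the full result is really needed.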

Hope it helps, P
