Hi, I have a single line that creates a random array for a rather large dataset:

import numpy as np
import random
N=276233
L=138116

np.random.random([L,N])

But I get this error:

Traceback (most recent call last):
  File "<string>", line 3 (23), in <module>
  File "mtrand.pyx", line 760, in mtrand.RandomState.random_sample (numpy\random\mtrand\mtrand.c:5713)
  File "mtrand.pyx", line 137, in mtrand.cont0_array (numpy\random\mtrand\mtrand.c:1300)
MemoryError

What is the solution, and what is the size limit for such an array?

    If you can use a smaller integer type rather than doubles, you could reduce the memory footprint quite a bit. However, depending on the goals of your analysis and the nature of your data, this may not be possible. Commented Jan 19, 2015 at 20:40
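To put rough numbers on the comment above, here is a minimal sketch that estimates the footprint of the requested array for a few element types. The helper name `estimate_gib` is hypothetical, and the `np.random.default_rng` call assumes NumPy 1.17 or newer.

```python
import numpy as np

N = 276233
L = 138116

def estimate_gib(shape, dtype):
    """Size in GiB of an array with this shape and dtype (hypothetical helper).

    Nothing is allocated; only the element count and item size are used.
    """
    n_elements = 1
    for dim in shape:
        n_elements *= int(dim)
    return n_elements * np.dtype(dtype).itemsize / 1024.0 ** 3

# Compare 8-byte doubles with 4-byte floats and 1-byte integers
for dtype in (np.float64, np.float32, np.uint8):
    print(np.dtype(dtype).name, round(estimate_gib((L, N), dtype), 1), "GiB")

# The newer Generator API can produce float32 values directly
rng = np.random.default_rng(0)
sample = rng.random((4, 5), dtype=np.float32)
```

Even at one byte per element the full array would need tens of GiB, so a smaller type helps but probably does not make this particular shape practical on its own.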

1 Answer

You are trying to create an array of 8-byte doubles that would require about 284 GB of memory:

In [16]: L * N * 8 / (1024. ** 3)
Out[16]: 284.25601890683174

Either buy a lot more RAM (and make sure your system can handle it) or find a way to avoid generating the full 138,116 x 276,233 matrix.

5 Comments

Hmm... How did you get the number 284 GB? I have N*L = 3.81x10^10 bits / 8 bits per byte = 4.7 GB. Am I wrong?
276233 * 138116 * 8 / (1024 ** 3). Each double uses 8 bytes.
Ah, okay, thanks! I thought they were one bit each. :( I guess I can't use this method for a large dataset that requires a random matrix...
@Arbitel: Are you sure you need to generate the entire matrix at once?
It goes back to this question I posted earlier: stackoverflow.com/questions/28015281/numpy-optimization. I basically have to compare neighbouring elements and then do a row-wise summation. I guess I have to?
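If the end result is only one value per row, the full matrix may not need to exist at once. The following is a minimal sketch of generating and reducing the matrix in row blocks. The function name `rowwise_neighbour_sums`, the block size, and in particular the comparison `x[i, j] < x[i, j + 1]` are assumptions for illustration, since the actual comparison is described only in the linked question. It also assumes NumPy 1.17 or newer for `np.random.default_rng`.

```python
import numpy as np

def rowwise_neighbour_sums(n_rows, n_cols, block_rows=256, seed=0):
    """Generate the random matrix block by block and reduce each block at once.

    Assumption: the neighbour comparison is x[i, j] < x[i, j + 1]; substitute
    the real comparison. Only about block_rows * n_cols values are held in
    memory at any time.
    """
    rng = np.random.default_rng(seed)
    sums = np.empty(n_rows, dtype=np.int64)
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        block = rng.random((stop - start, n_cols))
        # Compare every element with its right-hand neighbour in the same row
        increasing = block[:, :-1] < block[:, 1:]
        # Row-wise summation; the block is released on the next iteration
        sums[start:stop] = increasing.sum(axis=1)
    return sums

# Small demonstration; the sizes from the question would take far longer
result = rowwise_neighbour_sums(n_rows=1000, n_cols=500)
```

Because this comparison only looks within a row, the row blocks are independent. If the real comparison also involves neighbours in adjacent rows, consecutive blocks would need to overlap by one row.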
