
My goal is to create an array where each element is normal(size=n) for the corresponding element n of it.

I am trying to optimize:

from numpy import arange, zeros
from numpy.random import normal

it = 2 ** arange(6, 25)
M = zeros(len(it), dtype=object)  # object dtype: each entry holds an array
for x in range(len(it)):
    M[x] = normal(size=it[x])

So far I have tried the following, which does not work:

it = 2 ** arange(6, 25)
N = normal(size=it)

Further, I tried:

N = normal(size=it[:])

Given my data, I believe that such manual work with a for loop is really inefficient, so I am trying to come up with a vectorized operation.

I receive:

File "mtrand.pyx", line 1335, in numpy.random.mtrand.RandomState.normal
  File "common.pyx", line 557, in numpy.random.common.cont
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
  • Maybe it helps if you change 25 to a smaller number, e.g. 10 – Commented Oct 22, 2020 at 7:35

1 Answer


You've not been very precise about where these functions come from, but I'm guessing that by normal(size=it[:]) you mean:

import numpy as np
it = 2 ** np.arange(6, 25)
np.random.normal(size=it)

which tells numpy to create a 19-dimensional array (i.e. len(it) axes) containing about 6 × 10⁸⁵ elements (i.e. np.prod(it.astype(float)), computed as floats because the product overflows an int64). numpy is saying that it can't do that, which seems like a reasonable thing to do.
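To make the shape interpretation concrete, here is a small sketch (the three-element shape array is my own example, not from the question): passing an array as size requests one axis per element, not one draw per element.

```python
import numpy as np

# Passing an array as `size` requests an array with one *axis* per
# element, not one draw per element.
shape_arr = np.array([2, 3, 4])
small = np.random.normal(size=shape_arr)
print(small.shape)  # (2, 3, 4): 24 elements, not 3 draws

# With the question's `it`, the requested array would have 19 axes and
# roughly 6.2e85 elements, hence the ValueError.
it = 2 ** np.arange(6, 25)
print(np.prod(it.astype(float)))
```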

Numpy doesn't like the "ragged arrays" you're trying to create, and neither do most matrix/numeric libraries, hence support for them is limited!

I'm unsure why you consider the loop "really inefficient". You're creating ~33 million floats over 19 iterations of a simple Python loop. The vast majority of the time will be spent in highly optimised Numpy library code, and a tiny (basically unmeasurable) amount will be spent evaluating your Python bytecode.
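You can check this split yourself with a rough timing sketch (the timeit harness below is my own; absolute numbers will vary per machine):

```python
import numpy as np
import timeit

it = 2 ** np.arange(6, 25)

def build():
    # 19 NumPy calls allocating sum(it) == 2**25 - 2**6 ≈ 33.5 million floats
    return [np.random.normal(size=n) for n in it]

# Rough measurement; the time is dominated by NumPy's C code,
# not the Python loop machinery.
t = timeit.timeit(build, number=3) / 3
print(f"~{t:.3f} s per build")
```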

If you really want a one-liner then you can do:

X = [np.random.normal(size=2**i) for i in range(6, 25)]

which makes the split between Numpy and Python worlds more obvious.

Note that on my laptop, the Python code executes in ~5µs while the Numpy code runs for ~800ms. So you're trying to optimise the 0.0006% part!

Note that using Numpy's vectorization isn't always a win; it only helps with larger arrays. For example, the above loop is "faster" than:

X = [np.random.normal(size=i) for i in 2**np.arange(6, 25)]

4.8 vs 5.1 µs for the Python code, because of the time spent marshalling objects into and out of the Numpy world. Again, none of this matters; just use whichever solution makes your code easier to understand. A few microseconds is nothing compared to seconds.


1 Comment

Thanks, my actual task requires: for x in range(len(it)): M[x] = sum(normal(size=it[x])). I know this changes the question, but it still might be relevant to ask.
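The summed variant in this comment can be sketched as follows (the single-call shortcut is my own suggestion, not part of the answer, and assumes only the sums are needed, not the individual draws):

```python
import numpy as np

it = 2 ** np.arange(6, 25)

# Direct translation of the comment: one NumPy call per size, summed.
M = np.array([np.random.normal(size=n).sum() for n in it])

# Shortcut: the sum of n independent N(0, 1) draws is itself
# distributed N(0, sqrt(n)), so a single vectorized call produces
# values with the same distribution.
M_fast = np.random.normal(scale=np.sqrt(it))
print(M.shape, M_fast.shape)  # (19,) (19,)
```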
