
My goal is to create an array where each element is normal(size=n) for the corresponding element n of it.

I am trying to optimize:

from numpy import arange, zeros
from numpy.random import normal

it = 2 ** arange(6, 25)
M = zeros(len(it), dtype=object)  # object dtype: each entry holds an array
for x in range(len(it)):
    M[x] = normal(size=it[x])

So far I have tried the following, which does not work:

it = 2 ** arange(6, 25)
N = normal(size=it)

Further, I tried:

N = normal(size=it[:])

Given my data, I believe that such manual work with a for loop is really inefficient, so I am trying to come up with a vectorized operation.

I receive:

File "mtrand.pyx", line 1335, in numpy.random.mtrand.RandomState.normal
  File "common.pyx", line 557, in numpy.random.common.cont
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
  • Maybe it helps if you change 25 to a smaller number, e.g. 10 – Commented Oct 22, 2020 at 7:35

1 Answer


You've not been very precise about where these functions come from, but I'm guessing that by normal(size=it[:]) you mean:

import numpy as np
it = 2 ** np.arange(6, 25)
np.random.normal(size=it)

which tells numpy to create a 19-dimensional array (i.e. len(it) axes) containing about 6 × 10⁸⁵ elements (i.e. np.prod(it.astype(float)), computed as floats because the product overflows an int64). numpy is saying that it can't do that, which seems like a reasonable thing to do.
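To make the shape interpretation concrete, here is a small sketch (the three-element shape array is my own example, not from the question): passing an array as size requests one axis per element, not one draw per element.

```python
import numpy as np

# Passing an array as `size` requests an array with one *axis* per
# element, not one draw per element.
shape_arr = np.array([2, 3, 4])
small = np.random.normal(size=shape_arr)
print(small.shape)  # (2, 3, 4): 24 elements, not 3 draws

# With the question's `it`, the requested array would have 19 axes and
# roughly 6.2e85 elements, hence the ValueError.
it = 2 ** np.arange(6, 25)
print(np.prod(it.astype(float)))
```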

Numpy doesn't like the "ragged arrays" you're trying to create, and neither do most matrix/numeric libraries, hence support for them is limited!

I'm unsure why you consider the loop "really inefficient". You're creating ~33 million floats over 19 iterations of a simple Python loop. The vast majority of the time will be spent in highly optimised Numpy library code, and a tiny (basically unmeasurable) amount will be spent evaluating your Python bytecode.
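You can check this split yourself with a rough timing sketch (the timeit harness below is my own; absolute numbers will vary per machine):

```python
import numpy as np
import timeit

it = 2 ** np.arange(6, 25)

def build():
    # 19 NumPy calls allocating sum(it) == 2**25 - 2**6 ≈ 33.5 million floats
    return [np.random.normal(size=n) for n in it]

# Rough measurement; the time is dominated by NumPy's C code,
# not the Python loop machinery.
t = timeit.timeit(build, number=3) / 3
print(f"~{t:.3f} s per build")
```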

If you really want a one-liner then you can do:

X = [np.random.normal(size=2**i) for i in range(6, 25)]

which makes the split between Numpy and Python worlds more obvious.

Note that on my laptop, the Python code executes in ~5µs while the Numpy code runs for ~800ms. So you're trying to optimise the 0.0006% part!

Note that using Numpy's vectorization isn't always a win; it only helps with larger arrays. For example, the above loop is "faster" than:

X = [np.random.normal(size=i) for i in 2**np.arange(6, 25)]

4.8 vs 5.1 µs for the Python code, because of the time spent marshalling objects into and out of the Numpy world. Again, none of this matters; just use whichever solution makes your code easier to understand. A few microseconds is nothing compared to seconds.


1 Comment

Thanks, my actual task requires: for x in range(len(it)): M[x] = sum(normal(size=it[x])). I know this changes the question, but it still might be relevant to ask.
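The summed variant in this comment can be sketched as follows (the single-call shortcut is my own suggestion, not part of the answer, and assumes only the sums are needed, not the individual draws):

```python
import numpy as np

it = 2 ** np.arange(6, 25)

# Direct translation of the comment: one NumPy call per size, summed.
M = np.array([np.random.normal(size=n).sum() for n in it])

# Shortcut: the sum of n independent N(0, 1) draws is itself
# distributed N(0, sqrt(n)), so a single vectorized call produces
# values with the same distribution.
M_fast = np.random.normal(scale=np.sqrt(it))
print(M.shape, M_fast.shape)  # (19,) (19,)
```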
