
I want to fill a numpy array with generated values. These values are produced by a generator function. The array is usually short (<100 elements), but it is generated many times, so I wanted to know if this can be optimized with some fancy usage of numpy.

So far I can already do it with vanilla python:

def generate():
    return generated_data

array = np.asarray([generate() for _ in range(array_length)])

I've also tried to use np.full(shape, fill_value):

np.full((array_length, generated_data_size), generate())

But this calls the generate() function only once, not once for every index in the array.

I've also tried np.vectorize(), but I couldn't make it generate an appropriately shaped array.

3 Answers


There is nothing NumPy can do to accelerate the process of repeatedly calling a function not designed to interact with NumPy.

The "fancy usage of numpy" way to optimize this is to manually rewrite your generate function to use NumPy operations to generate entire arrays of output instead of only supporting single values. That's how NumPy works, and how NumPy has to work; any solution that involves calling a Python function over and over again for every array cell is going to be limited by Python overhead. NumPy can only accelerate work that actually happens in NumPy.
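As a sketch of what such a rewrite can look like (the names generate_one/generate_batch and the arithmetic are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng()

# Hypothetical scalar generator: one Python call per value.
def generate_one():
    return rng.random() * 2.0 + 1.0

# Vectorized rewrite: the same computation, but one NumPy call
# produces the whole array at once.
def generate_batch(n):
    return rng.random(n) * 2.0 + 1.0

array = generate_batch(100)  # replaces 100 calls to generate_one()
```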

If NumPy's provided operations are too limited to rewrite generate in terms of them, there are options like rewriting generate with Cython, or using @numba.jit on it. These mostly help with computations that involve complex dependencies from one loop iteration to the next; they don't help with external dependencies you can't rewrite.
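For illustration, here is a loop-carried recurrence of the kind such compilers handle well (the recurrence itself is made up; the @numba.njit decorator is commented out so the sketch runs without Numba installed):

```python
import numpy as np

# @numba.njit  # uncomment if Numba is available
def generate_array(n):
    # Each value depends on the previous one, so plain NumPy ufuncs
    # cannot express this loop directly -- but a JIT can compile it.
    out = np.empty(n)
    acc = 0.0
    for i in range(n):
        acc = 0.5 * acc + 1.0
        out[i] = acc
    return out

arr = generate_array(3)  # acc follows acc_k = 2 * (1 - 0.5**k)
```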

If you cannot rewrite generate, all you can do is try to optimize the process of getting the return values into your array. Depending on array size, you may be able to save some time by reusing a single array object:

In [32]: %timeit x = numpy.array([random.random() for _ in range(10)])
The slowest run took 5.13 times longer than the fastest. This could mean that an
 intermediate result is being cached.
100000 loops, best of 5: 5.44 µs per loop
In [33]: %%timeit x = numpy.empty(10)
   ....: for i in range(10):
   ....:     x[i] = random.random()
   ....: 
The slowest run took 4.26 times longer than the fastest. This could mean that an
 intermediate result is being cached.
100000 loops, best of 5: 2.88 µs per loop

but the benefit vanishes for larger arrays:

In [34]: %timeit x = numpy.array([random.random() for _ in range(100)])
10000 loops, best of 5: 21.9 µs per loop
In [35]: %%timeit x = numpy.empty(100)
   ....: for i in range(100):
   ....:     x[i] = random.random()
   ....: 
10000 loops, best of 5: 22.8 µs per loop
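If the array really is rebuilt many times, the buffer can also be allocated once and refilled in place on each rebuild, avoiding a fresh allocation per iteration (a sketch; refill is a made-up helper name):

```python
import random
import numpy as np

buffer = np.empty(10)  # allocated once, up front

def refill(buf):
    # Overwrite in place; no new array object per rebuild.
    for i in range(len(buf)):
        buf[i] = random.random()
    return buf

for _ in range(1000):
    result = refill(buffer)  # same object reused every time
```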



Conventional "Pythonic"

A list comprehension or the map function could both be possible solutions for you:

from random import random
import numpy as np

np.array(list(map(lambda idx: random(), range(10))))
np.array([random() for idx in range(10)])

"Need-for-speed"

Maybe pre-allocating the memory will shave off a microsecond or two(?)

array = np.empty(10)
for idx in range(10):
    array[idx] = random()

See Nathan's answer for an even better solution.

Function Vectorisation

A function can be "vectorised" using numpy:

def rnd(x):
    return random()

fun = np.vectorize(rnd)
array = fun(range(10))

6 Comments

It should be np.array(list(map(lambda idx: random(), range(10)))) or it doesn't work. And you did understand it correctly; I'll go and test it now to see if it's any faster.
It's actually more about the cleanliness of the code. I was hoping there would be some elegant numpy function to do this. It's actually fast enough as is, but I'm just trying to learn some numpy tricks, that benefit both code readability and performance. (and using the list(map()) is actually slower)
That last one is what I was trying with np.vectorize(). But it's extremely slow compared to the other ones. 1000000 array creations using [random() for _ in range(10)] is ~3.8 seconds, using list(map(lambda)) is ~4.6 seconds, and using np.vectorize() is ~58.5 seconds
numpy.vectorize is provided for convenience, not for speed; the returned function object executes far slower than code written in terms of "natively" vectorized NumPy operations.
It's going to be exceptionally slow the way you're using it, creating an entire dummy array (in a particularly inefficient manner, by implicit conversion from a range object) just to use it for ignored argument values, and introducing an extra layer of indirection just to ignore the dummy values. This is absolutely not a "Pythonic" way to write the code, and not a case where it makes sense to use numpy.vectorize.

Another option would be to make a ufunc from your generate function:

gen_array = np.frompyfunc(generate, 0, 1) # takes 0 args, returns 1
array = gen_array(np.empty(array_length))

This is a bit faster for me than the "need for speed" version from Sigve's answer.

1 Comment

Thanks, I like the solution, my generated values are sequences though, so it sadly doesn't work in my case.
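For what it's worth, when generate returns a fixed-length sequence, the preallocate-and-fill pattern still applies by assigning whole rows (a sketch; the generator below is made up):

```python
import random
import numpy as np

def generate():
    # hypothetical sequence-valued generator of fixed length 4
    return [random.random() for _ in range(4)]

n = 10
out = np.empty((n, 4))
for i in range(n):
    out[i] = generate()  # row assignment copies the sequence into place
```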
