Numpy vectorization instead of for loop

Question

I wrote a function that is too slow when used in for loops. In each iteration it appends a numpy vector of shape (10,) as a row. How can I use a vectorized numpy solution in place of the iterations to speed this up?

Any hint why the vstack-array solution below is even slower than the append-list solution?

TIA

import numpy as np
import time


n_iterations = 1000
n_cols = 10

def sample_func():
    # Note: the random function is not important; it is only an example. The real function is more complex, and its for loops need to be replaced by a faster numpy solution.
    row = np.random.rand(n_cols)  # one row of shape (n_cols,); rand(0, n_cols) would return an empty (0, n_cols) array
    return row



#list solution: too slow

start_time_1 = time.time()

result_list = []  
for i in range(n_iterations):
    result_row = sample_func()
    result_list.append(np.sort(result_row))

print("Run time = {}".format(time.time() - start_time_1))
    

    
#array solution: too slow

start_time_2 = time.time()

result_array = np.empty([0,n_cols]) 

for i in range(n_iterations):
    result_row = sample_func()
    result_array = np.vstack([result_array, np.sort(result_row)])

print("Run time = {}".format(time.time() - start_time_2))


Avoid append in general. np.sort does allow sorting along an axis.
– Quang Hoang, Commented Oct 30, 2020 at 13:23
list append adds a reference/pointer to an existing list. vstack makes a new array with a full copy. Use it just once to join a whole list of arrays, not incrementally.
– hpaulj, Commented Oct 30, 2020 at 15:01
As long as your function is written in Python and only takes one row at a time, there isn't much you can do. Especially if it is complex, the time spent evaluating that function many times will dominate. The iteration mechanism, such as list append, will be a relatively minor time consumer.
– hpaulj, Commented Oct 30, 2020 at 15:33
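
To illustrate the point about joining a whole list of arrays once, here is a minimal sketch. It assumes that sample_func returns one row of shape (n_cols,); the seeded generator rng is an addition for reproducibility and is not part of the question.

```python
import numpy as np

n_iterations = 1000
n_cols = 10
rng = np.random.default_rng(0)  # seeded generator: an assumption, not in the question


def sample_func():
    # Stand-in for the real per-row function: returns one row of shape (n_cols,)
    return rng.random(n_cols)


# Collect the rows in a plain list; each append only stores a reference.
rows = [np.sort(sample_func()) for _ in range(n_iterations)]

# Join everything at the end, so the data is copied only once.
result_array = np.vstack(rows)
print(result_array.shape)
```

Calling np.vstack inside the loop instead copies the whole growing array on every iteration, which is why the array version in the question is slower than the list version.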

Daniel F · Accepted Answer · 2020-10-30 14:27:38Z

1

In general you don't want to append to numpy arrays. Re-allocating space for them is too time consuming. If you know n_iterations, you can allocate up-front like this:

result_array = np.empty([n_iterations, n_cols]) 

for i in range(n_iterations):
    result_array[i] = sample_func()

But you'll do much better by "vectorizing" whatever is in sample_func so that it accepts n-d input. for loops in Python are slow. NumPy gives you a lot of tricks to push your for loops into compiled C code (called 'vectorizing'), but without knowing what's going on in the function we can't help you vectorize it.
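
As a rough sketch of what vectorizing a per-row function can look like: the real function is unknown, so row_func and batch_func below are hypothetical examples (scale each row to unit sum, then sort it) and only show the general pattern of replacing a per-row call with axis arguments.

```python
import numpy as np


def row_func(row):
    # Hypothetical per-row computation: scale one row to unit sum, then sort it.
    return np.sort(row / row.sum())


def batch_func(rows):
    # The same computation applied to a 2-D array of shape (n_rows, n_cols).
    # keepdims=True keeps the row sums as a column so they broadcast per row.
    scaled = rows / rows.sum(axis=1, keepdims=True)
    return np.sort(scaled, axis=1)


rng = np.random.default_rng(0)
data = rng.random((1000, 10))

looped = np.array([row_func(r) for r in data])  # Python-level loop
vectorized = batch_func(data)                   # loop runs in compiled code

print(np.allclose(looped, vectorized))
```

Whether the real function can be rewritten this way depends on whether every step has an array-wide equivalent.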

edited Oct 30, 2020 at 14:27

answered Oct 30, 2020 at 14:16


mathfux · Accepted Answer · 2020-10-30 13:20:00Z

0

np.random.rand() can return output of any requested shape, like many other numpy functions, so you don't need to concatenate rows of the same length. Try:

r_numbers = np.random.rand(n_iterations, n_cols)

And to sort the values within each row (along axis 1):

np.sort(r_numbers, axis=1)
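
For reference, a small sketch of what the axis argument does. The array a is made up for illustration: axis=1 sorts the values within each row, while axis=0 sorts the values within each column.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [0, 7, 1]])

by_row = np.sort(a, axis=1)     # each row sorted independently
by_column = np.sort(a, axis=0)  # each column sorted independently

print(by_row)     # [[1 2 3]
                  #  [0 1 7]]
print(by_column)  # [[0 1 1]
                  #  [3 7 2]]
```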

answered Oct 30, 2020 at 13:20


2 Comments

greg2021 Over a year ago

Thanks a lot. I only tried to come up with the simplest possible function sample_func to formulate the question. My question is how to speed up the above solution considering it is for another more complex function which does not relate to random values. How can I replace for loops by a pure numpy solution?

mathfux Over a year ago

@greg2021 numpy is not good at that. Unlike Python lists (or other structures), arrays cannot change length in place, so try to find a way to vectorise it (I mean, find a general rule that calculates all the rows at once). If you need to update a list constantly, I don't know of a best way, because all of them are doomed to be slow. But you can try other things such as np.concatenate, list comprehensions, append, and the approaches you proposed. append seems promising in most cases.
