I'm working with a 2D array and just trying to do an element-wise addition of a constant value. I need to speed the code up, so I tried using a numpy array instead of a list of lists, but I'm finding numpy to be slower. Any idea what I'm doing wrong? Thanks.

For example:

import time
import numpy as np

my_array_list = [[1,2,3],[4,5,6],[7,8,9]]
my_array_np = np.array(my_array_list)

n = 100000

s_np = time.time()
for a in range(n):
    for i in range(3):
        for j in range(3):
            my_array_np[i,j] = my_array_np[i,j] + 5
end_np = time.time() - s_np  

s_list = time.time()
for a in range(n):
    for i in range(3):
        for j in range(3):
            my_array_list[i][j] = my_array_list[i][j] + 5
end_list = time.time() - s_list 

print('my_array_np:', '\n', my_array_np, '\n')
print('my_array_list:', '\n',my_array_list, '\n')

print('time to complete with numpy:', end_np)
print('time to complete with list:', end_list)

Output:

my_array_np: 
 [[500001 500002 500003]
 [500004 500005 500006]
 [500007 500008 500009]] 

my_array_list: 
 [[500001, 500002, 500003], [500004, 500005, 500006], [500007, 500008, 500009]] 

time to complete with numpy: 0.7831366062164307
time to complete with list: 0.45527076721191406

In this test the list version completes significantly faster, i.e. 0.45 vs 0.78 seconds. Shouldn't numpy be significantly faster here?

  • You can try replacing the first set of for loops (over i and j) with my_array_np += 5 and recompute the benchmark. Commented Mar 24, 2020 at 3:03
  • You're not supposed to loop over an array manually. Looping over an array manually is like dragging your car behind you by hand - it's expected to be slow, because you're using it wrong. Commented Mar 24, 2020 at 3:04
  • Thanks @hilberts_drinking_problem. For the test case as presented here, yes, that would help. However, I actually need to visit each element, check it (probably with an if statement), then add some constant value based on the result of that check. Commented Mar 24, 2020 at 3:19
  • Yes, iterating directly over a numpy.ndarray object and accessing its items at the Python interpreter level will always be slower than doing the same with a list, because numpy.ndarray objects get their speed from built-in, vectorized operations that push computations down into the C layer. To work with the items in the Python interpreter layer, they additionally have to be "boxed", because the array usually holds primitive types underneath. That makes everything even slower. Commented Mar 24, 2020 at 5:51
  • There is a third-party library, numba, which JIT-compiles this sort of code if it involves numpy.ndarray objects, and it is quite good. But with plain numpy, this sort of approach is to be avoided. Learn the numpy way of doing things, or just use a list. Commented Mar 24, 2020 at 5:53

2 Answers

Let's say you want to add something to all elements that are multiples of 3. Instead of iterating over all elements of the array, we would normally use a boolean mask:

In [355]: x = np.arange(12).reshape(3,4)                                                       
In [356]: mask = (x%3)==0                                                                      
In [357]: mask                                                                                 
Out[357]: 
array([[ True, False, False,  True],
       [False, False,  True, False],
       [False,  True, False, False]])
In [358]: x[mask] += 100                                                                       
In [359]: x                                                                                    
Out[359]: 
array([[100,   1,   2, 103],
       [  4,   5, 106,   7],
       [  8, 109,  10,  11]])

Many operations are ufuncs, which accept a where parameter:

In [360]: x = np.arange(12).reshape(3,4)                                                       
In [361]: np.add(x,100, where=mask, out=x)                                                     
Out[361]: 
array([[100,   1,   2, 103],
       [  4,   5, 106,   7],
       [  8, 109,  10,  11]])
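
For the "check each element, then add a constant that depends on the check" case from the comments, np.where can pick between two whole-array results in one step. This is only a minimal sketch: the rules used here (add 100 to multiples of 3, add 10 to other even numbers, and so on) are made up for illustration and are not from the question.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# Hypothetical two-branch rule: +100 for multiples of 3, +10 for everything else.
two_way = np.where(x % 3 == 0, x + 100, x + 10)

# Hypothetical three-branch rule, written as nested np.where calls:
# +100 for multiples of 3, else +10 for even numbers, else unchanged.
three_way = np.where(x % 3 == 0, x + 100, np.where(x % 2 == 0, x + 10, x))

print(two_way)
print(three_way)
```

Note that np.where evaluates both candidate arrays in full before selecting, so it trades some extra arithmetic for staying out of the Python-level loop.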

Fast numpy requires that we think in terms of whole arrays. The fast compiled code operates on arrays, or blocks of arrays. Python-level iteration on arrays is slow, and, as you found out, slower than iteration on lists, because accessing individual values of an array is more expensive than indexing a list.

For this small example, these whole-array methods are faster than the array iteration, though they are still slower than the list iteration. But the array methods scale much better.
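
To see the scaling for yourself, something like the following rough timing sketch could be used. The 300x300 size, the repeat count and the helper names add_to_list and add_to_array are my own choices for illustration; the actual numbers will depend on your machine.

```python
import timeit

import numpy as np


def add_to_list(rows, value):
    # Pure-Python element-wise addition on a list of lists.
    return [[item + value for item in row] for row in rows]


def add_to_array(arr, value):
    # Whole-array addition; the loop runs in compiled code.
    return arr + value


size = 300  # assumed size, much larger than the 3x3 example
arr = np.arange(size * size).reshape(size, size)
rows = arr.tolist()

list_time = timeit.timeit(lambda: add_to_list(rows, 5), number=20)
array_time = timeit.timeit(lambda: add_to_array(arr, 5), number=20)

print('list of lists:', list_time)
print('numpy array:  ', array_time)
```

Changing size back to 3 should show the gap shrinking or reversing, which matches what the question observed.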


It seems that the list comprehension is faster in this case, but numpy becomes faster once I add numba.

import time
import numpy as np
from numba import jit


my_array_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
my_array_np = np.array(my_array_list)

n = 1000000


# @jit
def fun1(my_array_np):
    # += modifies the array in place
    for a in range(n):
        my_array_np += 5


s_np = time.time()
fun1(my_array_np)
end_np = time.time() - s_np


def fuc2(my_array_list):
    for a in range(n):
        my_array_list = [[i + 5 for i in j] for j in my_array_list]
    return my_array_list


s_list = time.time()
my_array_list = fuc2(my_array_list)
end_list = time.time() - s_list

print('my_array_np:', '\n', my_array_np, '\n')
print('my_array_list:', '\n', my_array_list, '\n')

print('time to complete with numpy:', end_np)
print('time to complete with list:', end_list)

my_array_np: 
 [[500001 500002 500003]
 [500004 500005 500006]
 [500007 500008 500009]] 

my_array_list: 
 [[500001, 500002, 500003], [500004, 500005, 500006], [500007, 500008, 500009]] 


# with numba
time to complete with numpy: 0.27802205085754395
time to complete with list: 1.9161949157714844

# without numba
time to complete with numpy: 3.4962515830993652
time to complete with list: 1.9761543273925781
[Finished in 3.4s]

1 Comment

By the way, if you uncomment the @jit, the current benchmark will pick up the compilation time. You can force the compilation by calling fun1(my_array_np.copy()) just above the s_np = time.time() line.
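
A minimal sketch of that warm-up idea, assuming numba is installed. The function name add_repeatedly, the explicit n and value arguments and the ImportError fallback are my own additions so the snippet also runs (without any speed-up) when numba is missing.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback (assumption): without numba the function runs as plain Python.
    def njit(func):
        return func


@njit
def add_repeatedly(arr, n, value):
    # Adds `value` to every element of `arr`, in place, n times.
    for _ in range(n):
        arr += value


data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Warm-up call on a copy: triggers compilation without modifying `data`.
add_repeatedly(data.copy(), 1, 5)

start = time.perf_counter()
add_repeatedly(data, 100000, 5)
elapsed = time.perf_counter() - start

print(data)
print('time after warm-up:', elapsed)
```

The warm-up uses the same argument types as the timed call, so the timed call should reuse the already compiled version instead of compiling again.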
