0

for my class I need to write more optimized math function using NumPy. Problem is, when using NumPy my solutions are slower when native Python.

  1. function which cubes all the elements of an array and sum them

Python:

def cube(x):
    result = 0
    for i in range(len(x)):
        result += x[i] ** 3
    return result

My, using NumPy (15-30% slower):

def cube(x):
    it = numpy.nditer([x, None])
    for a, b in it:
        b[...] = a*a*a
    return numpy.sum(it.operands[1])
  1. Some random calculation function

Python:

def calc(x):
    m = sum(x) / len(x)
    result = 0

    for i in range(len(x)):
        result += (x[i] - m)**4

    return result / len(x)

NumPy (>10x slower):

def calc(x):
    m = numpy.mean(x)
    result = 0
    for i in range(len(x)):
        result += numpy.power((x[i] - m), 4)
    return result / len(x)

I don't know how to approatch this, so far I have tried random functions from NumPy

2
  • 2
    I don't think you are grasping how to use numpy. For example, your first numpy solution should be (x**3).sum() Commented Feb 3, 2021 at 22:30
  • 1
    numpy is fast, because it can make use of vectorized function calls; you are creating massive overhead with what you are doing and I'm surprised it's actually still only 30% slower. On my machine, numpy is 10x faster even with small arrays with size ~10000, when you are using it correctly, as @user3483203 described Commented Feb 3, 2021 at 22:53

2 Answers 2

4

To elaborate on what has been said in comments:

Numpy's power comes from being able to do all the looping in fast c/fortran rather than slow Python looping. For example, if you have an array x and you want to calculate the square of every value in that array, you could do

y = []
for value in x:
    y.append(value**2)

or even (with a list comprehension)

y = [value**2 for value in x]

but it will be much faster if you can do all the looping inside numpy with

y = x**2

(assuming x is already a numpy array).

So for your examples, the proper way to do it in numpy would be

1.

def sum_of_cubes(x):
    result = 0
    for i in range(len(x)):
        result += x[i] ** 3
    return result

def sum_of_cubes_numpy(x):
    return (x**3).sum()
def calc(x):
    m = sum(x) / len(x)
    result = 0

    for i in range(len(x)):
        result += (x[i] - m)**4

    return result / len(x)

def calc_numpy(x):
    m = numpy.mean(x)  # or just x.mean()
    return numpy.sum((x - m)**4) / len(x)

Note that I've assumed that the input x is already a numpy array, not a regular Python list: if you have a list lst, you can create an array from it with arr = numpy.array(lst).

Sign up to request clarification or add additional context in comments.

Comments

0
In [337]: def cube(x):
     ...:     result = 0
     ...:     for i in range(len(x)):
     ...:         result += x[i] ** 3
     ...:     return result
     ...: 

nditer is not a good numpy iterator, at least not when used in Python level code. It's really just a stepping stone toward writing compiled code. It's docs need a better disclaimer.

In [338]: def cube1(x):
     ...:     it = numpy.nditer([x, None])
     ...:     for a, b in it:
     ...:         b[...] = a*a*a
     ...:     return numpy.sum(it.operands[1])
     ...: 
In [339]: cube(list(range(10)))
Out[339]: 2025
In [340]: cube1(list(range(10)))
Out[340]: 2025
In [341]: cube1(np.arange(10))
Out[341]: 2025

A more direct numpy iteration:

In [342]: def cube2(x):
     ...:     it = [a*a*a for a in x]
     ...:     return numpy.sum(it)
     ...: 

The better whole-array code. Just as sum can work with the whole array, the power also applies the whole.

In [343]: def cube3(x):
     ...:     return numpy.sum(x**3)
     ...: 
In [344]: cube2(np.arange(10))
Out[344]: 2025
In [345]: cube3(np.arange(10))
Out[345]: 2025

Doing some timings:

The list reference:

In [346]: timeit cube(list(range(1000)))
438 µs ± 9.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The slow nditer:

In [348]: timeit cube1(np.arange(1000))
2.8 ms ± 5.65 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The partial numpy:

In [349]: timeit cube2(np.arange(1000))
520 µs ± 20 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

I can improve its time by passing a list instead of an array. Iteration on lists is faster.

In [352]: timeit cube2(list(range(1000)))
229 µs ± 9.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

But the time for a 'pure' numpy version blows all of those out of the water:

In [350]: timeit cube3(np.arange(1000))
23.6 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The general rule is that numpy methods applied to a numpy array are fastest. But if you must loop, it's usually better to use lists.

Sometimes the pure numpy approach creates very large temporary array. Then memory management complexities can reduce performance. In such cases a modest of number of iterations on a complex task may be best.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.