In [337]: def cube(x):
...: result = 0
...: for i in range(len(x)):
...: result += x[i] ** 3
...: return result
...:
nditer is not a good numpy iterator, at least not when used in Python level code. It's really just a stepping stone toward writing compiled code. It's docs need a better disclaimer.
In [338]: def cube1(x):
...: it = numpy.nditer([x, None])
...: for a, b in it:
...: b[...] = a*a*a
...: return numpy.sum(it.operands[1])
...:
In [339]: cube(list(range(10)))
Out[339]: 2025
In [340]: cube1(list(range(10)))
Out[340]: 2025
In [341]: cube1(np.arange(10))
Out[341]: 2025
A more direct numpy iteration:
In [342]: def cube2(x):
...: it = [a*a*a for a in x]
...: return numpy.sum(it)
...:
The better whole-array code. Just as sum can work with the whole array, the power also applies the whole.
In [343]: def cube3(x):
...: return numpy.sum(x**3)
...:
In [344]: cube2(np.arange(10))
Out[344]: 2025
In [345]: cube3(np.arange(10))
Out[345]: 2025
Doing some timings:
The list reference:
In [346]: timeit cube(list(range(1000)))
438 µs ± 9.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The slow nditer:
In [348]: timeit cube1(np.arange(1000))
2.8 ms ± 5.65 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The partial numpy:
In [349]: timeit cube2(np.arange(1000))
520 µs ± 20 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
I can improve its time by passing a list instead of an array. Iteration on lists is faster.
In [352]: timeit cube2(list(range(1000)))
229 µs ± 9.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
But the time for a 'pure' numpy version blows all of those out of the water:
In [350]: timeit cube3(np.arange(1000))
23.6 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The general rule is that numpy methods applied to a numpy array are fastest. But if you must loop, it's usually better to use lists.
Sometimes the pure numpy approach creates very large temporary array. Then memory management complexities can reduce performance. In such cases a modest of number of iterations on a complex task may be best.
numpy. For example, your firstnumpysolution should be(x**3).sum()numpyis fast, because it can make use of vectorized function calls; you are creating massive overhead with what you are doing and I'm surprised it's actually still only 30% slower. On my machine,numpyis 10x faster even with small arrays with size ~10000, when you are using it correctly, as @user3483203 described