I found a strange performance difference while evaluating an expression in Numpy.

I executed the following code:

import numpy as np
myarr = np.random.uniform(-1,1,[1100,1100])

and then

%timeit np.exp( - 0.5 * (myarr / 0.001)**2 )
>> 184 ms ± 301 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

and

%timeit np.exp( - 0.5 * (myarr / 0.1)**2 )
>> 12.3 ms ± 34.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

That's an almost 15x faster computation in the second case! Note that the only difference is the factor being 0.1 or 0.001.

What's the reason for this behaviour? Can I change something to make the first calculation as fast as the second?
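A quick diagnostic (my own sketch, not from the question; the seed is arbitrary): count how many results fall below the smallest normal double. With the 0.001 divisor a small fraction of the outputs are subnormal (denormal), and subnormal results are exactly what triggers slow microcode-assisted paths on many CPUs:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.uniform(-1, 1, (1100, 1100))

out = np.exp(-0.5 * (arr / 0.001) ** 2)
tiny = np.finfo(np.float64).tiny  # smallest normal double, ~2.2e-308

# Subnormal results: positive, but smaller than the smallest normal double.
n_subnormal = np.count_nonzero((out > 0) & (out < tiny))
print(n_subnormal)  # nonzero here; zero when dividing by 0.1 instead
```

With the 0.1 divisor the arguments to exp stay above about -50, so every result is a normal float and no slow path is taken.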

  • OK, on Windows, NumPy 1.14.3, Python 3.6.0, I see 97.7 ms vs 47.7 ms. Commented Nov 21, 2018 at 15:53
  • On my system, exp of large (negative) numbers is slower: exp(-1) is faster than exp(-1000). So it probably comes down to slower convergence of the exp algorithm for large arguments. Commented Nov 21, 2018 at 15:55
  • @MattMessersmith Reasonable explanation, but nope: exp(1) is still much faster than exp(1000). Commented Nov 21, 2018 at 16:23
  • My first guess (based on the title) was that there are some denormalized numbers involved; see stackoverflow.com/questions/36781881/… I didn't verify this in depth for this specific numpy/python setup, but denormals can be awfully slow... Commented Nov 21, 2018 at 19:53
  • @Marco13 Yes, in fact exp(-708) is a normal float and exp(-709) is denormal, and that's where I see (on Mac OS X) a big jump in execution time. Underflow to zero doesn't occur until about exp(-746). Commented Nov 21, 2018 at 20:48
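The boundaries claimed in the last comment are easy to verify directly (a small sanity check of my own, not part of the original thread):

```python
import numpy as np

tiny = np.finfo(np.float64).tiny    # smallest normal double, ~2.23e-308

print(np.exp(-708.0) > tiny)        # True: still a normal number
print(0.0 < np.exp(-709.0) < tiny)  # True: result is subnormal
print(np.exp(-746.0) == 0.0)        # True: full underflow to zero
```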

2 Answers


Use Intel SVML

I don't have a numexpr build with working Intel SVML at hand, but numexpr with working SVML should perform about as well as Numba. The Numba benchmarks below show much the same behaviour without SVML, but improve substantially with SVML.

Code

import numpy as np
import numba as nb

myarr = np.random.uniform(-1,1,[1100,1100])

@nb.njit(error_model="numpy", parallel=True)
def func(arr, div):
    # use the function argument, not the captured global array
    return np.exp(-0.5 * (arr / div)**2)

Timings

#Core i7 4771
#Windows 7 x64
#Anaconda Python 3.5.5
#Numba 0.41 (compilation overhead excluded)
func(myarr,0.1)                      -> 3.6ms
func(myarr,0.001)                    -> 3.8ms

#Numba (set NUMBA_DISABLE_INTEL_SVML=1), parallel=True
func(myarr,0.1)                      -> 5.19ms
func(myarr,0.001)                    -> 12.0ms

#Numba (set NUMBA_DISABLE_INTEL_SVML=1), parallel=False
func(myarr,0.1)                      -> 16.7ms
func(myarr,0.001)                    -> 63.2ms

#Numpy (1.13.3), set OMP_NUM_THREADS=4
np.exp( - 0.5 * (myarr / 0.001)**2 ) -> 70.82ms
np.exp( - 0.5 * (myarr / 0.1)**2 )   -> 12.58ms

#Numpy (1.13.3), set OMP_NUM_THREADS=1
np.exp( - 0.5 * (myarr / 0.001)**2 ) -> 189.4ms
np.exp( - 0.5 * (myarr / 0.1)**2 )   -> 17.4ms

#Numexpr (2.6.8), no SVML, parallel
ne.evaluate("exp( - 0.5 * (myarr / 0.001)**2 )") -> 17.2ms
ne.evaluate("exp( - 0.5 * (myarr / 0.1)**2 )")   -> 4.38ms

#Numexpr (2.6.8), no SVML, single threaded
ne.evaluate("exp( - 0.5 * (myarr / 0.001)**2 )") -> 50.85ms
ne.evaluate("exp( - 0.5 * (myarr / 0.1)**2 )")   -> 13.9ms


This may produce denormalized numbers, which slow down computations.

You may like to disable denormalized numbers using the daz library:

import daz
daz.set_daz()

More info: x87 and SSE Floating Point Assists in IA-32: Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ):

To avoid serialization and performance issues due to denormals and underflow numbers, use the SSE and SSE2 instructions to set Flush-to-Zero and Denormals-Are-Zero modes within the hardware to enable highest performance for floating-point applications.

Note that in 64-bit mode floating point computations use SSE instructions, not x87.
