Consider the following IPython performance test, in which we create a pair of 10,000-element 32-bit vectors and add them, first using integer arithmetic and then using floating-point arithmetic:
```
from numpy.random import randint
from numpy import int32, float32
a, b = randint(255,size=10000).astype(int32), randint(255,size=10000).astype(int32)
%timeit a+b # int32 addition, gives 20.6µs per loop
a, b = randint(255,size=10000).astype(float32), randint(255,size=10000).astype(float32)
%timeit a+b # float32 addition, gives 3.91µs per loop
```
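`%timeit` is an IPython magic, so it will not run in a plain Python script. A minimal sketch of the same int32 vs float32 comparison using only the standard-library `timeit` module, assuming NumPy is installed; `time_add` is a hypothetical helper name, and the absolute numbers will depend on your machine and NumPy build.

```python
import timeit

import numpy as np

N = 10_000  # vector length used in the question


def time_add(dtype, n=N, number=2_000, repeat=5):
    """Return the best per-call time of a + b, in microseconds, for the given dtype."""
    # Same value range as randint(255, size=n) in the question: integers in [0, 255)
    a = np.random.randint(255, size=n).astype(dtype)
    b = np.random.randint(255, size=n).astype(dtype)
    timer = timeit.Timer(lambda: a + b)
    # The minimum over several repeats is the least noisy estimate
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


if __name__ == "__main__":
    for dt in (np.int32, np.float32):
        print(f"{np.dtype(dt).name}: {time_add(dt):.2f} µs per loop")
```

Taking the minimum rather than the mean follows the same reasoning as `timeit`'s own documentation: slower runs are usually caused by other processes, not by the code under test.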
Why is the floating-point version about 5x faster?
If you do the same test with float64, it takes twice as long as float32, which is what you'd expect if the hardware is being fully utilized: float64 elements are twice as wide, so each SSE register holds half as many of them. However, the timing for the integer case seems to be constant from int8 to int64. This, together with the 5x slowdown, makes me suspect that the integer code path is completely failing to use SSE.
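A sketch of the dtype sweep described above, again assuming plain `timeit` instead of `%timeit`; `sweep` is a hypothetical helper name. Besides the time per call, it reports an effective throughput, counting two input arrays read plus one output array written. Values are drawn from [0, 64) rather than [0, 255) so that int8 inputs and their sums stay in range. If the float timings roughly double from float32 to float64 while the integer timings stay flat from int8 to int64, that matches the pattern described in the question.

```python
import timeit

import numpy as np

DTYPES = (np.int8, np.int16, np.int32, np.int64, np.float32, np.float64)


def sweep(n=10_000, number=2_000, repeat=5):
    """Time a + b for each dtype; return {dtype name: (µs per call, GB/s)}."""
    results = {}
    for dt in DTYPES:
        # Small values so that int8 inputs and int8 sums do not overflow
        a = np.random.randint(64, size=n).astype(dt)
        b = np.random.randint(64, size=n).astype(dt)
        best = min(timeit.repeat(lambda: a + b, repeat=repeat, number=number))
        seconds = best / number
        # Two inputs read plus one output written, itemsize bytes per element each
        bytes_moved = 3 * n * np.dtype(dt).itemsize
        results[np.dtype(dt).name] = (seconds * 1e6, bytes_moved / seconds / 1e9)
    return results


if __name__ == "__main__":
    for name, (us, gbs) in sweep().items():
        print(f"{name:>8}: {us:7.2f} µs per loop, {gbs:6.2f} GB/s")
```

With n = 10,000 the arrays are small enough to sit in cache, so the GB/s figure is best read as a relative comparison between dtypes rather than as a memory-bandwidth measurement.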
For int32, I see similar timings of about 20 µs when `a+b` is replaced by `a & 0xff` or `a >> 2`, which suggests that the problem is not limited to addition.
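A sketch that times the three int32 operations mentioned above side by side, with float32 addition as a baseline, assuming the same array sizes and value range as the question; `time_ops` is a hypothetical helper name. If the three integer operations come out close to each other while the float32 baseline is several times faster, the slowdown is likely in the integer loops of that build in general rather than in addition specifically.

```python
import timeit

import numpy as np


def time_ops(n=10_000, number=2_000, repeat=5):
    """Return the best µs per call for several elementwise ops on n-element arrays."""
    # Same value range as the question: integers in [0, 255)
    a = np.random.randint(255, size=n).astype(np.int32)
    b = np.random.randint(255, size=n).astype(np.int32)
    af, bf = a.astype(np.float32), b.astype(np.float32)
    ops = {
        "int32 a + b": lambda: a + b,
        "int32 a & 0xff": lambda: a & 0xff,
        "int32 a >> 2": lambda: a >> 2,
        "float32 a + b": lambda: af + bf,  # floating-point baseline for comparison
    }
    return {
        name: min(timeit.repeat(fn, repeat=repeat, number=number)) / number * 1e6
        for name, fn in ops.items()
    }


if __name__ == "__main__":
    for name, us in time_ops().items():
        print(f"{name:>16}: {us:6.2f} µs per loop")
```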
I'm using numpy 1.9.1, though unfortunately I can't remember whether I compiled it locally or downloaded a binary. Either way, this performance observation was pretty shocking to me. How is it possible that the version I have is so hopeless at integer arithmetic?
Edit: I've also tested on a similar but separate PC running numpy 1.8, which I'm fairly sure came straight from a PythonXY binary. I got the same results.
Question: Do other people see similar results? If not, what can I do to get the performance they get?
Update: I have created a new issue on numpy's GitHub repo.
`np.show_config()` and `import numpy.distutils.system_info as sysinfo; sysinfo.show_all()`. See this SO question.

`sysinfo.show_all()`: see the output on pastebin. It's mostly a list of things marked "NOT AVAILABLE".

`numpy/numpy/core/src/umath/simd.inc.src`: I don't claim to have any idea of what it's doing, but I'm surprised that there aren't opportunities for the compiler to auto-vectorise loops once all the broadcasting/stepping logic has been dealt with by the programmer. Is there an existing issue/milestone that I can show interest in? If not, would you consider creating one? I can't see myself hacking at the core of numpy any time soon.