I am playing around with Numba to see how much faster I can make Python+NumPy code. My test function computes the pairwise Euclidean distances of n points in three-dimensional space. I am getting a speedup of 2 orders of magnitude with Numba. If I comment out the lines where I store the distances in an array (i.e. distance[i, j] = d and distance[j, i] = d), I get a speedup of 6 orders of magnitude. So basically, the computations are lightning fast, but accessing the array that holds the results is slow. Is there a way to speed up array access?

NumPy and Numba functions

import numpy as np
from numba import jit, float64, void

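def pairwise_distance_numpy(distance, point):
    # Fill `distance` (n x n) in place with the pairwise Euclidean
    # distances between the rows of `point` (n x 3).
    # Note: the diagonal is never written.
    numPoints = point.shape[0]
    for i in range(numPoints):
        for j in range(0, i):

            # Squared distance, accumulated over the three coordinates
            d = 0.0
            for k in range(3):
                tmp = point[i, k] - point[j, k]
                d += tmp*tmp
            d = d**0.5

            # Store symmetrically
            distance[i, j] = d
            distance[j, i] = d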

pairwise_distance_numba = jit(void(float64[:,:], float64[:,:]), nopython=True)(pairwise_distance_numpy)

Benchmark script

import numpy as np
from time import time
from pairwise_distance import pairwise_distance_numpy as pd_numpy
from pairwise_distance import pairwise_distance_numba as pd_numba

n = 1000
point = np.random.rand(n, 3)
distance = np.empty([n, n], dtype=np.float64)

pd_numpy(distance, point)
t = time()
pd_numpy(distance, point)
dt_numpy = time() - t
print('Numpy elapsed time: ', dt_numpy)

pd_numba(distance, point)
t = time()
pd_numba(distance, point)
dt_numba = time() - t
print('Numba elapsed time: ', dt_numba)

print('Numba speedup: ', dt_numpy/dt_numba)
  • Are you sure the calculations are actually being done? Since those values are never stored, Numba may simply have optimized them away! (You could try some other operation on those values that is cheaper than writing them to an array.) Commented Jul 9, 2015 at 10:13
  • It appears you are right. I timed the computation and the storage separately, and the computations take 5-7x more time than the storage. So indeed Numba appears to be quite smart :) Commented Jul 9, 2015 at 10:30
  • You are basically performing a pdist operation there: squareform(pdist(point[:,:3])). Have you tried that yet? Commented Jul 9, 2015 at 10:36
  • Yes I am aware. The specific function however is not important for me. I am just playing around with Numba to experience its performance. Commented Jul 9, 2015 at 10:39
  • @cfbaptista: I'm going to write that as an answer, in case other people find it useful! Commented Jul 11, 2015 at 0:09
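
For reference, the SciPy call mentioned in the comments can be checked against the loop from the question. This is only a minimal sketch, assuming SciPy is installed; `loop_distances` is a made-up name for a copy of the question's loop that starts from zeros, so the diagonal is defined and matches the zero self-distances that `squareform` produces.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def loop_distances(point):
    # Same triple loop as in the question, but starting from zeros so the
    # diagonal (self-distances) is defined. Hypothetical helper name.
    n = point.shape[0]
    distance = np.zeros((n, n), dtype=np.float64)
    for i in range(n):
        for j in range(i):
            d = 0.0
            for k in range(3):
                tmp = point[i, k] - point[j, k]
                d += tmp * tmp
            distance[i, j] = distance[j, i] = d ** 0.5
    return distance


point = np.random.rand(50, 3)

# pdist returns the condensed vector of pairwise distances;
# squareform expands it into the full symmetric n x n matrix.
dist_scipy = squareform(pdist(point))
dist_loop = loop_distances(point)

print(np.allclose(dist_scipy, dist_loop))
```

Since `point` already has exactly three columns here, the `[:, :3]` slice from the comment is not needed.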

1 Answer

It seems Numba simply optimized the calculations away: once the two assignments are commented out, the computed distances are never used, so the compiler is free to drop the work entirely (your own timing in the comments confirms this). Array access in NumPy should be pretty fast in most cases!
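
One way to check this, sketched on the assumption that returning a value derived from every distance is enough to keep the compiler from discarding the loop: accumulate the distances into a running total and return it. The name `pairwise_distance_sum` is made up for illustration, and the fallback decorator is only there so the snippet still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba installed.
    def njit(func):
        return func


@njit
def pairwise_distance_sum(point):
    # Hypothetical variant of the question's loop: nothing is stored in an
    # array, but every distance feeds the returned total, so the result of
    # the computation is actually used.
    num_points = point.shape[0]
    total = 0.0
    for i in range(num_points):
        for j in range(i):
            d = 0.0
            for k in range(3):
                tmp = point[i, k] - point[j, k]
                d += tmp * tmp
            total += d ** 0.5
    return total


point = np.random.rand(200, 3)
total = pairwise_distance_sum(point)
print(total)
```

Timing this version against the one that writes into `distance` should give a fairer picture of what the array stores cost than timing a loop whose results are thrown away.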
