I am playing around with Numba to see how much faster I can make some Python + NumPy code. My test function computes the pairwise Euclidean distances between n points in three-dimensional space. With Numba I get a speedup of 2 orders of magnitude. If I comment out the lines that store the distances in the array (i.e. distance[i, j] = d and distance[j, i] = d), the speedup grows to 6 orders of magnitude. So the computation itself appears to be lightning fast, while writing to the array that holds the results is slow. Is there a way to speed up array access?
NumPy and Numba functions
import numpy as np
from numba import jit, float64, void
def pairwise_distance_numpy(distance, point):
    numPoints = point.shape[0]
    for i in range(numPoints):
        for j in range(0, i):
            d = 0.0
            for k in range(3):
                tmp = point[i, k] - point[j, k]
                d += tmp*tmp
            d = d**0.5
            distance[i, j] = d
            distance[j, i] = d
pairwise_distance_numba = jit(void(float64[:,:], float64[:,:]), nopython=True)(pairwise_distance_numpy)
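To check that both versions fill the array correctly, a broadcasting-based reference can be handy. This is only a minimal sketch and not part of the code being benchmarked: the name pairwise_distance_broadcast is hypothetical, and it allocates an (n, n, 3) temporary, so it is meant for moderate n only.

```python
import numpy as np

def pairwise_distance_broadcast(point):
    """Hypothetical reference helper: all pairwise Euclidean distances."""
    # diff[i, j, k] = point[i, k] - point[j, k], shape (n, n, 3)
    diff = point[:, np.newaxis, :] - point[np.newaxis, :, :]
    # Sum the squared differences over the coordinate axis, then take the root.
    return np.sqrt((diff * diff).sum(axis=-1))

rng = np.random.default_rng(0)
point = rng.random((5, 3))
reference = pairwise_distance_broadcast(point)
```

Note that the loop version never writes the diagonal, so when the output comes from np.empty the diagonal entries are uninitialized. Compare only the off-diagonal entries against the reference, or allocate the output with np.zeros.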
Benchmark script
import numpy as np
from time import time
from pairwise_distance import pairwise_distance_numpy as pd_numpy
from pairwise_distance import pairwise_distance_numba as pd_numba
n = 1000
point = np.random.rand(n, 3)
distance = np.empty([n, n], dtype=np.float64)
pd_numpy(distance, point)
t = time()
pd_numpy(distance, point)
dt_numpy = time() - t
print('Numpy elapsed time: ', dt_numpy)
pd_numba(distance, point)
t = time()
pd_numba(distance, point)
dt_numba = time() - t
print('Numba elapsed time: ', dt_numba)
print('Numba speedup: ', dt_numpy/dt_numba)
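Since the Numba call may finish in a very short time, a single time() measurement can be noisy. A possible alternative, sketched here under the assumption that a best-of-several measurement is acceptable, uses time.perf_counter; the helper name best_time is hypothetical.

```python
from time import perf_counter

def best_time(func, *args, repeats=5):
    """Return the smallest wall-clock time (seconds) over several calls of func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = perf_counter()
        func(*args)
        best = min(best, perf_counter() - start)
    return best

# Stand-in workload; in the benchmark above this would be
# best_time(pd_numba, distance, point).
elapsed = best_time(sum, range(100_000))
print("best of 5:", elapsed)
```

The warm-up call in the script is still needed for a lazily compiled function; here the explicit signature passed to jit means compilation already happens at import time.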
SciPy has a pdist operation for that: squareform(pdist(point[:,:3])). Have you tried that yet?
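For reference, the suggestion in the comment could look roughly like this. It is a sketch that assumes SciPy is installed and uses random points as in the benchmark script; pdist and squareform live in scipy.spatial.distance.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
point = rng.random((1000, 3))

# pdist returns the condensed form: one entry per unordered pair,
# so its length is n*(n-1)/2. The default metric is Euclidean.
condensed = pdist(point)
# squareform expands it into the full symmetric (n, n) matrix with a zero diagonal.
distance = squareform(condensed)
```

If only the unique pairs are needed, keeping the condensed array avoids storing each distance twice.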