
I have coded a kriging algorithm, but I find it quite slow. In particular, do you have an idea of how I could vectorise the double loop in the cons function below?

import time
import numpy as np

B = np.zeros((200, 6))
P = np.zeros((len(B), len(B)))

def cons():
  # Fill the correlation matrix pairwise and return the elapsed time
  time1 = time.time()
  for i in range(len(B)):
    for j in range(len(B)):
      P[i, j] = corr(B[i], B[j])
  time2 = time.time()
  return time2 - time1

def corr(x, x_i):
  # Exponential correlation based on the 1-norm of the difference
  return np.exp(-np.sum(np.abs(np.array(x) - np.array(x_i))))

time_av = 0.
for i in range(100):
  time_av += cons()
print("Average=", time_av / 100.)

Edit: Bonus questions

  1. What happens to the broadcasting solution if I want corr(B[i], C[j]), with C an array of the same shape as B?
  2. What happens to the scipy solution if my p-norm orders are an array:

    p = np.array([1., 2., 1., 2., 1., 2.])
    def corr(x, x_i):
      return np.exp(-np.sum(np.abs(np.array(x) - np.array(x_i))**p))

    For 2., I tried P = np.exp(-cdist(B, C, 'minkowski', p)), but scipy expects a scalar p (a possible broadcasting workaround is sketched just below).
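
One way to handle both bonus cases at once with plain numpy broadcasting (a sketch, not scipy's API: scipy's minkowski metric only accepts a scalar p, and the random B and C below are placeholder data):

import numpy as np

B = np.random.rand(200, 6)
C = np.random.rand(200, 6)              # same shape as B
p = np.array([1., 2., 1., 2., 1., 2.])  # one exponent per column

# Bonus 1: cross-correlation between rows of B and rows of C.
# B[:, None, :] has shape (N, 1, M) and C broadcasts as (1, N, M),
# so the difference has shape (N, N, M).
P_cross = np.exp(-np.sum(np.abs(B[:, None, :] - C), axis=-1))

# Bonus 2: per-dimension exponents. p has shape (M,), so it broadcasts
# against the trailing axis of the (N, N, M) array of absolute differences.
P_p = np.exp(-np.sum(np.abs(B[:, None, :] - C)**p, axis=-1))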


1 Answer


Your problem seems very simple to vectorize. For each pair of rows of B you want to compute

P[i,j] = np.exp(-np.sum(np.abs(B[i,:] - B[j,:])))

You can make use of array broadcasting and introduce a third dimension, summing along the last one:

P2 = np.exp(-np.sum(np.abs(B[:,None,:] - B),axis=-1))

The idea is to reshape the first occurrence of B to shape (N,1,M) while the second B is left with shape (N,M). With array broadcasting, the latter is equivalent to (1,N,M), so

B[:,None,:] - B

is of shape (N,N,M). Summing along the last index will then result in the (N,N)-shape correlation array you're looking for.
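
As a quick sanity check (a sketch reusing B, P and cons() from the question), the broadcast result can be compared against the double loop:

N, M = B.shape                # here (200, 6)
diff = B[:, None, :] - B      # shape (N, N, M) after broadcasting
P2 = np.exp(-np.sum(np.abs(diff), axis=-1))
cons()                        # fills P via the original double loop
print(P2.shape == (N, N), np.allclose(P, P2))  # expected: True True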


Note that if you were using scipy, you could do this with scipy.spatial.distance.cdist (or, equivalently, a combination of scipy.spatial.distance.pdist and scipy.spatial.distance.squareform, which avoids unnecessarily computing the lower triangular half of this symmetric matrix). Following @Divakar's suggestion in the comments, the simplest solution this way is:

from scipy.spatial.distance import cdist
P3 = 1 / np.exp(cdist(B, B, 'minkowski', p=1))

cdist will compute the Minkowski distance in 1-norm, which is exactly the sum of the absolute values of coordinate differences.
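
For completeness, here is the pdist/squareform variant mentioned above, which only computes the N*(N-1)/2 unique pairwise distances before expanding them into the full matrix (a sketch using the same B):

from scipy.spatial.distance import pdist, squareform

# pdist returns a condensed vector of the upper-triangular distances;
# squareform expands it into the full symmetric (N, N) matrix with a
# zero diagonal, so np.exp(-...) gives ones on the diagonal as expected.
P4 = np.exp(-squareform(pdist(B, 'minkowski', p=1)))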


6 Comments

Thus, 1/np.exp(pdist(B, 'minkowski',1)) or 1/np.exp(cdist(B, B, 'minkowski',1)) could be added there I guess.
@Divakar yeah, I didn't want to be too explicit because OP doesn't seem to use scipy, so an additional import might be overkill. But I'll add it anyway:) Also, thanks for cdist, I wasn't aware of its existence.
Very complete answer, thanks @AndrasDeak. Indeed, I would like to avoid using scipy, but the cdist function looks very interesting. You point out that the lower triangular half is computed needlessly; in my code the second loop actually starts from i+1 (for j in range(i+1, len(B))). Any chance to achieve that with your first solution?
@Jean not with array broadcasting, I don't think so. If depending on scipy is not an issue, I suggest that you time both the redundant numpy version and the scipy one; you might find that even with the one-off import cost, scipy is the fastest.
So for the timing I get, with an average over 100 tries: my solution: 0.339s, array broadcasting: 0.00393s and scipy (without import time): 0.00333s.
