4

edit: this question is not specifically about calculating distances, rather the most efficient way to loop through a numpy array, specifying that for index i all comparisons should be made with the rest of the array, as long as the second index is not i.

I have a numpy array with columns (X, Y, ID) and want to compare each element to each other element, but not itself. So, for each X, Y coordinate, I want to calculate the distance to each other X, Y coordinate, but not itself (where distance = 0).

Here is what I have - there must be a more "numpy" way to write this.

import math, arcpy
# Point feature class
fc = "MY_FEATURE_CLASS"
# Load points to numpy array: (X, Y, ID)
npArray = arcpy.da.FeatureClassToNumPyArray(fc,["SHAPE@X","SHAPE@Y","OID@"])
for row in npArray:
    for row2 in npArray:
        if row[2] != row2[2]:
            # Pythagoras's theorem
            distance = math.sqrt(math.pow((row[0]-row2[0]),2)+math.pow((row[1]-row2[1]),2))

Obviously, I'm a numpy newbie. I will not be surprised to find this a duplicate, but I don't have the numpy vocabulary to search out the answer. Any help appreciated!

7
  • 5
    You may want to look at scipy.spatial.distance.pdist. Commented Mar 18, 2015 at 20:57
  • 2
    possible duplicate of Computing Euclidean distance for numpy in python Commented Mar 18, 2015 at 21:39
  • Note that in many geometries d(x, y) = 0 iff x = y, so you may want to skip that check, and deal with the zeros later on. Disregard this comment if you're dealing computing distance under a non-injective mapping - I just noticed you've discussing "features" in your code, which suggests this may be the case. Commented Mar 18, 2015 at 22:19
  • 1
    Well, if you are willing to use scipy, then the comment by @user2357112 should do the trick. If not, you could use a list comprehension, say distances = [dist(x, y) for x in npArray for y in npArray if x != y], where dist is some metric. Commented Mar 18, 2015 at 22:36
  • 1
    @user2312518 I've been using cKDTree that comes in SciPy to do this kind of task very efficiently, it seems much faster than calculating all the distances Commented Mar 19, 2015 at 0:35

3 Answers 3

0

Using SciPy's pdist, you could write something like

from scipy.spatial.distance import pdist, squareform

distances = squareform(pdist(npArray, lambda a,b: np.sqrt((a[0]-b[0])**2 + (a[1]-b[1])**2)))

pdist will compute the pair-wise distances using the custom metric that ignores the 3rd coordinate (which is your ID in this case). squareform turns this into a more readable matrix such that distances[0,1] gives the distance between the 0th and 1st rows.

Sign up to request clarification or add additional context in comments.

Comments

0

Each row of X is a 3 dimensional data instance or point. The output pairwisedist[i, j] is distance of X[i, :] and X[j, :]

X = np.array([[6,1,7],[10,9,4],[13,9,3],[10,8,15],[14,4,1]])
a = np.sum(X*X,1)
b = np.repeat( a[:,np.newaxis],5,axis=1)
pairwisedist = b + b.T -2* X.dot(X.T)

Comments

0

I wanted to point out that custom written sqrt of sum of squares are prone to overflow and underflow. Bultin math.hypot, np.hypot are way safer for no compromise on performance

from scipy.spatial.distance import pdist, squareform

distances = squareform(pdist(npArray, lambda a,b: math.hypot(*(a-b))

Refer

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.