9

Consider the following image, stored as a numpy array:

a = [[0,0,0,0,0,1,1,0,0,0],
     [0,0,0,0,1,1,1,1,0,0],
     [0,0,0,0,0,1,1,0,0,0],
     [0,0,0,0,0,0,0,0,0,0],
     [0,0,0,0,0,2,0,0,0,0],
     [0,0,0,0,0,2,2,0,0,0],
     [0,0,0,0,0,2,0,0,0,0],
     [0,0,0,0,3,3,3,0,0,0],
     [4,0,0,0,0,0,0,0,0,0],
     [4,4,0,0,0,0,0,0,0,0],
     [4,4,4,0,0,0,0,0,0,0]]

a = np.array(a)

Zeros represent background pixels, 1,2,3 and 4 represent pixels that belong to objects. You can see that objects always form contiguous islands or regions in the image. I would like to know the distance between every pair of objects. As distance measure I'd like to have the shortest, staightline distance, between those pixels of the object, that are closest to each other. Example: Distance(2,3) = 1, because they are touching. Distance(1,2) = 2, because there is exactly one background pixel separating the two regions, or in other words, the closest pixels of the objects are two pixels apart.

Can anybody tell me how one would approach this problem in Python? Or link me to some resources?

6
  • Does this answer your question? Fastest pairwise distance metric in python Commented Jul 9, 2020 at 8:46
  • 1
    No unfortunately not. The link you posted treats the 1D case, and a slightly different problem altogether. Commented Jul 9, 2020 at 8:49
  • I think the solution to your problem is fundamentally the same. I'd start trying adapting what's in there and see if you get stuck somewhere. Commented Jul 9, 2020 at 8:49
  • 1
    I think if the islands are already "labeled" the bellow appraoch should work just fine. The connectec components are more if they werent necessarily labeled Commented Jul 9, 2020 at 9:05
  • 1
    @r0f1 What is the straight-line between island 3 and 4? I am not sure if the distance definition is clear for the cases like that. Please elaborate on it. Thank you. Commented Jul 9, 2020 at 9:06

2 Answers 2

8

This is what you would need:

from scipy.spatial.distance import cdist
def Distance(a, m, n):
  return cdist(np.argwhere(a==m),np.argwhere(a==n),'minkowski',p=1.).min()

or similarly per @MaxPowers comment (claim: cityblock is faster):

  return cdist(np.argwhere(a==m),np.argwhere(a==n),'cityblock').min()

Find the locations of islands and calculate pairwise distance of locations and get the minimum. I am not 100% sure of your desired distance, but I think you are looking for l1 norm. If not, you can change the cdist measure to your desired metric.

output:

Distance(a,2,3)
1.0
Distance(a,2,1)
2.0
Distance(a,3,1)
5.0
Distance(a,4,3)
5.0
Sign up to request clarification or add additional context in comments.

6 Comments

This answers the question. You could directly use metric='cityblock', which is way faster and does not need the p parameter.
@MaxPowers Thank you. Didn't know it has its own metric. Could you please elaborate on why that one is faster in performance? Shouldn't they be almost similar?
The L1 norm is calculated as the sum of the differences of the vector elements. Minkowski is calculated as the generalized Lp norm using np.norm. Hence, the ladder takes the powers of each vector element and computes the root of their total difference.
@MaxPowers Interesting. Thank you. I was expecting it to be smarter for the case of p=1
FYI, @Ehsan: as per pull request 12375 minkowski will be smarter for the cases p=1 and p=2. This will be availabel in release 1.16.0.
|
6

For many blobs or bigger blobs or if performance/memory efficiency is a criteria, you might want to work with contours of those islands. With that in mind, we will use OpenCV's findContours to get the contours, then perform pairwise distance computation and get the min one as the final output. The implementation would look something like this that gets all possible pairiwise distances -

from scipy.spatial.distance import cdist
import cv2

ids = np.arange(1, a.max()+1) #np.unique(a)[1:] if not in ranged sequence

idxs = []
for id_ in ids:
    im = (a == id_).astype(np.uint8)
    contours,_ = cv2.findContours(im, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
    idx = contours[0][:, 0]
    idxs.append(idx)

# Get pairwise indices and then distances
r,c = np.triu_indices(len(ids), 1)
pdists = {(ids[i],ids[j]):cdist(idxs[i], idxs[j]).min() for (i, j) in zip(r, c)}

Output dict for given sample -

In [225]: pdists
Out[225]: 
{(1, 2): 2.0,
 (1, 3): 5.0,
 (1, 4): 7.810249675906654,
 (2, 3): 1.0,
 (2, 4): 5.0,
 (3, 4): 3.605551275463989}

By default,cdist uses euclidean distance as the metric. Depending on your definition of straight line between islands, you might want to try out other metrics, namely 'minkowski' and 'cityblock' for Minkowski and Manhattan distances respectively.

So, cdist(idxs[i], idxs[j]) would change to cdist(idxs[i], idxs[j], metric=...).

1 Comment

note: this assumes that each island (connected component) has a unique value. If multiple connected components have the same value, idx = [c[0] for contour in contours for c in contour] would be more accurate.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.