I have a numpy array which contains data similar to the following:

01110000000000000000000000
00111110000222222220000000
01110000000222222200000000
00000000000222000000000000
00000000000000000000000000
00000000000000000000000000
00003333300000000000000000
00003333322222000000000000
00000000022222000000000000
00000000000000222000000000
00000000000000222000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000

So there are clusters/areas of non-zero values. I need to build a list of these clusters (list of lists where each item is a tuple of array coordinates). One cluster/area consists of the same digits (e.g. "2"s only or "3"s only).

For example, the representation of the bottom cluster/area, as (column, row) tuples, would be:

[(14, 9), (15, 9), (16, 9), (14, 10), (15, 10), (16, 10)]

I wrote a recursive method for this, but it is slow and it runs into stack overflow errors.

Is there an easy non-error-prone way to implement it more efficiently in Python? Ideally with some library/matrix operations.

(Actually, the array is an image and clusters/areas are masks for a computer vision task.)

1 Answer

You can use skimage.measure, which has functions to label these "clusters" (they can be described as connected components), and then obtain each region's coordinates from the coords attribute of the objects returned by skimage.measure.regionprops:

from skimage.measure import label, regionprops

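# a is the array built under "Numpy array construction" below;
# connectivity=1 joins only orthogonal neighbours that share a value
labels = label(a, connectivity=1)
# one (N, 2) array of (row, col) coordinates per labelled region
clusters = [region.coords for region in regionprops(labels)]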

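Cells with different values always end up in different regions, and zeros are treated as background. By default, however, label also counts diagonal neighbours as connected in a 2D array, so we must set connectivity=1 to restrict neighbours to the four orthogonal positions. Otherwise the last two clusters would be merged into one, since both consist of 2s and touch diagonally.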
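To make the "same value, orthogonal neighbours only" rule concrete, here is a minimal pure-Python sketch of an iterative flood fill that applies the same criteria. It is not how skimage implements label, and the name find_clusters is hypothetical. It uses an explicit queue instead of recursion, so it cannot hit the recursion limit, but it will be much slower than skimage on real images.

```python
from collections import deque


def find_clusters(grid):
    """Return clusters of equal non-zero values as lists of (row, col) tuples.

    Two cells belong to the same cluster when they hold the same value and
    are linked through orthogonal (up/down/left/right) neighbours.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 or seen[r][c]:
                continue  # background, or already assigned to a cluster
            value = grid[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            cluster = []
            while queue:  # breadth-first search, no recursion
                cr, cc = queue.popleft()
                cluster.append((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and grid[nr][nc] == value):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            clusters.append(cluster)
    return clusters


# Small illustrative grid (not the array from the question): the two 2s at
# (1, 2) and (2, 3) touch only diagonally, so they stay separate clusters.
grid = [
    [1, 1, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 2],
    [3, 2, 0, 0],
]
clusters = find_clusters(grid)
print(len(clusters))
```

The function only indexes grid[r][c] and calls len, so it should also accept a 2D NumPy array, although that is an assumption about your input rather than something tested here.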

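For instance, the last component, which you also showed in the question, would be the following. Note that coords are in (row, column) order, the reverse of the tuples in the question: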

clusters[-1]
array([[ 9, 14],
       [ 9, 15],
       [ 9, 16],
       [10, 14],
       [10, 15],
       [10, 16]], dtype=int64)
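If you need the exact format from the question, a list of (x, y) tuples, each (row, col) pair can be swapped and converted to plain ints. A small sketch follows; the helper name to_xy_tuples is hypothetical, and the nested list is a stand-in for clusters[-1], on the assumption that an (N, 2) coords array iterates pair by pair in the same way.

```python
def to_xy_tuples(coords):
    """Convert (row, col) pairs into (x, y) = (col, row) tuples of plain ints."""
    return [(int(col), int(row)) for row, col in coords]


# Stand-in for clusters[-1] from the output above.
last = [[9, 14], [9, 15], [9, 16], [10, 14], [10, 15], [10, 16]]
print(to_xy_tuples(last))
# [(14, 9), (15, 9), (16, 9), (14, 10), (15, 10), (16, 10)]
```

Applied to every region, this would be [to_xy_tuples(c) for c in clusters].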

Numpy array construction:

from io import StringIO
import numpy as np

s = StringIO("""
01110000000000000000000000
00111110000222222220000000
01110000000222222200000000
00000000000222000000000000
00000000000000000000000000
00000000000000000000000000
00003333300000000000000000
00003333322222000000000000
00000000022222000000000000
00000000000000222000000000
00000000000000222000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000""")
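# delimiter=1 reads fixed-width fields of one character each;
# dtype=int keeps the values as integer labels instead of floats
a = np.genfromtxt(s, delimiter=1, dtype=int)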