
I am new to programming and hoping somebody can help me with a specific problem I have.

I want to form clusters in a 100x100 binary numpy ndarray under two conditions:

  1. I want to specify how many pixels have value zero and how many have value one.
  2. I want to have an input variable that allows me to form larger or smaller clusters.

With the answers on this page I made an ndarray out of 300 zeros and 700 ones:

import numpy as np

N = 1000
K = 300

arr = [0] * K + [1] * (N - K)
np.random.shuffle(arr)
# note: np.resize repeats the 1000-element pattern to fill all 10,000 cells
arr1 = np.resize(arr, (100, 100))

I then would like to implement a clustering algorithm that allows me to specify some measure of cluster density or cluster size.

I looked into the scipy.ndimage package but can't seem to find anything useful.

EDIT: To make my question clearer: previously I was using the package nlmpy, which uses numpy to make arrays representing virtual landscapes.

It does this by generating a random array with continuous values in [0, 1] and applying '4-neighbourhood' clustering to a subset of the pixels. After the clustering, it uses an interpolation function to assign the remaining pixels to one of the clusters.
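As far as I can tell, the pipeline is roughly as follows; this is a minimal sketch I wrote with scipy, not nlmpy's actual code, and the nearest-neighbour fill in step 4 is my own reconstruction:

import numpy as np
from scipy import ndimage
from scipy.interpolate import griddata

nRow, nCol = 100, 100
rng = np.random.default_rng(0)

# 1. randomly keep ~60% of the pixels
keep = rng.random((nRow, nCol)) < 0.60

# 2. label connected regions of the kept pixels (default = 4-neighbourhood)
labels, n_clusters = ndimage.label(keep)

# 3. give every cluster its own random value in [0, 1]
values = rng.random(n_clusters + 1)
arr = values[labels]

# 4. assign each discarded pixel to the nearest cluster by interpolation
kept_coords = np.argwhere(keep)
all_coords = np.indices((nRow, nCol)).reshape(2, -1).T
arr = griddata(kept_coords, arr[keep], all_coords, method='nearest').reshape(nRow, nCol)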

For example, making clusters with 60% of the pixels:

import nlmpy

nRow = 100
nCol = 100
arr = nlmpy.randomClusterNN(nRow, nCol, 0.60, n='4-neighbourhood', mask=None)

This gives clusters with values in [0, 1]:

[image: the clustered array]

I then use a built-in nlmpy function to reclassify this output into a binary ndarray; for example, 50% of the pixels should have value '1' and 50% value '0'.

arrBinair = nlmpy.classifyArray(arr, [0.50, 0.50])

Output:

[image: the binary clustered array]

The problem here is that not exactly 50% of the values are '1' or '0'.

print((arrBinair == 1).sum())
output: 3023.0

This is because nlmpy.randomClusterNN first builds the clusters on continuous values; the binary reclassification is applied only afterwards, per cluster, so the class proportions come out approximate rather than exact.
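A workaround I can think of (just a sketch on my side, not an nlmpy feature) is to threshold the continuous clustered array at exactly its k-th smallest value, so the zero/one counts come out exact; ties on cluster boundaries then get split arbitrarily:

import numpy as np

# force exactly k zeros in the continuous clustered array `arr` from above
k = 5000                                  # 50% of the 100x100 = 10,000 pixels
flat = arr.ravel()
order = np.argsort(flat, kind='stable')   # stable sort: ties broken by position
arrBinair = np.ones(flat.shape)
arrBinair[order[:k]] = 0                  # the k smallest values become 0
arrBinair = arrBinair.reshape(arr.shape)
print((arrBinair == 0).sum())             # exactly 5000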

My question: can such a binary clustered landscape be generated in a faster way, without first clustering into continuous classes and without using the nlmpy package?

I hope this is enough information, or do I need to post the functions 'under the hood' of the nlmpy package? I hesitate because it is quite a lot of code.

Many thanks.

1 Answer

You can more-or-less get what you want using sklearn.cluster.DBSCAN:

from matplotlib import pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN

def randones(shape, n, dtype=None):
    # array of zeros with exactly n ones at random positions
    arr = np.zeros(shape, dtype=dtype)
    arr.flat[np.random.choice(arr.size, size=n, replace=False)] = 1
    return arr

def cluster(arr, *args, **kwargs):
    # cluster the coordinates of the nonzero pixels
    data = np.array(arr.nonzero()).T
    c = DBSCAN(*args, **kwargs)
    c.fit(data)
    return data, c

# generate random data
shape = (100, 100)
n = 300
arr = randones(shape, n)

# perform clustering
data, c = cluster(arr, eps=6, min_samples=4)

# plot the clusters in different colors (noise points in black)
colors = [('C%d' % (i % 10)) if i > -1 else 'k' for i in c.labels_]
fig = plt.figure(figsize=(8, 8))
ax = fig.gca()
ax.scatter(*data.T, c=colors)
plt.show()

Output:

[scatter plot of the identified clusters, each cluster in its own color, noise points in black]

The minimum number of points in a cluster is set by the min_samples parameter. You can adjust the density of the identified clusters by tuning the eps parameter, which sets the maximum distance between two points for them to be considered neighbors. For example, you can identify larger, less dense clusters by increasing eps:

# perform clustering
data, c = cluster(arr, eps=8, min_samples=4)

If we plot this less-dense clustering in the same way as before, it gives:

[scatter plot of the larger, less dense clusters found with eps=8]
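If you also want the sizes as numbers rather than a picture, the fitted estimator's labels_ attribute holds one label per point (-1 marks noise), so cluster sizes are a bincount away:

import numpy as np

# points per identified cluster, ignoring noise (label -1)
sizes = np.bincount(c.labels_[c.labels_ >= 0])
print(sizes)                    # size of each cluster
print((c.labels_ == -1).sum())  # number of unclustered (noise) points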
