I am new to programming and hoping somebody can help me with a specific problem I have.
I want to form clusters in a 100x100 binary numpy ndarray under two conditions:
- I want to specify the exact number of pixels that have value zero and the exact number that have value one.
- I want to have an input variable that allows me to form larger or smaller clusters.
With the answers on this page I made a 100x100 ndarray with 3000 zeros and 7000 ones. (Note that np.resize would tile a shorter list to fill the array, so I generate all 10,000 values first and reshape.)

import numpy as np
N = 100 * 100        # total number of pixels
K = 3000             # number of zeros
arr = [0] * K + [1] * (N - K)
np.random.shuffle(arr)
arr1 = np.reshape(arr, (100, 100))
I then would like to implement a clustering algorithm that allows me to specify some measure of cluster density or cluster size.
I looked into the scipy.ndimage package but can't seem to find anything useful.
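For what it's worth, ndimage.label does let me measure the clusters in a binary array, even though it does not form them with a controllable size. A minimal sketch (the random array and its 70% fill are only stand-ins for the shuffled array above):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
# A random binary array as a stand-in for the shuffled landscape above.
arr1 = (rng.random((100, 100)) < 0.7).astype(int)

# Label connected components of the '1' pixels (4-connectivity by default).
labels, n_clusters = ndimage.label(arr1)

# Pixel count of each cluster; label 0 is the background of zeros.
sizes = ndimage.sum(arr1, labels, index=range(1, n_clusters + 1))
print(n_clusters, int(sizes.max()))
```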
EDIT: To make my question clearer: previously I used the package nlmpy, which uses numpy to make arrays representing virtual landscapes.
It does this by generating a random array with continuous values in [0, 1] and applying '4-neighbourhood' clustering to a subset of the pixels. After clustering that subset, it uses an interpolation function to assign the remaining pixels to one of the clusters.
For example, making clusters with 60% of the pixels:
import nlmpy
nRow=100
nCol=100
arr=nlmpy.randomClusterNN(nRow, nCol, 0.60, n='4-neighbourhood', mask=None)
This gives clusters with continuous values in [0, 1]:
I then use a built-in function of nlmpy to reclassify this output into a binary ndarray, for example so that 50% of the pixels have value '1' and 50% value '0'.
arrBinair= nlmpy.classifyArray(arr, [0.50, 0.50])
Output:
The problem here is that not exactly 50% of the values are '1' or '0' .
print((arrBinair == 1).sum())
output: 3023.0
This is because the nlmpy.randomClusterNN function first forms the clusters, and only afterwards is a binary reclassification of the clusters done.
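If an exact pixel count is required, one workaround I can think of (my own assumption, not an nlmpy feature) is to reclassify by rank instead of by proportion: take exactly the K highest-valued pixels as '1'. A sketch, with a plain random array standing in for the randomClusterNN output:

```python
import numpy as np

rng = np.random.default_rng(2)
# A continuous [0, 1] landscape, standing in for the randomClusterNN output.
arr = rng.random((100, 100))

# Rank-based reclassification: exactly half the pixels become '1',
# regardless of how the continuous values are distributed.
n_ones = arr.size // 2
order = np.argsort(arr, axis=None)   # flat indices, ascending by value
arrBinair = np.zeros(arr.shape)
arrBinair.flat[order[-n_ones:]] = 1

print(int((arrBinair == 1).sum()))  # exactly 5000
```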
My question is whether a binary clustered landscape can be generated in a faster way, without first clustering in continuous classes and without using the nlmpy package?
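For reference, here is a rough sketch of the same pipeline without nlmpy, assuming scipy is available: select a proportion of pixels and cluster them by 4-connectivity, give each cluster one random value, fill the remaining pixels by nearest-neighbour interpolation, then threshold by rank so the 0/1 split is exact. This is a simplification of what I understand nlmpy to do, not a reimplementation:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
nRow, nCol = 100, 100
p = 0.60          # proportion of pixels used to seed the clusters
n_ones = 5000     # exact number of '1' pixels wanted in the end

# Step 1: randomly select a proportion p of the pixels and cluster them
# by 4-connectivity; each cluster gets one random value in [0, 1].
selected = rng.random((nRow, nCol)) < p
labels, n_clusters = ndimage.label(selected)
cluster_values = rng.random(n_clusters + 1)
landscape = cluster_values[labels]

# Step 2: nearest-neighbour interpolation fills the unselected pixels
# with the value of the closest selected (cluster) pixel.
dist, idx = ndimage.distance_transform_edt(~selected, return_indices=True)
landscape = landscape[idx[0], idx[1]]

# Step 3: rank-based reclassification gives an exact binary split.
order = np.argsort(landscape, axis=None)
arrBinair = np.zeros((nRow, nCol))
arrBinair.flat[order[-n_ones:]] = 1
print(int((arrBinair == 1).sum()))  # exactly 5000
```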
I hope this is enough information, or do I need to post the functions 'under the hood' of the nlmpy package? I hesitate as it is quite a lot of code.
Many thanks.



