I have a sorted 1-D array of numbers. Both the length of the array and the values in it are arbitrary. I want to split the array into k partitions according to the values. For example, say I want 4 partitions distributed as 30% / 30% / 20% / 20%, i.e. the top 30% of the values first, the next 30% afterwards, and so on. I get to choose k and the percentages of the distribution. In addition, if the same number appears more than once in the array, all of its occurrences must end up in the same partition; it must not be split across two partitions. This means that the percentages above are not strict, but rather "goals" or "starting points", if you wish.
For example, let's say my array is ar = [1, 5, 5, 6, 7, 8, 8, 8, 8, 8].
I choose k = 4 and the numbers should be distributed into partitions A, B, C and D with percentages pA = pB = pC = pD = 25%.
Given the constraints I gave above, the resulting partitions should be:
A = [1]
B = [5, 5]
C = [6, 7]
D = [8, 8, 8, 8, 8]
with resulting (achieved/corrected) percentages pcA = 10%, pcB = 20%, pcC = 20%, pcD = 50%.
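To make the requirement concrete, here is a minimal brute-force sketch in Python. It rests on assumptions that are not part of the problem statement: the input is already sorted, every partition must be non-empty, and "closest to the goals" is taken to mean the smallest sum of squared differences between achieved and target partition sizes. The name `partition_by_targets` is hypothetical. It tries every way of placing k - 1 cuts between runs of equal values, so it is only practical for small inputs.

```python
from itertools import combinations, groupby


def partition_by_targets(values, targets):
    """Split sorted `values` into len(targets) non-empty partitions.

    Equal values are never separated. Among all valid splits, the one whose
    partition sizes are closest to targets[i] * len(values) is returned
    (assumed criterion: least sum of squared size differences).
    """
    n = len(values)
    k = len(targets)

    # Collapse equal neighbours into runs; a cut may only fall between runs.
    run_lengths = [len(list(group)) for _, group in groupby(values)]
    if len(run_lengths) < k:
        raise ValueError("fewer distinct values than partitions")

    # Candidate cut positions: the index just past each run except the last.
    boundaries = []
    position = 0
    for length in run_lengths[:-1]:
        position += length
        boundaries.append(position)

    goal_sizes = [t * n for t in targets]

    def cost(cuts):
        edges = [0, *cuts, n]
        sizes = [b - a for a, b in zip(edges, edges[1:])]
        return sum((s - g) ** 2 for s, g in zip(sizes, goal_sizes))

    # Exhaustive search over every choice of k - 1 cut positions.
    best_cuts = min(combinations(boundaries, k - 1), key=cost)
    edges = [0, *best_cuts, n]
    return [values[a:b] for a, b in zip(edges, edges[1:])]


ar = [1, 5, 5, 6, 7, 8, 8, 8, 8, 8]
parts = partition_by_targets(ar, [0.25, 0.25, 0.25, 0.25])
achieved = [len(p) / len(ar) for p in parts]
print(parts)     # [[1], [5, 5], [6, 7], [8, 8, 8, 8, 8]]
print(achieved)  # [0.1, 0.2, 0.2, 0.5]
```

Under these assumptions the sketch reproduces the partitions A to D above. When several splits have the same cost, it simply returns the first one it encounters.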
It seems to me that I need a modified k-means algorithm, because the standard algorithm is not guaranteed to respect my percentages and/or the requirement that the same value cannot be in more than one cluster/partition.
So, is there an algorithm for this kind of clustering?
[1, 1, 1, 1, 1, 1, 1, 8]?
ar = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]?
A = [1, 2], B = [3, 4, 5], C = [6, 7], D = [8, 9, 10], or A = [1, 2, 3], B = [4, 5], C = [6, 7, 8], D = [9, 10], but both are acceptable. It depends on how you do your rounding on the division.