I would like to implement a function which does the following:
```python
from typing import List

import numpy as np


def get_clusters(X: np.ndarray, y: np.ndarray) -> List[np.ndarray]:
    """Receives a labeled dataset and splits the datapoints according to label.

    Args:
        X (np.ndarray): The dataset
        y (np.ndarray): The label for each point in the dataset

    Returns:
        List[np.ndarray]: A list of arrays where the elements of each array
            are datapoints belonging to the label at that index.

    Example:
        >>> get_clusters(
        ...     np.array([[0.8, 0.7], [0, 0.4], [0.3, 0.1]]),
        ...     np.array([0, 1, 0])
        ... )
        [array([[0.8, 0.7],
               [0.3, 0.1]]), array([[0. , 0.4]])]
    """
```
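One minimal way to satisfy this specification, assuming the labels are the integers 0..k-1 so that "the label at that index" means position `i` holds label `i`, is a boolean mask per label. `np.unique` returns the distinct labels in sorted order, which gives exactly that ordering:

```python
import numpy as np
from typing import List


def get_clusters(X: np.ndarray, y: np.ndarray) -> List[np.ndarray]:
    """Group the rows of X by their label in y using one boolean mask per label."""
    # np.unique returns the distinct labels sorted ascending, so for labels
    # 0..k-1 the position in the returned list equals the label itself.
    return [X[y == label] for label in np.unique(y)]


X = np.array([[0.8, 0.7], [0, 0.4], [0.3, 0.1]])
y = np.array([0, 1, 0])
print(get_clusters(X, y))
```

Each mask scans the whole of `y`, so this costs O(n·k) for n points and k labels, which is usually fine unless there are very many distinct labels.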
I'm currently a bit lost: I can't find a way to write into a specific index of a NumPy array, so I can only append to the whole array instead of appending to the sub-array at index 0, which holds the datapoints with label 0.
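The behavior described here can be reproduced in a few lines. `np.append` without an `axis` argument flattens its inputs, so there is no "sub-array at index 0" to append to. A plain Python list of lists, by contrast, can be indexed and grown in place. This small demo assumes exactly two labels, 0 and 1:

```python
import numpy as np

# np.append without axis= flattens its inputs, so appending points one by
# one produces a single 1-D array with no per-label structure.
flat = np.array([])
for point in (np.array([0.8, 0.7]), np.array([0.3, 0.1])):
    flat = np.append(flat, point)
# flat now holds [0.8, 0.7, 0.3, 0.1] with shape (4,)

# A Python list of lists supports "append into index i".
# Assumption for this demo: the only labels are 0 and 1.
groups = [[] for _ in range(2)]
groups[0].append(np.array([0.8, 0.7]))
groups[0].append(np.array([0.3, 0.1]))
groups[1].append(np.array([0.0, 0.4]))

# Convert each group into a 2-D NumPy array once, at the end.
clusters = [np.array(g) for g in groups]
```

The list is built with a comprehension rather than `[[]] * 2`, because the latter would make both entries refer to the same inner list.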
Here is my current code:
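```python
import numpy as np


def get_clusters(X, y):
    i = 0
    labels = {}
    clusters = np.array([])
    for a in y:
        if a in labels:
            il = labels[a]  # computed but never used to target a cluster
            clusters = np.append(clusters, X[i])
        else:
            labels[a] = i
            clusters = np.append(clusters, X[i])
        i += 1
    # np.append without axis= flattens, so this returns one 1-D array of
    # all coordinates rather than one array per label.
    return clusters
```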
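A minimal sketch of how the attempt above could be repaired while keeping its dictionary idea: the dict maps each label to a position in a Python list of clusters, and each point is appended to the list at that position. Clusters are ordered by the first appearance of each label, which matches the example (label 0 appears first), but would not put label `i` at index `i` if, say, label 1 appeared first:

```python
import numpy as np
from typing import List


def get_clusters(X: np.ndarray, y: np.ndarray) -> List[np.ndarray]:
    labels = {}      # maps each label to its position in `clusters`
    clusters = []    # one Python list of points per label
    for point, a in zip(X, y):
        if a not in labels:
            # First time this label is seen: open a new cluster for it.
            labels[a] = len(clusters)
            clusters.append([])
        clusters[labels[a]].append(point)
    # Stack each cluster's points into a 2-D array.
    return [np.array(c) for c in clusters]
```

`zip(X, y)` replaces the manual counter `i`, and the points are only converted to NumPy arrays once per cluster, which avoids the repeated copying that `np.append` does on every call.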
Can anybody help me with implementing the function? Thank you!
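For larger datasets, a fully vectorized variant is to sort the points by label once and cut the sorted array wherever the label changes. This is a sketch of that sort-and-split technique, not the only approach; it orders clusters by ascending label value and skips labels that never occur, so it matches the specification only when every label 0..k-1 is present:

```python
import numpy as np
from typing import List


def get_clusters(X: np.ndarray, y: np.ndarray) -> List[np.ndarray]:
    # A stable sort keeps the original order of points within each label.
    order = np.argsort(y, kind="stable")
    sorted_y = y[order]
    # Positions where the sorted label changes mark the cluster boundaries.
    boundaries = np.flatnonzero(np.diff(sorted_y)) + 1
    # np.split returns a Python list of sub-arrays, one per label.
    return np.split(X[order], boundaries)
```

For the example input, the stable sort gives the row order 0, 2, 1 and a single boundary at position 2, which yields the two expected groups.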