I am trying to use numpy to dynamically create an array of zeros whose size depends on a separate numpy array.
This is a small portion of a much larger project; I have posted everything relevant in this question. I have a function `k_means` which takes in a dataset (sample posted below) and a k value (3 in this example).
I create a variable `centroids`, which should look something like:
[[4.9 3.1 1.5 0.1]
[7.2 3. 5.8 1.6]
[7.2 3.6 6.1 2.5]]
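The centroid line in the code further down picks k distinct rows of the dataset at random. A minimal sketch of that step, run on a 5-row stand-in (`sample`, copied from the Iris rows posted below); the fixed seed is an assumption added only so the example is repeatable:

```python
import numpy as np

# Five rows copied from the Iris sample posted in the question
sample = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
])
k = 3

np.random.seed(0)  # arbitrary seed, only for repeatability
# k distinct row indices, then those whole rows become the centroids
idx = np.random.choice(sample.shape[0], k, replace=False)
centroids = sample[idx, :]
print(centroids.shape)  # (3, 4)
```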
From there, I need to create a numpy array of "labels" filled with zeros, one entry for every row in the dataset, where each entry is as wide as a row of the centroids array. Meaning, for a dataset with 5 rows, it would look like:
[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
This is what I am trying to achieve, albeit dynamically (i.e. where the number of rows and columns in the dataset is not known in advance).
The following (hard coded, non-numpy) code satisfies that, assuming there are 150 lines in the dataset:
```python
def k_means(dataset, k):
    centroids = [[5, 3, 2, 4.5], [5, 3, 2, 5], [2, 2, 2, 2]]
    cluster_labels = []
    # Hard coded: assumes exactly 150 rows, each with 4 columns
    for i in range(0, 150):
        cluster_labels.append([0, 0, 0, 0])
    print(cluster_labels)
```
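The same list of lists can be built without numpy and without hard-coded sizes by reading the counts from the data. A sketch, assuming `dataset` and `centroids` are lists of equal-length rows; `make_zero_labels` is a hypothetical helper name and the two-row dataset is only illustrative:

```python
def make_zero_labels(dataset, centroids):
    """Hypothetical helper: one list of zeros per dataset row, each as long as a centroid."""
    n_cols = len(centroids[0])
    # A fresh inner list per row, so the rows do not share one list object
    return [[0] * n_cols for _ in range(len(dataset))]

dataset = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]
centroids = [[5, 3, 2, 4.5], [5, 3, 2, 5], [2, 2, 2, 2]]
print(make_zero_labels(dataset, centroids))  # [[0, 0, 0, 0], [0, 0, 0, 0]]
```

Writing `[[0] * n_cols] * len(dataset)` instead would repeat one inner list, so changing one row would change them all.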
I am trying to do this dynamically with the following:
```python
import numpy

def k_means(dataset, k):
    # Pick k distinct random rows of the dataset as the initial centroids
    centroids = dataset[numpy.random.choice(dataset.shape[0], k, replace=False), :]
    print(centroids)
    cluster_labels = []
    cluster_labels = numpy.asarray(cluster_labels)
    for index in range(len(dataset)):
        # temp_array = numpy.zeros_like(centroids)
        # print(temp_array)
        cluster_labels = cluster_labels.append(cluster_labels, numpy.zeros_like(centroids))
```
The current result is: AttributeError: 'numpy.ndarray' object has no attribute 'append'
Or, if I comment out the `cluster_labels` line and uncomment the two `temp_array` lines, I get:
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
With the full dataset I ultimately get 150 of these blocks, one per row.
Sample of the Iris dataset:
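Stacking one `zeros_like(centroids)` block per dataset row, as the loop attempts, yields a 3-D array rather than the 2-D one in the five-row example; this is the point raised in the comment at the end. A sketch of the resulting shape, assuming a 5-row stand-in dataset (the real one has 150 rows):

```python
import numpy as np

dataset = np.ones((5, 4))   # stand-in: 5 rows, 4 features
centroids = dataset[:3]     # stand-in: k = 3 centroids

# One (3, 4) zero block per row, gathered in a list and converted once
blocks = [np.zeros_like(centroids) for _ in range(len(dataset))]
from_loop = np.array(blocks)

# Same array without a loop: prepend the row count to the centroid shape
direct = np.zeros((len(dataset),) + centroids.shape)
print(from_loop.shape, direct.shape)  # (5, 3, 4) (5, 3, 4)
```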
5.1 3.5 1.4 0.2
4.9 3 1.4 0.2
4.7 3.2 1.3 0.2
4.6 3.1 1.5 0.2
5 3.6 1.4 0.2
5.4 3.9 1.7 0.4
4.6 3.4 1.4 0.3
5 3.4 1.5 0.2
4.4 2.9 1.4 0.2
4.9 3.1 1.5 0.1
5.4 3.7 1.5 0.2
4.8 3.4 1.6 0.2
4.8 3 1.4 0.1
4.3 3 1.1 0.1
5.8 4 1.2 0.2
5.7 4.4 1.5 0.4
5.4 3.9 1.3 0.4
5.1 3.5 1.4 0.3
5.7 3.8 1.7 0.3
5.1 3.8 1.5 0.3
5.4 3.4 1.7 0.2
5.1 3.7 1.5 0.4
4.6 3.6 1 0.2
5.1 3.3 1.7 0.5
4.8 3.4 1.9 0.2
5 3 1.6 0.2
5 3.4 1.6 0.4
5.2 3.5 1.5 0.2
5.2 3.4 1.4 0.2
4.7 3.2 1.6 0.2
4.8 3.1 1.6 0.2
5.4 3.4 1.5 0.4
5.2 4.1 1.5 0.1
5.5 4.2 1.4 0.2
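These rows are whitespace-separated numbers, which `numpy.loadtxt` reads into a 2-D float array (its default delimiter is any whitespace). A sketch using the first three rows through `io.StringIO`; with the real file you would pass its path instead, which the question does not give:

```python
import io
import numpy as np

# First three rows of the sample above, kept as text
text = """5.1 3.5 1.4 0.2
4.9 3 1.4 0.2
4.7 3.2 1.3 0.2
"""

dataset = np.loadtxt(io.StringIO(text))
print(dataset.shape)  # (3, 4)
print(dataset.dtype)  # float64
```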
Can anybody help me dynamically use numpy to achieve what I am aiming for?
Thanks.
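For the 2-D shape in the five-row example (one zero row per dataset row, as many columns as a centroid), the sizes can be read from the arrays' `shape` attributes instead of being hard coded. A minimal sketch, assuming `dataset` and `centroids` are 2-D numpy arrays; the stand-in values are arbitrary:

```python
import numpy as np

# Stand-ins for the real arrays (assumption): 5 data rows, 4 features, k = 3
dataset = np.arange(20, dtype=float).reshape(5, 4)
centroids = dataset[:3]

n_rows = dataset.shape[0]    # one label row per data row
n_cols = centroids.shape[1]  # as many columns as a centroid has
cluster_labels = np.zeros((n_rows, n_cols))
print(cluster_labels.shape)  # (5, 4)
```

Because a centroid has as many columns as the dataset, `np.zeros_like(dataset)` gives the same shape here; it also copies the dataset's dtype.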
`append` method, a numpy array (`ndarray`) does not. It's best to collect values in a list (the original `cluster_labels`) with list `append`, and create an array from that afterwards (outside the loop).

[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]. Shouldn't it be (with `arr = np.zeros_like(centroids)`) `[arr, arr, arr, arr, arr]`, a 3D array?
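A short sketch of the `append` point: arrays have no `append` method, while `numpy.append` is a module-level function that returns a new array and leaves its input unchanged. It ends with the list-then-convert pattern the comment recommends; the loop count of 5 is an arbitrary stand-in for `len(dataset)`:

```python
import numpy as np

arr = np.zeros((0, 4))          # empty array with 4 columns
print(hasattr(arr, "append"))   # False: ndarray has no append method

# numpy.append is a function that returns a NEW array
row = np.zeros((1, 4))
grown = np.append(arr, row, axis=0)
print(arr.shape, grown.shape)   # (0, 4) (1, 4)

# Collect rows in a plain list, then convert once after the loop
rows = []
for _ in range(5):              # 5 stands in for len(dataset)
    rows.append([0, 0, 0, 0])
labels = np.array(rows)
print(labels.shape)             # (5, 4)
```

Since `numpy.append` copies the whole array on every call, growing an array with it inside a loop gets slower as the array grows, which is why the list approach is preferred.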