Question: We want to do a cluster analysis of weather forecast data.
I am clear about the first part:
# features lists the columns on which we want to base the clusters.
features = ['air_pressure', 'air_temp', 'avg_wind_direction',
            'avg_wind_speed', 'max_wind_direction',
            'max_wind_speed', 'relative_humidity']
# select_df is the DataFrame containing the relevant data for the
# cluster analysis.
x = StandardScaler().fit_transform(select_df)
kmeans_obj = KMeans(n_clusters=12)
model = kmeans_obj.fit(x)
# Get the k-means cluster centers from the fitted model.
center_model = model.cluster_centers_
# pd is the pandas module (import pandas as pd); np, used below, is NumPy (import numpy as np).
# pd_centers builds a DataFrame of the cluster centers. To the existing
# feature columns it adds a column named 'prediction' that holds the
# cluster number.
def pd_centers(features, center_model):
    colNames = list(features)
    colNames.append('prediction')
A and index are not defined earlier in the code. Why are they used here? Can anyone explain?
    # Zip with a column called 'prediction' (index).
    Z = [np.append(A, index) for index, A in enumerate(center_model)]
I cannot understand the part below. Please help. I am new to Python (two weeks in).
    # Convert to pandas data frame for plotting
    p = pd.DataFrame(Z, columns=colNames)
    p['prediction'] = p['prediction'].astype(int)
    return p
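
To make the two puzzling steps concrete, here is a minimal sketch that runs the same logic on a small made-up array. The names centers and col_names are hypothetical stand-ins for model.cluster_centers_ and colNames, and only two features are assumed to keep it short. It first writes the list comprehension out as an ordinary loop, then does the DataFrame conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for model.cluster_centers_: 3 clusters, 2 features.
centers = np.array([[0.5, -1.0],
                    [1.5, 0.0],
                    [-2.0, 2.5]])
col_names = ['air_pressure', 'air_temp', 'prediction']

# enumerate() yields (position, row) pairs. The for clause itself creates
# the names index and A on every pass, so they need no earlier definition.
rows = []
for index, A in enumerate(centers):
    # np.append returns a new 1-D array: the row's values followed by index.
    rows.append(np.append(A, index))

# The one-line form used in pd_centers builds the same list.
Z = [np.append(A, index) for index, A in enumerate(centers)]

# Each array in Z becomes one DataFrame row, labelled by col_names.
p = pd.DataFrame(Z, columns=col_names)

# np.append stored the cluster number as a float, so cast it back to int.
p['prediction'] = p['prediction'].astype(int)

print(p)
```

Read this way, A is simply one row of cluster centers and index is that row's position (0, 1, 2, ...), which serves as the cluster number in the 'prediction' column.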