K Means Clustering Algorithm Python Explanation needed

Question

We want to do a cluster analysis of weather forecast data.

I understand the first part:

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# features contains the features on the basis of which we want to form the clusters.
features = ['air_pressure', 'air_temp', 'avg_wind_direction',
            'avg_wind_speed', 'max_wind_direction',
            'max_wind_speed', 'relative_humidity']

# select_df is the DataFrame containing the relevant data for the cluster
# analysis to be carried out.
x = StandardScaler().fit_transform(select_df)
kmeans_obj = KMeans(n_clusters=12)
model = kmeans_obj.fit(x)

# Get the k-means cluster centers from the fitted model.
center_model = model.cluster_centers_

# pd is the pandas module.
# pd_centers collects the cluster centers (centroids) into a table. To the
# existing feature columns we add an extra column named prediction, which
# will contain the cluster number.
def pd_centers(features, center_model):
    colNames = list(features)
    colNames.append('prediction')

A and index are not defined earlier in the code. Why are they used here? Can anyone explain?

    # Zip with a column called 'prediction' (index). 
    Z = [np.append(A, index) for index, A in enumerate(center_model)]

I cannot understand the part below. Please help; I am new to Python (two weeks).

    # Convert to pandas data frame for plotting
    p = pd.DataFrame(Z, columns=colNames)
    pd.DataFrame(columns=colNames)
    p['prediction'] = p['prediction'].astype(int)
    return p
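To see what center_model (and therefore each A in the comprehension) actually holds, here is a minimal sketch of the first part run on made-up data. The random select_df is a hypothetical stand-in for the real weather data, and n_init=10 and random_state=0 are added only so the run is repeatable and quiet across scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = ['air_pressure', 'air_temp', 'avg_wind_direction',
            'avg_wind_speed', 'max_wind_direction',
            'max_wind_speed', 'relative_humidity']

# Hypothetical stand-in for select_df: 500 rows of random numbers, one column per feature
rng = np.random.default_rng(0)
select_df = pd.DataFrame(rng.normal(size=(500, len(features))), columns=features)

# Rescale every column to mean 0 and standard deviation 1 (returns a NumPy array)
x = StandardScaler().fit_transform(select_df)

# fit() returns the fitted estimator itself
model = KMeans(n_clusters=12, n_init=10, random_state=0).fit(x)
center_model = model.cluster_centers_

# One row per cluster, one column per feature
print(center_model.shape)  # (12, 7)
```

So center_model is a 12 x 7 array: iterating over it yields one cluster center (a row of 7 numbers) at a time.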

What exactly do you not understand? There are still a bunch of things happening here. – 9769953, Jul 28, 2019 at 17:02
Also, if you have just two weeks of experience with Python, you should read and practice more to see how everything works. Sometimes it is fine to ignore things you don't understand until later (provided they don't crash or otherwise mess things up). – 9769953, Jul 28, 2019 at 17:03

score 1 · Accepted Answer · 2019-07-29 02:38:44Z


In this code you are iterating over center_model with enumerate, which gives you each item together with its index as you go through center_model.

# Zip with a column called 'prediction' (index). 
Z = [np.append(A, index) for index, A in enumerate(center_model)]

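index and A do not need to be defined earlier: the for-clause of the list comprehension creates them. For each item, enumerate(center_model) yields an (index, value) pair, which is unpacked into the temporary variables index and A so that you can use them in np.append(A, index). Each A is one row of center_model (the coordinates of one cluster center), and np.append(A, index) returns a copy of that row with the cluster number added at the end.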
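A minimal sketch of what the comprehension does, using a small hypothetical center_model with 3 clusters and 2 features in place of the real 12 x 7 array. The explicit loop is written only to show that it builds the same list as the comprehension.

```python
import numpy as np

# Hypothetical center_model: 3 cluster centers, 2 features each
center_model = np.array([[0.5, -1.2],
                         [1.0, 0.3],
                         [-0.7, 2.1]])

# enumerate yields (index, row) pairs; "for index, A in ..." unpacks each pair
for index, A in enumerate(center_model):
    print(index, A)

# The comprehension from the question: append the cluster number to each row
Z = [np.append(A, index) for index, A in enumerate(center_model)]

# The same thing written as an ordinary loop
Z_loop = []
for index, A in enumerate(center_model):
    Z_loop.append(np.append(A, index))

# The last element of each row is now its cluster number, stored as a float
print(Z[2])
```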

The last part of the code stores the data you just collected in a pandas DataFrame. I added comments below (updated based on the comments from 0 0):

# Convert to pandas data frame for plotting
p = pd.DataFrame(Z, columns=colNames)          # put the data from Z into a pandas DataFrame
pd.DataFrame(columns=colNames)                 # creates a new, empty DataFrame with those columns, but it is never used
p['prediction'] = p['prediction'].astype(int)  # set the data type of the 'prediction' field to int
return p
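Putting the pieces together, a minimal sketch of pd_centers with the unused pd.DataFrame(columns=colNames) line dropped; it returns the same table as the original. The two-feature centers array at the end is hypothetical test data.

```python
import numpy as np
import pandas as pd


def pd_centers(features, center_model):
    """Return the cluster centers as a DataFrame with one row per cluster.

    The columns are the feature names plus a final 'prediction' column
    holding each row's cluster number.
    """
    col_names = list(features) + ['prediction']
    # Append each center's position in center_model (its cluster number) to the row
    Z = [np.append(A, index) for index, A in enumerate(center_model)]
    p = pd.DataFrame(Z, columns=col_names)
    # np.append stored the cluster numbers as floats; cast them back to int
    p['prediction'] = p['prediction'].astype(int)
    return p


# Hypothetical example: 3 cluster centers over 2 features
centers = np.array([[0.5, -1.2], [1.0, 0.3], [-0.7, 2.1]])
print(pd_centers(['air_temp', 'relative_humidity'], centers))
```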

edited Jul 29, 2019 at 2:38

answered Jul 28, 2019 at 16:57

user11563547


2 Comments

9769953 Over a year ago

pd.DataFrame(columns=colNames): but the column names are already set in the line above. I don't think that line does anything.

9769953 Over a year ago

I don't think it sets the column names; not to an existing frame at least. For that, one needs to set the .columns attribute or use .rename(). It just creates a new, empty DataFrame with those columns, but it's never used.
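A small sketch of the point above, on a throwaway two-column frame: calling the constructor on its own leaves the existing frame alone, while assigning .columns or calling .rename() does change names.

```python
import pandas as pd

p = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=['a', 'b'])

# This builds a separate, empty DataFrame. The result is not assigned,
# so it is thrown away and p is left untouched.
pd.DataFrame(columns=['x', 'y'])
print(list(p.columns))  # ['a', 'b']

# Two ways that do change the column names:
p.columns = ['x', 'y']                   # overwrite all names of p in place
q = p.rename(columns={'x': 'temp'})      # return a copy with 'x' renamed
print(list(p.columns), list(q.columns))  # ['x', 'y'] ['temp', 'y']
```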

AndrejH · Accepted Answer · 2019-07-28 16:57:04Z


It creates a pandas DataFrame, a tabular data structure commonly used to hold the dataset you are working with.

The part you don't understand creates a DataFrame from the data in Z and names its columns using colNames (take a look at the pandas DataFrame reference to understand what this means). The second-to-last line converts the data type of the prediction column to int.
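Why the int conversion is there at all: np.append puts the integer index into the same float array as the center coordinates, so the prediction column comes out as floats. A minimal sketch with a hypothetical 2 x 2 center_model and placeholder column names f1 and f2:

```python
import numpy as np
import pandas as pd

# Hypothetical center_model: 2 clusters x 2 features
center_model = np.array([[0.5, -1.2], [1.0, 0.3]])
Z = [np.append(A, index) for index, A in enumerate(center_model)]

p = pd.DataFrame(Z, columns=['f1', 'f2', 'prediction'])
print(p['prediction'].dtype)  # float64, because np.append upcast the index

p['prediction'] = p['prediction'].astype(int)
print(p['prediction'].dtype)  # an integer dtype (int64 on most platforms)
```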

answered Jul 28, 2019 at 16:57

AndrejH


2 Comments

9769953 Over a year ago

But why is there a line pd.DataFrame(columns=colNames)? That doesn't do anything.

AndrejH Over a year ago

Yes, it doesn't do anything.
