
Question: We want to do cluster analysis of weather forecast data.

I am clear about the first part:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# features contains the features on the basis of which we want to make the clusters.
features = ['air_pressure', 'air_temp', 'avg_wind_direction',
            'avg_wind_speed', 'max_wind_direction',
            'max_wind_speed', 'relative_humidity']

# select_df is the DataFrame containing the relevant data for the cluster analysis.
x = StandardScaler().fit_transform(select_df)
kmeans_obj = KMeans(n_clusters=12)
model = kmeans_obj.fit(x)

# We find the k-means cluster centers for the model.
center_model = model.cluster_centers_

# pd is the pandas module (imported above as pd).
# We define a function pd_centers that puts the cluster centroids into a DataFrame.
# To the existing feature columns, we add an extra column named 'prediction'
# that will contain the cluster number.
def pd_centers(features, center_model):
    colNames = list(features)
    colNames.append('prediction')

A and index are not defined earlier in the code. Why are they used here? Can anyone explain?

    # Zip with a column called 'prediction' (index). 
    Z = [np.append(A, index) for index, A in enumerate(center_model)]

I cannot understand the part below. Please help; I am new to Python (two weeks in).

    # Convert to pandas data frame for plotting
    p = pd.DataFrame(Z, columns=colNames)
    pd.DataFrame(columns=colNames)
    p['prediction'] = p['prediction'].astype(int)
    return p
  • What exactly do you not understand? Because there are still a bunch of things happening. Commented Jul 28, 2019 at 17:02
  • Also, if you have just two weeks experience with Python, you should just read and practice more, to see how everything works. And sometimes ignore things you don't understand until later (provided it doesn't crash or otherwise mess up things). Commented Jul 28, 2019 at 17:03

2 Answers


In this code you are iterating over center_model with enumerate, which yields each item together with its index as you go through center_model.

# Zip with a column called 'prediction' (index). 
Z = [np.append(A, index) for index, A in enumerate(center_model)]

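index and A are loop variables: on each pass, enumerate(center_model) returns a pair (index, row), which is unpacked into index and A so that you can use them in np.append(A, index). That is why they don't need to be defined earlier in the code.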
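A small sketch of what that list comprehension produces, using a made-up 3×2 array in place of real cluster centers (the values are hypothetical). Note that np.append returns a new array, and because the row is float, the integer index is stored as a float:

```python
import numpy as np

# Hypothetical stand-in for model.cluster_centers_: 3 clusters, 2 features.
center_model = np.array([[0.5, -1.0],
                         [1.5,  2.0],
                         [-0.3, 0.7]])

# enumerate yields (0, row0), (1, row1), (2, row2); each pair is unpacked
# into the loop variables index and A.
for index, A in enumerate(center_model):
    print(index, A)

# The comprehension appends each row's index as one extra element.
Z = [np.append(A, index) for index, A in enumerate(center_model)]
print(Z[1])  # values 1.5, 2.0, 1.0 as float64: the index 1 became 1.0
```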

The last part of the code stores the data you just collected in a pandas DataFrame. I have added a comment to each line (updated based on the comments below):

# Convert to pandas data frame for plotting
p = pd.DataFrame(Z, columns=colNames)          # put data from Z into a pandas dataframe
pd.DataFrame(columns=colNames)                 # creates a new, empty DataFrame with those columns, but it's never used
p['prediction'] = p['prediction'].astype(int)  # convert the 'prediction' field to int
return p
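For reference, here is the question's pd_centers function assembled into one runnable piece and fed a hypothetical 2-cluster, 2-feature center array instead of real KMeans output (the feature subset and values are made up for the example). The no-op pd.DataFrame(columns=colNames) line is left out, since it has no effect on the result:

```python
import numpy as np
import pandas as pd

def pd_centers(features, center_model):
    # Column names: the feature names plus an extra 'prediction' column.
    colNames = list(features)
    colNames.append('prediction')
    # One row per centroid, with the cluster number appended at the end.
    Z = [np.append(A, index) for index, A in enumerate(center_model)]
    # Build the DataFrame; the cluster numbers arrive as floats (0.0, 1.0, ...).
    p = pd.DataFrame(Z, columns=colNames)
    # Convert the cluster numbers back to integers.
    p['prediction'] = p['prediction'].astype(int)
    return p

# Hypothetical inputs: 2 clusters described by 2 (scaled) features.
features = ['air_temp', 'relative_humidity']
center_model = np.array([[0.8, -0.4],
                         [-1.1, 0.9]])

centers = pd_centers(features, center_model)
print(centers)
print(centers.dtypes)
```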

2 Comments

pd.DataFrame(columns=colNames): but the column names are already set in the line above. I don't think that line does anything.
I don't think it sets the column names; not to an existing frame at least. For that, one needs to set the .columns attribute or use .rename(). It just creates a new, empty DataFrame with those columns, but it's never used.
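A quick check of that point with a small hypothetical frame: the bare pd.DataFrame(columns=...) call builds a separate empty DataFrame that is immediately discarded, while assigning .columns or calling .rename() is what actually changes the column names:

```python
import pandas as pd

p = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=['a', 'b'])

# This builds a new, empty DataFrame and throws it away; p is unchanged.
pd.DataFrame(columns=['x', 'y'])
print(list(p.columns))  # ['a', 'b']

# Option 1: assign new names to the .columns attribute (modifies p in place).
p.columns = ['x', 'y']

# Option 2: .rename() returns a renamed copy; p itself is not modified.
q = p.rename(columns={'x': 'temp', 'y': 'humidity'})
print(list(p.columns), list(q.columns))
```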

It creates a pandas DataFrame, a data structure commonly used to hold the dataset you are working with.

The part you don't understand creates a DataFrame from the data in Z and names its columns using colNames (see the pandas DataFrame reference to understand what this means). The second-to-last line converts the data type of the prediction column to int.
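As a side note, and only a sketch rather than what the original code does: the same table can be built without np.append by passing the center array straight to pd.DataFrame and adding the cluster numbers as a separate integer column, which avoids the float-to-int round trip. The function name pd_centers_alt and the input values are hypothetical:

```python
import numpy as np
import pandas as pd

def pd_centers_alt(features, center_model):
    # Hypothetical alternative: one row per centroid, one column per feature.
    p = pd.DataFrame(center_model, columns=list(features))
    # Cluster numbers 0..k-1 as integers, matching the row order of center_model.
    p['prediction'] = np.arange(len(p))
    return p

features = ['air_temp', 'relative_humidity']
center_model = np.array([[0.8, -0.4],
                         [-1.1, 0.9]])
alt = pd_centers_alt(features, center_model)
print(alt)
```

Row i of cluster_centers_ describes cluster i, so numbering the rows in order gives the same prediction values as the enumerate-based version.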

2 Comments

But why is there a line pd.DataFrame(columns=colNames)? That doesn't do anything.
Yes, it doesn't do anything.
