I have a DataFrame:
   sepallength  sepalwidth  petallength  petalwidth        class   cluster
0          5.1         3.5          1.4         0.2  Iris-setosa  cluster1
1          4.9           3          1.4         0.2  Iris-setosa  cluster1
2          4.7         3.2          1.3         0.2  Iris-setosa  cluster1
3          4.6         3.1          1.5         0.2  Iris-setosa  cluster1
4            5         3.6          1.4         0.2  Iris-setosa  cluster1
5          5.4         3.9          1.7         0.4  Iris-setosa  cluster1
6          4.6         3.4          1.4         0.3  Iris-setosa  cluster1
7            5         3.4          1.5         0.2  Iris-setosa  cluster1
8          4.4         2.9          1.4         0.2  Iris-setosa  cluster1
9          4.9         3.1          1.5         0.1  Iris-setosa  cluster1
and a dictionary:
{'cluster2': 'Iris-virginica', 'cluster0': 'Iris-versicolor', 'cluster1': 'Iris-setosa'}
I need to add another column and fill it with the value from this dictionary whose key equals df['cluster'] in that row.
I have tried using np.where:
def countTruth(df):
    # dictionary mapping cluster to most frequent class
    clustersClass = df.groupby(['cluster'])['class'].agg(lambda x:x.value_counts().index[0]).to_dict()
    for eachKey in clustersClass:
        newv = clustersClass[eachKey]
        print df
        df['new'] = np.where(df['cluster']==eachKey , newv)
This crashes with the error "either both or neither of x and y should be given".
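For context, np.where accepts either the condition alone or the condition together with both x and y, so a call with only a condition and one value raises this error. Even with a third argument, the loop would overwrite df['new'] on every pass. One possible way around both problems is to look up each cluster label directly with Series.map. The following is only a sketch: the small sample frame and the name clusters_class are assumptions made for illustration.

```python
import pandas as pd

# Illustrative sample shaped like the frame above (values are made up).
df = pd.DataFrame({
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor"],
    "cluster": ["cluster1", "cluster1", "cluster0"],
})

# The cluster -> class dictionary from the question.
clusters_class = {
    "cluster2": "Iris-virginica",
    "cluster0": "Iris-versicolor",
    "cluster1": "Iris-setosa",
}

# Series.map looks up every cluster label in the dictionary in one step;
# labels that are not keys of the dictionary would become NaN.
df["new"] = df["cluster"].map(clusters_class)
print(df)
```

This fills the whole column in a single pass, so no loop over the dictionary keys is needed.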
My ultimate goal is to count true positives, true negatives, false positives, and false negatives based on the cluster and class labels; this is a step towards that.
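To make that goal concrete, here is a rough sketch of one-vs-rest counting per class, assuming the class assigned to each row's cluster is treated as the prediction and the class column as the truth. The sample data, the helper confusion_counts, and the variable names are hypothetical and not part of my actual code.

```python
import pandas as pd

# Illustrative data: one virginica row lands in the versicolor cluster.
df = pd.DataFrame({
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-virginica", "Iris-virginica"],
    "cluster": ["cluster1", "cluster1", "cluster0", "cluster0", "cluster2"],
})
clusters_class = {
    "cluster2": "Iris-virginica",
    "cluster0": "Iris-versicolor",
    "cluster1": "Iris-setosa",
}

# Predicted class = class assigned to the row's cluster.
predicted = df["cluster"].map(clusters_class)


def confusion_counts(actual, predicted, label):
    """Count TP, FP, FN, TN for one label, treating all other labels as negative."""
    is_actual = actual == label
    is_pred = predicted == label
    return {
        "TP": int((is_actual & is_pred).sum()),
        "FP": int((~is_actual & is_pred).sum()),
        "FN": int((is_actual & ~is_pred).sum()),
        "TN": int((~is_actual & ~is_pred).sum()),
    }


counts = {
    label: confusion_counts(df["class"], predicted, label)
    for label in sorted(df["class"].unique())
}
for label, c in counts.items():
    print(label, c)
```

Each class gets its own set of four counts because, with three classes, "positive" and "negative" only make sense relative to one chosen label.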