I have a dataframe:

 sepallength sepalwidth petallength petalwidth        class   cluster
0         5.1        3.5         1.4        0.2  Iris-setosa  cluster1
1         4.9          3         1.4        0.2  Iris-setosa  cluster1
2         4.7        3.2         1.3        0.2  Iris-setosa  cluster1
3         4.6        3.1         1.5        0.2  Iris-setosa  cluster1
4           5        3.6         1.4        0.2  Iris-setosa  cluster1
5         5.4        3.9         1.7        0.4  Iris-setosa  cluster1
6         4.6        3.4         1.4        0.3  Iris-setosa  cluster1
7           5        3.4         1.5        0.2  Iris-setosa  cluster1
8         4.4        2.9         1.4        0.2  Iris-setosa  cluster1
9         4.9        3.1         1.5        0.1  Iris-setosa  cluster1

and a dictionary:

{'cluster2': 'Iris-virginica', 'cluster0': 'Iris-versicolor', 'cluster1': 'Iris-setosa'}

I need to add another column and fill each row with the value from this dictionary whose key matches df['cluster'].

I have tried using np.where:

def countTruth(df):
    # dictionary mapping cluster to most frequent class

    clustersClass = df.groupby(['cluster'])['class'].agg(lambda x:x.value_counts().index[0]).to_dict()
    for eachKey in clustersClass:
        newv = clustersClass[eachKey]
        print df
        df['new'] = np.where(df['cluster']==eachKey , newv) 

It crashes, saying "either both or neither of x and y should be given".

My ultimate goal is to count true positives, true negatives, false positives (FP), and false negatives (FN) based on the cluster and class labels. This is a step towards that.

1 Answer

Call map and pass the dict:

In [326]:

d={'cluster2': 'Iris-virginica', 'cluster0': 'Iris-versicolor', 'cluster1': 'Iris-setosa'}
df['key'] = df['cluster'].map(d)
df
Out[326]:
   sepallength  sepalwidth  petallength  petalwidth        class   cluster  \
0          5.1         3.5          1.4         0.2  Iris-setosa  cluster1   
1          4.9         3.0          1.4         0.2  Iris-setosa  cluster1   
2          4.7         3.2          1.3         0.2  Iris-setosa  cluster1   
3          4.6         3.1          1.5         0.2  Iris-setosa  cluster1   
4          5.0         3.6          1.4         0.2  Iris-setosa  cluster1   
5          5.4         3.9          1.7         0.4  Iris-setosa  cluster1   
6          4.6         3.4          1.4         0.3  Iris-setosa  cluster1   
7          5.0         3.4          1.5         0.2  Iris-setosa  cluster1   
8          4.4         2.9          1.4         0.2  Iris-setosa  cluster1   
9          4.9         3.1          1.5         0.1  Iris-setosa  cluster1   

           key  
0  Iris-setosa  
1  Iris-setosa  
2  Iris-setosa  
3  Iris-setosa  
4  Iris-setosa  
5  Iris-setosa  
6  Iris-setosa  
7  Iris-setosa  
8  Iris-setosa  
9  Iris-setosa  
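
As a follow-up to the goal stated in the question, here is a minimal sketch of how the mapped column could feed per-class TP/FP/FN/TN counts. The small frame, the `confusion_counts` helper, and the `"other"` placeholder are hypothetical and only for illustration; the dictionary is the one from the question. It also shows that np.where runs once both a "true" value and a "false" value are passed, which is what the error message in the question is complaining about.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the asker's real frame): two rows are put in
# the "wrong" cluster so that FP and FN counts are non-zero.
df = pd.DataFrame({
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-virginica", "Iris-virginica"],
    "cluster": ["cluster1", "cluster1", "cluster0", "cluster0", "cluster2"],
})
d = {"cluster2": "Iris-virginica", "cluster0": "Iris-versicolor",
     "cluster1": "Iris-setosa"}

# map looks each cluster label up in the dict; a label that is not a key
# in the dict would become NaN.
df["key"] = df["cluster"].map(d)

# np.where needs the condition plus BOTH x and y; passing only x raises
# "either both or neither of x and y should be given".
setosa_or_other = np.where(df["cluster"] == "cluster1", "Iris-setosa", "other")


def confusion_counts(frame, label):
    """One-vs-rest counts for one class label (hypothetical helper)."""
    actual = frame["class"] == label
    predicted = frame["key"] == label
    return {
        "TP": int((actual & predicted).sum()),
        "FP": int((~actual & predicted).sum()),
        "FN": int((actual & ~predicted).sum()),
        "TN": int((~actual & ~predicted).sum()),
    }


counts = {label: confusion_counts(df, label) for label in d.values()}
print(counts)
```

Treating each class one-vs-rest keeps the counts as plain boolean sums; a cross-tabulation of `class` against `key` would give the same information in matrix form.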

2 Comments

Great, @EdChum, you always have the answer and you understand the questions. +1. I usually look in the pandas documentation, but I can't pick out exactly what I need and end up taking a rather convoluted approach. Is there any book or reference that would help me train myself in pandas data manipulation?
There is Wes's book and also the online cookbook. To be honest, as with everything, the thing is to try. I don't use pandas professionally, but I try to answer questions as best I can and have learned from the other pandas users and devs.
