I have trained a logistic regression classifier to predict whether a review is positive or negative. Now I want to add the predicted probabilities returned by predict_proba to the pandas DataFrame containing the reviews. I tried something like:

test_data['prediction'] = sentiment_model.predict_proba(test_matrix)

That doesn't work, because predict_proba returns a 2D NumPy array with one column per class, so it cannot be assigned to a single DataFrame column. What is the most efficient way of doing this? I created test_matrix with scikit-learn's CountVectorizer:

vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
train_matrix = vectorizer.fit_transform(train_data['review_clean'].values.astype('U'))
test_matrix = vectorizer.transform(test_data['review_clean'].values.astype('U'))

Sample data looks like:

| Review                                     | Prediction         |                      
| ------------------------------------------ | ------------------ |
| "Toy was great! Our six-year old loved it!"|   0.986            |
  • Could you provide a sample data set (5 - 7 rows)? Commented Feb 18, 2017 at 11:34
  • Related question: stackoverflow.com/questions/41904197/… Commented Feb 18, 2017 at 11:36
  • Assign the predictions to a variable and then extract the columns from the variable to be assigned to the pandas dataframe cols. If x is the 2D numpy array with predictions, x = sentiment_model.predict_proba(test_matrix), then you can do test_data['prediction0'] = x[:,0] and test_data['prediction1'] = x[:,1] Commented Feb 18, 2017 at 11:46
  • @KarthikArumugham thanks so much. It worked like a charm! I need to sharpen up on slicing and dicing data ;) Commented Feb 18, 2017 at 12:08
  • @DBE7 I've shared it as an answer. Pls mark it correct. Commented Feb 18, 2017 at 12:53

2 Answers

Assign the predictions to a variable, then assign each column of that array to its own DataFrame column. If x is the 2D NumPy array of predictions,

x = sentiment_model.predict_proba(test_matrix)

then you can do,

test_data['prediction0'] = x[:,0]
test_data['prediction1'] = x[:,1]
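
As a rough end-to-end sketch of the same idea: the columns of the array returned by predict_proba follow the order of the fitted model's classes_ attribute, so that attribute can be used to name the new columns instead of hard-coding 0 and 1. The toy reviews, the 'sentiment' label column and the 'proba_' column prefix below are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data, invented for illustration only (labels: 1 = positive, -1 = negative).
train_data = pd.DataFrame({
    'review_clean': ['great toy loved it', 'terrible broke quickly',
                     'loved it great value', 'terrible waste of money'],
    'sentiment': [1, -1, 1, -1],
})
test_data = pd.DataFrame({'review_clean': ['great value', 'broke quickly']})

vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
train_matrix = vectorizer.fit_transform(train_data['review_clean'])
test_matrix = vectorizer.transform(test_data['review_clean'])

sentiment_model = LogisticRegression()
sentiment_model.fit(train_matrix, train_data['sentiment'])

# Shape is (n_rows, n_classes); each row sums to 1.
proba = sentiment_model.predict_proba(test_matrix)

# Column i of proba belongs to sentiment_model.classes_[i] (sorted labels).
for i, label in enumerate(sentiment_model.classes_):
    test_data[f'proba_{label}'] = proba[:, i]

print(test_data)
```

Assigning a NumPy array to a column is positional, so this works even when test_data does not have a default 0..n-1 index.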
1 Comment

was very helpful
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(10).reshape(5, 2), columns=['a', 'b'])
print('df:', df, sep='\n')

arr = np.arange(100, 104).reshape(2, 2)
print('array to append:', arr, sep='\n')

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
df = pd.concat([df, pd.DataFrame(arr, columns=df.columns)], ignore_index=True)
print('df:', df, sep='\n')

output

df:
   a  b
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
array to append:
[[100 101]
 [102 103]]
df:
     a    b
0    0    1
1    2    3
2    4    5
3    6    7
4    8    9
5  100  101
6  102  103
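
The example above appends the array as new rows. For the question's case, where the array should become new columns, one possible sketch is to wrap the array in a DataFrame that reuses the original index and concatenate along axis=1. The frame, the array values and the 'proba_0'/'proba_1' column names here are stand-ins chosen for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for test_data after a train/test split: the index is not 0..n-1.
df = pd.DataFrame({'review': ['a', 'b', 'c']}, index=[7, 2, 5])

# Stand-in for the (n_rows, 2) array that predict_proba returns.
arr = np.array([[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]])

# Reusing df.index keeps each probability row next to its own review;
# without it, concat would align on 0..2 and produce NaN-filled rows.
proba_df = pd.DataFrame(arr, columns=['proba_0', 'proba_1'], index=df.index)
result = pd.concat([df, proba_df], axis=1)
print(result)
```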
