python - how to append numpy array to a pandas dataframe

Question

I have trained a Logistic Regression classifier to predict whether a review is positive or negative. Now I want to append the predicted probabilities returned by the predict_proba function to my pandas DataFrame containing the reviews. I tried something like:

test_data['prediction'] = sentiment_model.predict_proba(test_matrix)

Obviously, that doesn't work, since predict_proba returns a 2D NumPy array (one column per class). So what is the most efficient way of doing this? I created test_matrix with scikit-learn's CountVectorizer:

vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
train_matrix = vectorizer.fit_transform(train_data['review_clean'].values.astype('U'))
test_matrix = vectorizer.transform(test_data['review_clean'].values.astype('U'))
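To see why the single-column assignment fails, here is a minimal sketch that uses a hand-written NumPy array as a stand-in for the predict_proba output (the `proba` name, the reviews, and the values are hypothetical). For a binary classifier the array has one row per review and one column per class, so it cannot fill a single DataFrame column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sentiment_model.predict_proba(test_matrix):
# shape (n_reviews, n_classes); each row sums to 1.
proba = np.array([[0.014, 0.986],
                  [0.700, 0.300],
                  [0.450, 0.550]])
test_data = pd.DataFrame(
    {'review_clean': ['toy was great', 'broke after a day', 'it was ok']})

print(proba.shape)  # (3, 2)

# One DataFrame column holds one value per row, so a two-column array is
# rejected. The exact error message differs between pandas versions.
try:
    test_data['prediction'] = proba
    assignment_failed = False
except ValueError as err:
    assignment_failed = True
    print('ValueError:', err)
```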

Sample data looks like:

| Review                                      | Prediction |
| ------------------------------------------- | ---------- |
| "Toy was great! Our six-year old loved it!" | 0.986      |
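The sample table has a single Prediction value per review, which corresponds to one column of the 2D array. A minimal sketch, assuming a binary model whose second class is the positive label: scikit-learn orders the predict_proba columns like the model's `classes_` attribute, which is sorted, so with labels such as -1/+1 or 0/1 the positive-class probability is column 1. The array and reviews below are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sentiment_model.predict_proba(test_matrix);
# columns are assumed to follow sentiment_model.classes_ == [-1, +1].
proba = np.array([[0.014, 0.986],
                  [0.700, 0.300]])
test_data = pd.DataFrame(
    {'review': ['Toy was great! Our six-year old loved it!',
                'Broke after a day.']})

# Keep only P(positive) to get one "prediction" column, as in the table.
test_data['prediction'] = proba[:, 1]
print(test_data)
```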

Assign the predictions to a variable and then extract the columns from the variable to be assigned to the pandas dataframe cols. If x is the 2D numpy array with predictions, x = sentiment_model.predict_proba(test_matrix) then you can do, test_data['prediction0'] = x[:,0] and test_data['prediction1'] = x[:,1]
– Karthik Arumugham, Commented Feb 18, 2017 at 11:46
@KarthikArumugham thanks so much. It worked like a charm! I need to sharpen up on slicing and dicing data ;)
– DBE7, Commented Feb 18, 2017 at 12:08

Karthik Arumugham · Accepted Answer · 2017-02-18 12:50:41Z

26

Assign the predictions to a variable, then slice out each of its columns and assign it to its own DataFrame column. If x is the 2D NumPy array of predictions,

x = sentiment_model.predict_proba(test_matrix)

then you can do:

test_data['prediction0'] = x[:,0]
test_data['prediction1'] = x[:,1]
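The same slicing can be driven by the model's `classes_` attribute, so each column name states which class its probability belongs to. This is a sketch under stated assumptions: the tiny training set, the -1/+1 labels, and the `prob_` column prefix are all hypothetical; only the CountVectorizer settings come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: +1 = positive review, -1 = negative review.
train_data = pd.DataFrame({
    'review_clean': ['great toy loved it', 'awful broke fast',
                     'loved it great fun', 'broke awful waste'],
    'sentiment': [1, -1, 1, -1]})
test_data = pd.DataFrame({'review_clean': ['great fun', 'awful waste']})

vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
train_matrix = vectorizer.fit_transform(train_data['review_clean'].values.astype('U'))
test_matrix = vectorizer.transform(test_data['review_clean'].values.astype('U'))

sentiment_model = LogisticRegression()
sentiment_model.fit(train_matrix, train_data['sentiment'])

x = sentiment_model.predict_proba(test_matrix)  # shape (2, 2)

# predict_proba columns are ordered like classes_ (here [-1, 1]), so
# naming columns after classes_ keeps each probability with its label.
prob_cols = [f'prob_{c}' for c in sentiment_model.classes_]
for i, col in enumerate(prob_cols):
    test_data[col] = x[:, i]
print(test_data)
```

For the single column shown in the question, keeping just `test_data['prob_1']` would be enough.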

answered Feb 18, 2017 at 12:50

Karthik Arumugham

was very helpful
– suku

Markus Dutschke · Answer · 2021-04-08 14:57:14Z

3

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(10).reshape(5, 2), columns=['a', 'b'])
print('df:', df, sep='\n')

arr = np.arange(100, 104).reshape(2, 2)
print('array to append:', arr, sep='\n')

# DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
df = pd.concat([df, pd.DataFrame(arr, columns=df.columns)], ignore_index=True)
print('df:', df, sep='\n')

output

df:
   a  b
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
array to append:
[[100 101]
 [102 103]]
df:
     a    b
0    0    1
1    2    3
2    4    5
3    6    7
4    8    9
5  100  101
6  102  103
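Note that the code above stacks the array underneath the frame as new rows, whereas the question adds predictions as new columns. For columns, `pd.concat` with `axis=1` is one option. A minimal sketch, with hypothetical data and column names: passing `index=df.index` matters because `concat` aligns on the index, and a test set produced by a train/test split usually keeps non-contiguous labels.

```python
import numpy as np
import pandas as pd

# A frame with a non-default index, as a train/test split might leave it.
df = pd.DataFrame({'review': ['good', 'bad', 'fine']}, index=[7, 2, 9])
x = np.array([[0.1, 0.9],
              [0.8, 0.2],
              [0.4, 0.6]])  # hypothetical predict_proba output

# Reuse df.index so each probability row lines up with its review;
# without it, concat would align 0..2 against 7, 2, 9 and insert NaNs.
preds = pd.DataFrame(x, columns=['prob_neg', 'prob_pos'], index=df.index)
df = pd.concat([df, preds], axis=1)
print(df)
```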

answered Apr 8, 2021 at 14:57

Markus Dutschke
