I am trying to build a classifier with sklearn and get the following error in my console when I run my code.
ValueError: Boolean array expected for the condition, not object
I tried filling in the null values and reshaping the data, but to no avail.
Here is the relevant code:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier
from sklearn.externals import joblib

# Get the dataset
dataset = pd.read_csv('master_info_final_v12_conversion.csv')

# Split the dataset into features and labels
X = dataset[dataset[['Happy', 'Stress', 'Eyes']]]
y = dataset[dataset['phenotype']]

# Split the dataset into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Build the classifier and make prediction
classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test)

# Print the confusion matrix
print(confusion_matrix(y_test, prediction))

# Save the model to disk
joblib.dump(classifier, 'classifier.joblib')
```
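To narrow it down, here is a smaller sketch that I believe triggers the same message without sklearn involved. The rows are made up (only the column names match my CSV), and the names `df`, `features`, `X_bad` and `error_message` are just for illustration. It suggests the exception may come from the feature-selection line, where a DataFrame is used as the indexer, rather than from the classifier, though I have not confirmed that against the full dataset. Depending on the pandas version the exception type may be `ValueError` or `TypeError`, so the sketch catches both.

```python
import pandas as pd

# Made-up rows with the same column names as my CSV (not the real data)
df = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'phenotype': ['boy', 'girl', 'neither'],
    'Happy': [56, 74, 45],
    'Stress': [23, 57, 31],
    'Eyes': [19, 22, 40],
})

features = ['Happy', 'Stress', 'Eyes']

# Same pattern as in my script: a DataFrame is passed as the indexer,
# which pandas appears to treat as a boolean mask
try:
    X_bad = df[df[features]]
    error_message = None
except (ValueError, TypeError) as exc:
    error_message = str(exc)
print(error_message)

# For comparison: selecting the columns by name directly
X = df[features]
y = df['phenotype']
print(X.shape, y.shape)
```

With the made-up rows, the first selection raises and the second one returns the three feature columns and the label column without complaint.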
Here is a snapshot of my data:
| name | rating | phenotype | Happy | Stress | Eyes |
|---|---|---|---|---|---|
| tommy | 7.1 | boy | 56 | 23 | 19 |
| jill | 2.3 | girl | 74 | 57 | |
| carlos | 4.4 | neither | 45 |