
I am trying to build a classifier with sklearn and get the following error in my console when I run my code.

ValueError: Boolean array expected for the condition, not object

I tried cleaning my data (filling in null values) and reshaping the inputs, but to no avail.

Here is the relevant code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

# Get the dataset
dataset = pd.read_csv('master_info_final_v12_conversion.csv')

# Split the dataset into features and labels
X = dataset[dataset[['Happy', 'Stress', 'Eyes']]]
y = dataset[dataset['phenotype']]

# Split the dataset into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Build the classifier and make prediction
classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test)

# Print the confusion matrix
print(confusion_matrix(y_test, prediction))

# Save the model to disk
joblib.dump(classifier, 'classifier.joblib')

Here is a snapshot of my data:

name rating phenotype Happy Stress Eyes
tommy 7.1 boy 56 23 19
jill 2.3 girl 74 57
carlos 4.4 neither 45

1 Answer

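This error most commonly occurs when you index a dataframe with another dataframe while trying to select columns. In the question, an entire dataframe (instead of a list/array of column names) is passed inside the brackets, which raises this error. The y line needs the same change: dataset['phenotype'] already is the label column, so it should not be wrapped in dataset[...] again.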

X = dataset[dataset[['Happy', 'Stress', 'Eyes']]]  # <----- error
X = dataset[['Happy', 'Stress', 'Eyes']]           # <----- no error
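
To make the difference concrete, here is a minimal sketch that reproduces the error and applies the fix. The small frame below is a hypothetical stand-in for the CSV in the question (the values are made up for illustration), and feature_cols is a name introduced here only for readability.

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question; values are illustrative only.
dataset = pd.DataFrame({
    'phenotype': ['boy', 'girl', 'neither'],
    'Happy': [56, 74, 45],
    'Stress': [23, 57, 12],
    'Eyes': [19, 30, 8],
})

feature_cols = ['Happy', 'Stress', 'Eyes']

# Indexing with a DataFrame (not a boolean mask) raises the ValueError.
error = None
try:
    dataset[dataset[feature_cols]]
except ValueError as exc:
    error = exc
    print(f"ValueError: {exc}")

# Indexing with a list of column names returns the feature columns.
X = dataset[feature_cols]

# Indexing with a single column name returns the label Series.
y = dataset['phenotype']
```

With X and y built this way, the rest of the code in the question (train_test_split, fit, predict) should receive the inputs it expects, assuming the remaining missing values are handled.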

Note that many pd.DataFrame methods such as select_dtypes(), filter(), query(), take() etc. return a dataframe, so use the result as is; don't use it to filter the dataframe again.

df.select_dtypes('int')       # <--- already dataframe
df[df.select_dtypes('int')]   # <--- ValueError: Boolean array expected

df.filter(['col1'])           # <--- already dataframe
df[df.filter(['col1'])]       # <--- ValueError: Boolean array expected
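
If you do want to go back through [] after one of these methods, one option (a sketch, not the only way) is to pass the column names of the result rather than the result itself. The frame and the variable names below are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})

# select_dtypes() already returns the sub-dataframe you want.
numeric = df.select_dtypes('number')

# If only the names are needed, take them from .columns ...
numeric_cols = numeric.columns.tolist()

# ... and index with that list of labels, not with the dataframe.
same = df[numeric_cols]
```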

Another case is when you call where() without a boolean condition. For example,

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})
df.where(df['col2'])                               # <----- error
df.where(df['col2'] == 'a')                        # <----- no error

In fact, the pandas source code shows that when you pass a dataframe to __getitem__ (i.e. []), as in the first case, where() is called internally, so both cases end up in the same part of the code.
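
A small sketch of that equivalence, using a made-up two-column frame: with a boolean dataframe as the key, [] and where() give the same result, and with a non-boolean dataframe both raise the same ValueError.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# A boolean dataframe of the same shape is a valid condition.
mask = df > 2

via_getitem = df[mask]       # values where mask is False become NaN
via_where = df.where(mask)   # same operation, called directly
same_result = via_getitem.equals(via_where)

# A non-boolean dataframe fails in both spellings.
errors = []
for attempt in (lambda: df[df], lambda: df.where(df)):
    try:
        attempt()
    except ValueError as exc:
        errors.append(str(exc))
```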
