Python ValueError: Unknown label type: 'continuous'

Question

I'm a beginner, and I'm struggling to understand another Stack Overflow post that asks the same question I have: Logistic Regression:Unknown label type: 'continuous'

My machine learning code is below, and running it gives me ValueError: Unknown label type: 'continuous'

I think I understand that I am "passing floats to a classifier which expects categorical values as the target vector. If you convert it to int it will be accepted as input (although it will be questionable if that's the right way to do it). It would be better to convert your training scores by using scikit's labelEncoder function"

Can someone give me a tip on how to incorporate scikit-learn's LabelEncoder into my code? Should it be applied before defining X and y? Whatever I try, I am doing something wrong. Thank you.

import numpy as np
from sklearn import preprocessing, cross_validation, neighbors, utils
import pandas as pd

df = pd.read_csv('C:\\Users\\bbartling\\Documents\\Python\\WB Data\\WB_RTU6data.csv',
                 index_col='Date', parse_dates=True)

print(df.head())
print(df.tail())
print(df.shape)
print(df.columns)
print(df.info())
print(df.describe())


# Drop rows with missing values first so X and y keep the same rows
df.dropna(inplace=True)

X = np.array(df.drop(['VAV6znt'], axis=1))
y = np.array(df['VAV6znt'])


accuracies = []

X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.50)

clf = neighbors.KNeighborsClassifier(n_neighbors=50)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print(accuracy)

Vivek Kumar · Accepted Answer · 2017-09-06 15:14:10Z

Your VAV6znt column is a float, which means you are trying to estimate a numerical value from the data. That makes this a regression problem, but KNeighborsClassifier is a classification estimator.
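To see why the classifier refuses the target, it can help to ask scikit-learn how it categorizes different target arrays. The sketch below uses `sklearn.utils.multiclass.type_of_target`; the small arrays are made-up illustration values, not data from the question.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Made-up targets for illustration only
y_float = np.array([70.1, 71.5, 69.8, 72.3])  # real-valued measurements
y_binary = np.array([0, 1, 1, 0])             # two discrete classes
y_multi = np.array([0, 1, 2, 1])              # three discrete classes

kind_float = type_of_target(y_float)
kind_binary = type_of_target(y_binary)
kind_multi = type_of_target(y_multi)

print(kind_float)   # continuous
print(kind_binary)  # binary
print(kind_multi)   # multiclass
```

A target reported as 'continuous' is what produces the "Unknown label type" error when it is passed to a classifier.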

Try using KNeighborsRegressor, or any other estimator with Regressor in its name.
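A minimal sketch of that swap is shown here, assuming a recent scikit-learn in which `train_test_split` lives in `sklearn.model_selection` (the `cross_validation` module was removed in version 0.20). The data is synthetic and stands in for the CSV from the question, and `n_neighbors=5` is an arbitrary choice for this small example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data: three numeric features and a float target
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.50, random_state=0
)

# The regressor accepts a continuous target without complaint
reg = KNeighborsRegressor(n_neighbors=5)
reg.fit(X_train, y_train)

# For regressors, score() returns R^2, not classification accuracy
r2 = reg.score(X_test, y_test)
predictions = reg.predict(X_test)
print(r2)
```

Note that `score` on a regressor reports the coefficient of determination (R^2), so the printed number should not be read as an accuracy.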

Converting them to int as you did above will work, but it will not give good results: it means you have as many classes in your data as there are unique ints in it, which is obviously wrong.
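The following sketch illustrates the point about class counts. The temperature-like values are invented for the example. It shows what happens if floats are truncated to int or run through `LabelEncoder`: each distinct value becomes its own class, and the ordering and distance between values are lost to the classifier.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Invented float readings for illustration only
y = np.array([70.1, 70.9, 71.5, 72.3, 73.8, 74.2])

# Truncating to int collapses 70.1 and 70.9 into the same class
y_int = y.astype(int)
n_classes_int = len(np.unique(y_int))

# LabelEncoder assigns one class per distinct float value
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)
n_classes_encoded = len(encoder.classes_)

print(n_classes_int)      # 5
print(n_classes_encoded)  # 6
```

With real sensor data the number of distinct values is typically close to the number of rows, so a classifier would see almost every sample as its own class.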

3 Comments

bbartling Over a year ago

Thanks for the response. Can you tell me where in my code I am converting to integers? I don't quite understand that, and I want to avoid it if at all possible. I have a feeling all of my work will involve this kind of float data. Any help is greatly appreciated. I am also trying to find all of scikit-learn's "Regressor"-type estimators.

Vivek Kumar Over a year ago

@HenryHub In your question you quoted "If you convert it to int it will be accepted as input". In any case, you should first understand the difference between classification and regression tasks. Then search for Regressor on this page: scikit-learn.org/stable/modules/classes.html#api-reference

bbartling Over a year ago

This worked very well, thanks for the tip. I'll have to do some research on the difference between classification and regression tasks.
