I'm a beginner, and I am struggling to understand another Stack Overflow post that asks the same question I have: "Logistic Regression: Unknown label type: 'continuous'".

My machine learning code is below. Running it raises ValueError: Unknown label type: 'continuous'

From that post, I think I understand that I am "passing floats to a classifier which expects categorical values as the target vector. If you convert it to int it will be accepted as input (although it will be questionable if that's the right way to do it). It would be better to convert your training scores by using scikit's labelEncoder function".
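To make the quoted explanation concrete, here is a minimal sketch of how scikit-learn tells the two kinds of target apart. The temperature values and the single feature column are made up for illustration; they are not taken from the WB_RTU6data.csv file.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils.multiclass import type_of_target

# Hypothetical readings, standing in for a float column such as VAV6znt
y_float = np.array([70.5, 71.2, 69.8, 70.1, 71.9, 69.4])
y_int = np.array([70, 71, 69, 70, 71, 69])

float_kind = type_of_target(y_float)  # how scikit-learn sees float targets
int_kind = type_of_target(y_int)      # how it sees a few repeated integers
print(float_kind, int_kind)

# A classifier refuses a target it considers continuous
X = np.arange(6, dtype=float).reshape(-1, 1)  # made-up single feature
try:
    KNeighborsClassifier(n_neighbors=2).fit(X, y_float)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The exact wording of the error message varies between scikit-learn versions, but it names the 'continuous' label type in each case.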

Can someone give me a tip on how to incorporate scikit-learn's LabelEncoder into my code? Should it be applied before defining X and y for the classifier? Whatever I try, I am doing something wrong. Thank you.

import numpy as np
import pandas as pd
from sklearn import preprocessing, neighbors, utils
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split

df = pd.read_csv('C:\\Users\\bbartling\\Documents\\Python\\WB Data\\WB_RTU6data.csv', index_col='Date', parse_dates=True)

print(df.head())
print(df.tail())
print(df.shape)
print(df.columns)
df.info()  # info() prints its own report and returns None
print(df.describe())


# Drop rows with missing values before building X and y so both keep the same length
df.dropna(inplace=True)

X = np.array(df.drop(columns=['VAV6znt']))
y = np.array(df['VAV6znt'])


accuracies = []

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50)

clf = neighbors.KNeighborsClassifier(n_neighbors=50)
clf.fit(X_train, y_train)  # this call raises ValueError: Unknown label type: 'continuous'
accuracy = clf.score(X_test, y_test)

print(accuracy)


1 Answer

Your VAV6znt column is a float, which means you are trying to estimate a numerical value from the data. That makes this a regression problem, but KNeighborsClassifier is a classification estimator.

Try KNeighborsRegressor, or any other estimator that has Regressor in its name.
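As a rough sketch of that swap, the example below fits KNeighborsRegressor on synthetic data. The three feature columns, the noise level, and n_neighbors=5 are assumptions chosen for illustration; with the real data, X and y would come from the DataFrame as in the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the sensor data: 200 rows, 3 float features
X = rng.uniform(60, 80, size=(200, 3))
# Float target loosely related to the features, plus a little noise
y = X.mean(axis=1) + rng.normal(0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

reg = KNeighborsRegressor(n_neighbors=5)
reg.fit(X_train, y_train)            # float targets are accepted here
predictions = reg.predict(X_test)
r2 = reg.score(X_test, y_test)       # R^2 for regressors, not accuracy
print(r2)
```

Note that score() on a regressor returns the coefficient of determination (R^2), so the number is not comparable to a classification accuracy.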

Converting them to int as you mentioned above will work, but it will not give good results: you would end up with as many classes as there are unique ints in your data, which is clearly wrong.
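A small sketch of why the int conversion is misleading, using six made-up float readings and a single made-up feature:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical float readings
y = np.array([70.2, 70.9, 71.4, 72.8, 73.1, 74.6])

# astype(int) truncates the decimals: 70.2 and 70.9 both become 70
y_int = y.astype(int)
n_classes = np.unique(y_int).size
print(y_int, n_classes)

# The classifier now accepts the target, but treats every distinct
# integer as an unrelated category with no notion of ordering
X = y.reshape(-1, 1)  # made-up feature, only here so fit() has an input
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y_int)
learned_classes = clf.classes_.tolist()
print(learned_classes)
```

Six readings already produce five classes here, and the model has no idea that 70 is closer to 71 than to 74.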

3 Comments

Thanks for the response. Can you tell me where in my code I am converting to integers? I don't quite understand that, and I want to avoid it if at all possible. I have a feeling all of my work will involve this kind of float data. Any help is greatly appreciated. I am also trying to find all of scikit-learn's "Regressor" type estimators.
@HenryHub Your question quotes "If you convert it to int it will be accepted as input"; that is what I was referring to. You should first read up on the difference between classification and regression tasks. For the estimators, search for Regressor on this page: scikit-learn.org/stable/modules/classes.html#api-reference
This worked very well, thanks for the tip. I'll have to do some research on the difference between classification and regression tasks.
