I am trying to build a decision tree classifier in Python using scikit-learn and pandas. The data set is stored in a CSV file.
When I try to load the data in Python, I get the error "ValueError: could not convert string to float: 'CustomerID'". I don't know what I have done wrong in my code.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
col_names=['CustomerID','Gender','Car Type', 'Shirt Size','Class']
pima=pd.read_csv(r"F:\Current semster courses\Machine Learning\ML_A1_Fall2019\Q2_dataset.csv",header=None, names=col_names)
pima.head()
feature_cols=['CustomerID','Gender','Car Type', 'Shirt Size']
X=pima[feature_cols]
y=pima.Class
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
clf = DecisionTreeClassifier()
# Train Decision Tree Classifier
clf = clf.fit(X_train,y_train)
# Predict the response for test dataset
y_pred = clf.predict(X_test)
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
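One way to check whether the header row is being read as data is sketched below. It uses a small inline sample instead of the real file, and the comma delimiter and the names `sample`, `as_data`, and `as_header` are assumptions made for illustration only.

```python
import io
import pandas as pd

# Hypothetical inline sample standing in for the real CSV file;
# the comma delimiter is an assumption.
sample = io.StringIO(
    "CustomerID,Gender,Car Type,Shirt Size,Class\n"
    "1,M,Family,Small,C0\n"
    "2,M,Sports,Medium,C0\n"
)
col_names = ['CustomerID', 'Gender', 'Car Type', 'Shirt Size', 'Class']

# header=None tells pandas the file has no header row, so the
# header line is kept as the first data row.
as_data = pd.read_csv(sample, header=None, names=col_names)
print(as_data.iloc[0].tolist())

# header=0 treats the first line as the header and replaces it with col_names.
sample.seek(0)
as_header = pd.read_csv(sample, header=0, names=col_names)
print(as_header['CustomerID'].tolist())
```

If the first printed row contains the column names themselves, the string 'CustomerID' ends up inside the feature matrix, which would be consistent with the error message.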
Can someone tell me what I am doing wrong?
Dataset:
CustomerID Gender Car Type Shirt Size Class
1 M Family Small C0
2 M Sports Medium C0
3 M Sports Medium C0
4 M Sports Large C0
5 M Sports Extra Large C0
6 M Sports Extra Large C0
7 F Sports Small C0
8 F Sports Small C0
9 F Sports Medium C0
10 F Luxury Large C0
11 M Family Large C1
12 M Family Extra Large C1
13 M Family Medium C1
14 M Luxury Extra Large C1
15 F Luxury Small C1
16 F Luxury Small C1
17 F Luxury Medium C1
18 F Luxury Medium C1
19 F Luxury Medium C1
20 F Luxury Large C1
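
A minimal sketch of a version that may avoid the error, using the rows listed above. DecisionTreeClassifier expects numeric features, so the string columns are one-hot encoded with pd.get_dummies here. The comma delimiter, the inline `csv_text` copy of the data, and the choice to leave out CustomerID (treated as a row identifier rather than a feature) are assumptions, not something confirmed by the original file.

```python
import io
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

# Inline copy of the rows listed above; the comma delimiter is an assumption.
csv_text = """CustomerID,Gender,Car Type,Shirt Size,Class
1,M,Family,Small,C0
2,M,Sports,Medium,C0
3,M,Sports,Medium,C0
4,M,Sports,Large,C0
5,M,Sports,Extra Large,C0
6,M,Sports,Extra Large,C0
7,F,Sports,Small,C0
8,F,Sports,Small,C0
9,F,Sports,Medium,C0
10,F,Luxury,Large,C0
11,M,Family,Large,C1
12,M,Family,Extra Large,C1
13,M,Family,Medium,C1
14,M,Luxury,Extra Large,C1
15,F,Luxury,Small,C1
16,F,Luxury,Small,C1
17,F,Luxury,Medium,C1
18,F,Luxury,Medium,C1
19,F,Luxury,Medium,C1
20,F,Luxury,Large,C1
"""

# header=0 uses the first line as column names instead of reading it as data.
pima = pd.read_csv(io.StringIO(csv_text), header=0)

# CustomerID is left out on the assumption that it is only a row identifier.
feature_cols = ['Gender', 'Car Type', 'Shirt Size']

# One-hot encode the string categories into numeric indicator columns.
X = pd.get_dummies(pima[feature_cols])
y = pima['Class']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

clf = DecisionTreeClassifier(random_state=1)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

With only 20 rows, the reported accuracy will vary a lot with the split, so it is best read as a check that the pipeline runs rather than as a measure of model quality.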