
I have a supervised classification problem with 4 numeric class labels (0, 1, 2, 3) and roughly 100 samples (trials), each with 38 numeric features as the input.

After feeding this data into an SVC classifier in Python and into Matlab (specifically the Classification Learner App), with matched hyperparameters (C = 1, quadratic SVM kernel, one-vs-one multi-class method, standardised data, no PCA), the reported accuracies differ drastically:

  • Matlab = 86.7 %
  • Python = 45.0 %

Has anyone come across this, or does anyone have ideas about what I could do to work out which result is correct?

Matlab input:

[Screenshot: Matlab SVM settings from the Classification Learner App]

Python input:

from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

symptom = input("What symptom would you like to analyse? \n")
cross_validation = input("With cross validation? \n")
if cross_validation == "Yes":
    no_cvfolds = int(input("Number of folds? \n"))  # built-in int; np.int is deprecated

# symptomDF and feature are built from the loaded dataset earlier (omitted here)
x = symptomDF[feature]
y = symptomDF.loc[:, 'updrs_class'].values

# Standardise features to zero mean and unit variance, matching Matlab's setting
x_new = StandardScaler().fit_transform(x)

# Quadratic SVM (degree-2 polynomial kernel), one-vs-one multi-class, C = 1
scores = cross_val_score(
    SVC(kernel='poly', degree=2, C=1.0, decision_function_shape='ovo'),
    x_new, y, cv=no_cvfolds)
print(symptom + " Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
    There are many reasons why two separate implementations of apparently the same model might diverge. Probably the first place to start is with the accuracy score calculation. Are you certain that the outputs of the classifiers really are very different? You may need to post more details than this for anyone to give specific advice. Commented Apr 19, 2018 at 20:24
  • It could be useful to check up the documentation of the libraries which you use for SVC in both Matlab and Python. It could be that the parameters are processed internally in the libraries and hence, don't mean the same. For example, C=1 may not mean that value internally. Commented Apr 19, 2018 at 20:26
  • I have added a bit more detail to my post; what other details would help with the answer? @mahesh I have checked that the hyperparameters I am matching are equivalent. For example, Box constraint in Matlab is the equivalent of the C parameter in Python. Commented Apr 19, 2018 at 20:53
  • Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation, as suggested when you created this account. Minimal, complete, verifiable example applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. We should be able to paste your posted code into a text file and reproduce the problem you described. Commented Apr 20, 2018 at 1:44
  • Have added more detail as suggested! Commented Apr 20, 2018 at 8:49

1 Answer


So after a few days, one factor that seemed to help bring the accuracy in line with Matlab's was the choice of cross validator and its shuffling parameter.

Instead of passing the number of folds directly to the cross_val_score function, I defined a stratified K-fold cross validator. This was better for me because it keeps the class proportions balanced within each fold, which matters when the class sizes are imbalanced. I also explicitly set shuffle = True so that the data is shuffled before the cross validator splits it into folds.
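A minimal sketch of that change, reusing x_new, y, and no_cvfolds from the code in the question (the random_state value here is just an illustrative choice for reproducibility):

from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Stratified K-fold keeps each fold's class proportions close to those of the
# full dataset; shuffle=True randomises the row order before splitting.
cv = StratifiedKFold(n_splits=no_cvfolds, shuffle=True, random_state=0)

scores = cross_val_score(
    SVC(kernel='poly', degree=2, C=1.0, decision_function_shape='ovo'),
    x_new, y, cv=cv)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))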
