
I have a supervised classification problem with 4 numeric class labels (0, 1, 2, 3) and roughly 100 samples (trials), each with 38 numeric features as the input.

After feeding this data into an SVC classifier in Python and into Matlab (specifically the Classification Learner App), with matched hyperparameters (C = 1, quadratic SVM kernel, one-vs-one multi-class method, standardised data, no PCA), the reported accuracies differ drastically:

  • Matlab = 86.7 %
  • Python = 45.0 %

Has anyone come across this, or does anyone have ideas about what I could do to work out which result is correct?

Matlab input:

[Screenshot: Matlab SVM settings from the Classification Learner App]

Python input:

from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

symptom = input("What symptom would you like to analyse? \n")
cross_validation = input("With cross validation? \n")
if cross_validation == "Yes":
    no_cvfolds = int(input("Number of folds? \n"))  # built-in int; np.int is deprecated

# symptomDF and feature are built from the loaded dataset earlier (omitted here)
x = symptomDF[feature]
y = symptomDF.loc[:, 'updrs_class'].values

# Standardise features to zero mean and unit variance, matching Matlab's setting
x_new = StandardScaler().fit_transform(x)

# Quadratic SVM (degree-2 polynomial kernel), one-vs-one multi-class, C = 1
scores = cross_val_score(
    SVC(kernel='poly', degree=2, C=1.0, decision_function_shape='ovo'),
    x_new, y, cv=no_cvfolds)
print(symptom + " Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
    There are many reasons why two separate implementations of apparently the same model might diverge. Probably the first place to start is with the accuracy score calculation. Are you certain that the outputs of the classifiers really are very different? You may need to post more details than this for anyone to give specific advice. Commented Apr 19, 2018 at 20:24
  • It could be useful to check up the documentation of the libraries which you use for SVC in both Matlab and Python. It could be that the parameters are processed internally in the libraries and hence, don't mean the same. For example, C=1 may not mean that value internally. Commented Apr 19, 2018 at 20:26
  • I have added a bit more detail to my post; what other details would help with the answer? @mahesh I have checked that the hyperparameters I am matching are equivalent. For example, Box constraint in Matlab is the equivalent of the C parameter in Python. Commented Apr 19, 2018 at 20:53
  • Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation, as suggested when you created this account. Minimal, complete, verifiable example applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. We should be able to paste your posted code into a text file and reproduce the problem you described. Commented Apr 20, 2018 at 1:44
  • Have added more detail as suggested! Commented Apr 20, 2018 at 8:49

1 Answer


So after a few days, one factor that seemed to help bring the accuracy in line with Matlab's was the choice of cross validator and its shuffling parameter.

Instead of passing the number of folds directly to the cross_val_score function, I defined a stratified K-fold cross validator. This was better for me because it keeps the class proportions balanced within each fold, which matters when the class sizes are imbalanced. I also explicitly set shuffle = True so that the data is shuffled before the cross validator splits it into folds.
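A minimal sketch of that change, reusing x_new, y, and no_cvfolds from the code in the question (the random_state value here is just an illustrative choice for reproducibility):

from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Stratified K-fold keeps each fold's class proportions close to those of the
# full dataset; shuffle=True randomises the row order before splitting.
cv = StratifiedKFold(n_splits=no_cvfolds, shuffle=True, random_state=0)

scores = cross_val_score(
    SVC(kernel='poly', degree=2, C=1.0, decision_function_shape='ovo'),
    x_new, y, cv=cv)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))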
