
I'm trying to implement a simple linear SVM binary classifier in MATLAB, but I get strange results.
I have two classes g = {-1; 1} defined by two predictors, varX and varY. In fact, varY alone is enough to separate the dataset into two distinct classes (around varY = 0.38), but I will keep varX as a random variable since I will need it for other work.

Using the code below (adapted from MATLAB examples) I get a wrong classifier. The linear classifier should be close to a horizontal line around varY = 0.38, as we can see by plotting the 2D points.
The line that should separate the two classes is not displayed. What am I doing wrong?

g(1:14,1)=1;   % class labels: first 14 samples are class +1
g(15:26,1)=-1; % remaining 12 samples are class -1
m3(:,1)=rand(26,1); %varX
m3(:,2)=[0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189;
    0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528; 
    0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201]; %varY

SVMmodel_testm =  fitcsvm(m3,g,'KernelFunction','Linear');

d = 0.005; % Step size of the grid
[x1Grid,x2Grid] = meshgrid(min(m3(:,1)):d:max(m3(:,1)),...
    min(m3(:,2)):d:max(m3(:,2)));
xGrid = [x1Grid(:),x2Grid(:)];        % The grid
[~,scores2] = predict(SVMmodel_testm,xGrid); % The scores

figure();
h(1:2)=gscatter(m3(:,1), m3(:,2), g,'br','ox');
hold on
    % Support vectors
h(3) = plot(m3(SVMmodel_testm.IsSupportVector,1),m3(SVMmodel_testm.IsSupportVector,2),'ko','MarkerSize',10);
    % Decision boundary
contour(x1Grid,x2Grid,reshape(scores2(:,1),size(x1Grid)),[0 0],'k');
xlabel('varX'); ylabel('varY'); 
set(gca,'Color',[0.5 0.5 0.5]);
hold off

1 Answer


A common problem with SVM, or any classification method for that matter, is unnormalized data. You have one dimension that spans from 0 to 1 and another that spans from about 0.3 to 0.4. This causes an imbalance between the features. Common practice is to normalize the features somehow, for example by their standard deviation. Try this code:

g(1:14,1)=1;
g(15:26,1)=-1;
m3(:,1)=rand(26,1); %varX
m3(:,2)=[0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189;
    0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528; 
    0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201]; %varY
m3(:,2) = m3(:,2)./std(m3(:,2)); % rescale varY by its standard deviation
SVMmodel_testm =  fitcsvm(m3,g,'KernelFunction','Linear');

Notice the second-to-last line, which rescales varY by its standard deviation.
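As a side note, fitcsvm can also do this kind of scaling for you via its 'Standardize' name-value pair, which centers each predictor column and scales it by its standard deviation. A minimal sketch (unlike the snippet above, this rescales both columns, not just varY):

% Let fitcsvm standardize all predictors internally
SVMmodel_std = fitcsvm(m3,g,'KernelFunction','linear','Standardize',true);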


3 Comments

Thanks, it is working as expected now. What happens if varY can also vary from 0 to 1? This dataset is an extract from a larger dataset with several classes, but I am only interested in separating the two classes represented here by their samples.
Well, there are several ways to normalize the data. I find that normalizing by the standard deviation is usually best; sometimes a range normalization also works fine. There is no real rule for when to use which, unless there is some other information about the data, for example that one feature is in meters and the other is in cm. (A sketch of both options follows the comments below.)
This is about pixel intensities in an image, and sometimes the target features are acquired lighter, other times darker... According to your explanation, shouldn't I apply the same normalization procedure to varX?
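For reference, a minimal sketch of the two normalization options mentioned in the comments, applied column-wise to both predictors (variable names follow the question's code; the element-wise division across columns assumes implicit expansion, available in MATLAB R2016b and later):

m3_std = m3 ./ std(m3);                         % scale each column by its standard deviation
m3_rng = (m3 - min(m3)) ./ (max(m3) - min(m3)); % min-max normalize each column to [0,1]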
