
I'm trying to implement a simple linear SVM binary classifier in MATLAB, but I get strange results.
I have two classes g = {-1; 1} defined by two predictors, varX and varY. In fact, varY alone is enough to separate the dataset into two distinct classes (around varY = 0.38), but I will keep varX as a random variable since I will need it for other work.

Using the code below (adapted from MATLAB examples) I get a wrong classifier. The linear classifier should be close to a horizontal line around varY = 0.38, as we can see by plotting the 2D points.
The line that should separate the two classes is not displayed. What am I doing wrong?

g(1:14,1)=1;   % class labels: first 14 samples are class +1
g(15:26,1)=-1; % remaining 12 samples are class -1
m3(:,1)=rand(26,1); %varX
m3(:,2)=[0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189;
    0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528; 
    0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201]; %varY

SVMmodel_testm =  fitcsvm(m3,g,'KernelFunction','Linear');

d = 0.005; % Step size of the grid
[x1Grid,x2Grid] = meshgrid(min(m3(:,1)):d:max(m3(:,1)),...
    min(m3(:,2)):d:max(m3(:,2)));
xGrid = [x1Grid(:),x2Grid(:)];        % The grid
[~,scores2] = predict(SVMmodel_testm,xGrid); % The scores

figure();
h(1:2)=gscatter(m3(:,1), m3(:,2), g,'br','ox');
hold on
    % Support vectors
h(3) = plot(m3(SVMmodel_testm.IsSupportVector,1),m3(SVMmodel_testm.IsSupportVector,2),'ko','MarkerSize',10);
    % Decision boundary
contour(x1Grid,x2Grid,reshape(scores2(:,1),size(x1Grid)),[0 0],'k');
xlabel('varX'); ylabel('varY'); 
set(gca,'Color',[0.5 0.5 0.5]);
hold off

1 Answer


A common problem with SVM, or any classification method for that matter, is unnormalized data. You have one dimension that spans from 0 to 1 and another that spans from about 0.3 to 0.4. This causes an imbalance between the features. Common practice is to normalize the features somehow, for example by their standard deviation. Try this code:

g(1:14,1)=1;
g(15:26,1)=-1;
m3(:,1)=rand(26,1); %varX
m3(:,2)=[0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189;
    0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528; 
    0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201]; %varY
m3(:,2) = m3(:,2)./std(m3(:,2)); % rescale varY by its standard deviation
SVMmodel_testm =  fitcsvm(m3,g,'KernelFunction','Linear');

Notice the second-to-last line, which rescales varY by its standard deviation.
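As a side note, fitcsvm can also do this kind of scaling for you via its 'Standardize' name-value pair, which centers each predictor column and scales it by its standard deviation. A minimal sketch (unlike the snippet above, this rescales both columns, not just varY):

% Let fitcsvm standardize all predictors internally
SVMmodel_std = fitcsvm(m3,g,'KernelFunction','linear','Standardize',true);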


3 Comments

Thanks, it is working as expected now. What happens if varY can also vary from 0 to 1? This dataset is an extract from a larger dataset with several classes, but I am only interested in separating the two classes represented here by their samples.
Well, there are several ways to normalize the data. I find that normalizing by the standard deviation is usually best; sometimes a range normalization also works fine. There is no real rule for when to use which, unless there is some other information about the data, for example that one feature is in meters and the other is in cm. (A sketch of both options follows the comments below.)
This is about pixel intensities in an image, and sometimes the target features are acquired lighter, other times darker... According to your explanation, shouldn't I apply the same normalization procedure to varX?
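For reference, a minimal sketch of the two normalization options mentioned in the comments, applied column-wise to both predictors (variable names follow the question's code; the element-wise division across columns assumes implicit expansion, available in MATLAB R2016b and later):

m3_std = m3 ./ std(m3);                         % scale each column by its standard deviation
m3_rng = (m3 - min(m3)) ./ (max(m3) - min(m3)); % min-max normalize each column to [0,1]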
