My Setup: Python 2.7.4.1, Numpy MKL 1.7.1, Windows 7 x64, WinPython
Context:
I tried to implement the Sequential Minimal Optimization algorithm for solving SVM. I use maximal violating pair approach.
The problem:
In working set selection procedure i want to find maximum value of gradient and its index for elements which met some condition, y[i]*alpha[i]<0 or y[i]*alpha[i]
#y - array of -1 and 1
y=np.array([-1,1,1,1,-1,1])
#alpha- array of floats in range [0,C]
alpha=np.array([0.4,0.1,1.33,0,0.9,0])
#grad - array of floats
grad=np.array([-1,-1,-0.2,-0.4,0.4,0.2])
GMaxI=float('-inf')
GMax_idx=-1
n=alpha.shape[0] #usually n=100000
C=4
B=[0,0,C]
for i in xrange(0,n):
yi=y[i] #-1 or 1
alpha_i=alpha[i]
if (yi * alpha_i< B[yi+1]): # B[-1+1]=0 B[1+1]=C
if( -yi*grad[i]>=GMaxI):
GMaxI= -yi*grad[i]
GMax_idx = i
This procedure is called many times (~50000) and profiler shows that this is the bottleneck. It is possible to vectorize this code?
Edit 1: Add some small exemplary data
Edit 2: I have checked solution proposed by hwlau , larsmans and Mr E. Only solutions proposed Mr E is correct. Below sample code with all three answers:
import numpy as np
y=np.array([ -1, -1, -1, -1, -1, -1, -1, -1])
alpha=np.array([0, 0.9, 0.4, 0.1, 1.33, 0, 0.9, 0])
grad=np.array([-3, -0.5, -1, -1, -0.2, -4, -0.4, -0.3])
C=4
B=np.array([0,0,C])
#hwlau - wrong index and value
filter = (y*alpha < C*0.5*(y+1)).astype('float')
GMax_idx = (filter*(-y*grad)).argmax()
GMax = -y[GMax_idx]*grad[GMax_idx]
print GMax_idx,GMax
#larsmans - wrong index
neg_y_grad = (-y * grad)[y * alpha < B[y + 1]]
GMaxI = np.max(neg_y_grad)
GMax_ind = np.argmax(neg_y_grad)
print GMax_ind,GMaxI
#Mr E - correct result
BY = np.take(B, y+1)
valid_mask = (y * alpha < BY)
values = -y * grad
values[~valid_mask] = np.min(values) - 1.0
GMaxI = values.max()
GMax_idx = values.argmax()
print GMax_idx,GMaxI
Output (GMax_idx, GMaxI)
0 -3.0
3 -0.2
4 -0.2
Conclusions
After checking all solutions, the fastest one (2x-6x) is solution proposed by @ali_m. However it requires to install some python packages: numba and all its prerequisites.
I have some trouble to use numba with class methods, so I create global functions which are autojited with numba, my solution look something like this:
from numba import autojit
@autojit
def FindMaxMinGrad(A,B,alpha,grad,y):
'''
Finds i,j indices with maximal violatin pair scheme
A,B - 3 dim arrays, contains bounds A=[-C,0,0], B=[0,0,C]
alpha - array like, contains alpha coeficients
grad - array like, gradient
y - array like, labels
'''
GMaxI=-100000
GMaxJ=-100000
GMax_idx=-1
GMin_idx=-1
for i in range(0,alpha.shape[0]):
if (y[i] * alpha[i]< B[y[i]+1]):
if( -y[i]*grad[i]>GMaxI):
GMaxI= -y[i]*grad[i]
GMax_idx = i
if (y[i] * alpha[i]> A[y[i]+1]):
if( y[i]*grad[i]>GMaxJ):
GMaxJ= y[i]*grad[i]
GMin_idx = i
return (GMaxI,GMaxJ,GMax_idx,GMin_idx)
class SVM(object):
def working_set(self,....):
FindMaxMinGrad(.....)
nor generate random data of the right shape with NumPy's random module. Make it so that we can directly run this code without guessing the input and output values. Then you'll get a useful answer in no time.argmax. Fixed that; still only three lines of code.