I'm trying to optimize the following code, potentially by rewriting it in Cython. It takes a low-dimensional but relatively long numpy array, looks in one of its columns for 0 values, and marks the corresponding rows as -1 in a result array. The code is:
import time

import numpy as np


def get_data():
    # 15,000 rows x 3 columns
    data = np.array([[1,5,1]] * 5000 + [[1,0,5]] * 5000 + [[0,0,0]] * 5000)
    return data


def get_cols(K):
    # Column to inspect in each row (the same column everywhere, for simplicity)
    cols = np.array([2] * K)
    return cols


def test_nonzero(data):
    K = len(data)
    result = np.array([1] * K)
    # Index into columns of data
    cols = get_cols(K)
    # Mark zero points with -1
    idx = np.nonzero(data[np.arange(K), cols] == 0)[0]
    result[idx] = -1
    return result


t_start = time.time()
data = get_data()
for n in range(5000):
    test_nonzero(data)
t_end = time.time()
print(t_end - t_start)
data is the input array. cols gives, for each row, the column of data to check for a zero (for simplicity, I made every entry the same column). The goal is to compute a numpy array, result, which has a 1 for each row where the column of interest is non-zero, and a -1 for each row where the column of interest is zero.
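To make the intended output concrete, here is a minimal sketch on a tiny input where each row checks a different column. The names expected_result, small and small_cols are made up for this illustration and are not part of my actual code; np.where is used only to state the goal compactly, not as a claim that it is faster.

```python
import numpy as np


def expected_result(data, cols):
    # Pick one value per row: data[i, cols[i]]
    picked = data[np.arange(len(data)), cols]
    # -1 where the picked value is zero, 1 everywhere else
    return np.where(picked == 0, -1, 1)


# Hypothetical 3-row example, one column index per row
small = np.array([[1, 5, 1], [1, 0, 5], [0, 0, 0]])
small_cols = np.array([2, 1, 0])
print(expected_result(small, small_cols))  # [ 1 -1 -1]
```

Row 0 looks at column 2 (value 1, so 1), row 1 looks at column 1 (value 0, so -1), and row 2 looks at column 0 (value 0, so -1).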
Running this function 5000 times on a not-so-large array of 15,000 rows by 3 columns takes about 20 seconds. Is there a way to speed this up? It appears that most of the work goes into finding the zero elements and then using their indices (the call to nonzero and the subsequent assignment through idx). Can this be optimized, or is this the best that can be done?
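I have not confirmed where the time actually goes, so here is a rough sketch for timing each step of the function separately with timeit. The helper time_step and the labels are invented for this measurement, and the two arrays built from Python lists are included because they are also rebuilt on every call. The numbers will vary by machine and numpy version.

```python
import timeit

import numpy as np

data = np.array([[1, 5, 1]] * 5000 + [[1, 0, 5]] * 5000 + [[0, 0, 0]] * 5000)
K = len(data)
cols = np.array([2] * K)
rows = np.arange(K)
picked = data[rows, cols]


def time_step(label, fn, number=50):
    # Average wall-clock time of one call to fn, in microseconds
    seconds = timeit.timeit(fn, number=number)
    per_call_us = seconds / number * 1e6
    print("%-28s %10.1f us per call" % (label, per_call_us))
    return per_call_us


timings = {
    "result from Python list": time_step(
        "result from Python list", lambda: np.array([1] * K)),
    "cols from Python list": time_step(
        "cols from Python list", lambda: np.array([2] * K)),
    "fancy indexing": time_step(
        "fancy indexing", lambda: data[rows, cols]),
    "compare + nonzero": time_step(
        "compare + nonzero", lambda: np.nonzero(picked == 0)[0]),
}
```

Comparing the four lines of output should show whether the nonzero call really dominates or whether the per-call array construction matters more.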
How could a Cython implementation gain speed on this?