1

I am trying to make a python function run faster by taking advantage of the vectorization features in Numpy.

The aim of the function is that at the end of each iteration of the for loop, the location arr[i, 0] (i.e first entry of each row) will contain the number of rows in the 2D array (data), whose value if at position 0 is less than or equal to that or array i

# Data is a 2D array
def function(data)
    n = len(data)
    arr = np.zeros(shape=(n, 9))
    for i, sample in enumerate(data):
        arr[i][0] = np.count_nonzero(data[:, 0] <= data[i][0])

I am currently using a for loop. The runtime is horribly slow as this is being used in a machine learning algorithm. I have been told that I can further improve this by vectorizing the for loop.

I have tried this, in my attempt to use list comprehension:

arr[:, 0] = ( np.count_nonzero(data[:, x] >  data[i][x]) for i in gini_arr[:, 0] )

But I get a generator error:

TypeError: float() argument must be a string or a number, not 'generator'

How can I vectorize this for loop?

3
  • 1
    ( np.count_nonzero(data[:, x] > data[i][x]) for i in gini_arr[:, 0] ) this is not list comrehension it returns generator. add sample input and expected output Commented Sep 6, 2020 at 7:45
  • Your code is incomplete. Look at arr = np.zeros(shape=(n*9). It contains unbalanced parentheses. Is it enough to add ")" at the end? And what are values of n and data? Commented Sep 6, 2020 at 9:37
  • I cannot paste the complete code here, this is just an extract which is supposed to convey what I am trying to achieve. Commented Sep 6, 2020 at 14:40

1 Answer 1

2

In your comment to my initial solution you wrote: in column 0, no element is less than 0. So this brings me to a conclusion that in your function there should be < instead of <=.

Another weird point is that you created a loop with one of control variables named sample but you never use this variable. A more natural solution is to iterate over a range(n) (you computed n earlier, as the number of rows in data).

The last detail to correct in your code is that your function does not return anything.

And one improvement: As you compute the count of elements meeting some condition, the result is by definition an array of integers, not float.

So the corrected content of your function (operating the old way) should be:

def function(data):
    n = len(data)
    arr = np.zeros((n, 9), dtype=int)
    for i in range(n):
        arr[i, 0] = np.count_nonzero(data[:, 0] < data[i, 0])
    return arr

How to compute this result in a vectorized way

You can get just the same result, when you define your function as:

def fn(data):
    dd = data[:, 0]  # The first column
    res = np.less(dd[np.newaxis, :], dd[:, np.newaxis]).sum(axis=1)
    return np.hstack([res[:, np.newaxis], np.zeros((dd.size, 8), dtype=int)])

Note that res (a 1-D array) contains the result of comparison of each element in dd with all elements in dd.

The final result is generated by:

  • conversion of res to an array with a single column,
  • "addition" of 8 columns filled with zeroes.

To test this function I created a test array:

data = np.arange(25).reshape(5,-1)
data[3,0] = 1

so that it contains:

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [ 1, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

Then I ran fn(data) getting:

array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
       [2, 0, 0, 0, 0, 0, 0, 0, 0],
       [3, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0],
       [4, 0, 0, 0, 0, 0, 0, 0, 0]])

For the above test data (5 rows) my function computed only 5 % faster (107 µs yours and 101 µs mine).

But for a bigger number of rows, e.g. 60, your function completed in 1.24 ms, whereas my function in 165 µs, so 7 times faster.

For yet biger number of rows, the advantage of my code should be yet more visible.

Edit following the comment as of 2020-09-07

Explanation of how np.newaxis works:

  1. dd (in fn function) contains array([ 0, 5, 10, 1, 20]) - the first column of data, but here it is a 1-D array.

  2. dd[np.newaxis, :] is a 2-D array, containing just the same numbers, in a single row: array([[ 0, 5, 10, 1, 20]]).

  3. dd[:, np.newaxis] is also a 2-D array, but this time the same numbers are the content of a single column:

    array([[ 0],
           [ 5],
           [10],
           [ 1],
           [20]])
    
  4. Now run np.less(dd[np.newaxis, :], dd[:, np.newaxis]).astype(int) (to facilitate the assessment of what has been computed, I added conversion to int):

    array([[0, 0, 0, 0, 0],
           [1, 0, 0, 1, 0],
           [1, 1, 0, 1, 0],
           [1, 0, 0, 0, 0],
           [1, 1, 1, 1, 0]])
    

    The above result is a 2-D array, the result of comparison (left_argument < right_argument) where:

    • row number is the number of element from "row argument",
    • column number is the number of element "column argument".

    So the whole array is the result of "each to each" comparison.

    Example: The first element in row (0) is not less than any element of the column, so the first row of the result contains all zeroes.

  5. And the last step is sum(axis=1), i.e. compute sum of the above array along each row.

This way the final result is: array([0, 2, 3, 1, 4]) and it will be the first column of the whole result of fn function.

If you still need more explanation, search the Web for np.newaxis. Even on SO you can find many explanations on this matter.

Sign up to request clarification or add additional context in comments.

3 Comments

Hello, Thank you for taking the time to answer. arr is a 2D array. I have fixed the sample code to reflect that. Given the array above, the solution should be [0, 2, 3, 1, 4]. Because in column 0, no element is less than 0, 2 elements are less than 5, 3 elements are less than 10 .... The problem is not finding a solution. I need to how to vectorize it and get rid of the for loop.
I totally replaced my initial answer with a new one, since the "old" anwer did not reflect what is actually to do in your corrected post.
Thank you very much for your time and effort, we need more people like you on stack overflow. Can you please explain this line for me: (res = np.less(dd[np.newaxis, :], dd[:, np.newaxis]).sum(axis=1)). I understand np.less, but I dont understand what is happening with the newaxis and sum

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.