2

Is this possible to accomplish with Numpy and with good performance?

Initial 2D array:

array([[0, 1, 1, 1, 1, 0],
       [0, 0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0, 1]])

For each row whose sum is less than 4, set the last item in that row to 1:

array([[0, 1, 1, 1, 1, 0],
       [0, 0, 1, 0, 0, 1],
       [1, 0, 0, 0, 0, 1]])

Then divide each item in each row by the sum of that row to get this result:

array([[0, 0.25, 0.25, 0.25, 0.25, 0],
       [0, 0, 0.5, 0, 0, 0.5],
       [0.5, 0, 0, 0, 0, 0.5]])

3 Answers

1

You can do the conditional assignment in a single line with some clever boolean indexing:

import numpy as np

arr = np.array([[0, 1, 1, 1, 1, 0],
                [0, 0, 1, 0, 0, 0],
                [1, 0, 0, 0, 0, 1]])

arr[arr.sum(axis=1) < 4, -1] = 1
print(arr)

Output:

[[0 1 1 1 1 0]
 [0 0 1 0 0 1]
 [1 0 0 0 0 1]]

You can then divide each row by its sum like this:

arr = arr / arr.sum(axis=1, keepdims=True)
print(arr)

Output:

[[0.   0.25 0.25 0.25 0.25 0.  ]
 [0.   0.   0.5  0.   0.   0.5 ]
 [0.5  0.   0.   0.   0.   0.5 ]]
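
The keepdims=True argument is what lets the division line up row by row. As a minimal sketch of the shapes involved (the variable names flat_sums, col_sums and broadcast_failed are only illustrative), using the array as it looks after the conditional assignment:

```python
import numpy as np

arr = np.array([[0, 1, 1, 1, 1, 0],
                [0, 0, 1, 0, 0, 1],
                [1, 0, 0, 0, 0, 1]])

flat_sums = arr.sum(axis=1)                 # shape (3,)
col_sums = arr.sum(axis=1, keepdims=True)   # shape (3, 1)

# (3, 6) / (3, 1): each row sum is broadcast across that row's six columns.
normalized = arr / col_sums

# (3, 6) / (3,): the trailing dimensions 6 and 3 differ, so broadcasting fails.
try:
    arr / flat_sums
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
```

Without keepdims=True, the row sums would have to be reshaped by hand (for example with [:, None]) before dividing.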

Explanation

Let's give the boolean index array arr.sum(axis=1) < 4 the name boolix. For the original arr, whose row sums are 4, 1 and 2, boolix looks like:

[False  True  True]

If you index arr with boolix, it returns an array containing every row of arr for which the corresponding value in boolix is True. So the result of arr[boolix] is an array with the 2nd and 3rd rows of arr (indices 1 and 2):

[[0 0 1 0 0 0]
 [1 0 0 0 0 1]]

In the code above, arr was indexed as arr[boolix, -1]. Adding a second index, as in arr[anything, -1], restricts the selection to the last value in each selected row (i.e. the value in the last column). So, before the assignment, arr[boolix, -1] returns:

[0 1]

Since an indexing expression like this can also be assigned to, assigning 1 to arr[boolix, -1] solves your problem.
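
The indexing steps described above can be reproduced one at a time. This is a minimal sketch using the question's array and the condition arr.sum(axis=1) < 4 from the answer's code; selected_rows and last_values are illustrative names:

```python
import numpy as np

arr = np.array([[0, 1, 1, 1, 1, 0],
                [0, 0, 1, 0, 0, 0],
                [1, 0, 0, 0, 0, 1]])

boolix = arr.sum(axis=1) < 4    # one boolean per row; the row sums are 4, 1, 2
selected_rows = arr[boolix]     # the rows where boolix is True
last_values = arr[boolix, -1]   # last column of those rows, read before assigning

arr[boolix, -1] = 1             # assigning through the same index writes into arr
```

Reading arr[boolix, -1] produces a new array, while assigning to it modifies arr in place.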

1

A boolean mask on the row sums can select the rows matching your condition directly (an earlier version of this answer used numpy.where for this):

import numpy as np
a = np.array([[0, 1, 1, 1, 1, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 1]])

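# Set the last item to 1 in every row whose sum is below 4
a[np.sum(a, axis=1) < 4, -1] = 1
# Divide each row by its sum; [:, None] turns the sums into a column
a = a / a.sum(axis=1)[:, None]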

print(a)

# Output 
# [[0.   0.25 0.25 0.25 0.25 0.  ]
#  [0.   0.   0.5  0.   0.   0.5 ]
#  [0.5  0.   0.   0.   0.   0.5 ]]

PS: Edited after @tel suggestion :)
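
If both steps are needed in more than one place, they could be wrapped in a small function. The sketch below is only an illustration: normalize_rows and its threshold parameter are hypothetical names, not part of NumPy, and it assumes non-negative entries so that no row sum is zero after the assignment.

```python
import numpy as np

def normalize_rows(a, threshold=4):
    """Hypothetical helper: set the last item of low-sum rows to 1,
    then divide each row by its sum."""
    out = np.array(a, dtype=float)            # copy, so the caller's array is untouched
    out[out.sum(axis=1) < threshold, -1] = 1  # conditional assignment on the last column
    return out / out.sum(axis=1, keepdims=True)

a = np.array([[0, 1, 1, 1, 1, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 1]])
result = normalize_rows(a)
```

Both steps are vectorized, so no Python-level loop over the rows is involved.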

2 Comments

Combining the row and column slices with [x, -1] is a nice idea. However, the np.where is completely pointless. You can remove it (and the extra work it's doing) and you get the same effect.

Oh right! For some reason I omitted this. Thanks for pointing it out. I will edit my answer.

0

I think you need:

import numpy as np

x = np.array([[0, 1, 1, 1, 1, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 1]])

# x[:, -1] is a view of the last column, so assigning through it modifies x
x[:, -1][x.sum(axis=1) < 4] = 1
# array([[0, 1, 1, 1, 1, 0],
#        [0, 0, 1, 0, 0, 1],
#        [1, 0, 0, 0, 0, 1]])

print(x/x.sum(axis=1)[:,None])

Output:

[[0.   0.25 0.25 0.25 0.25 0.  ]
 [0.   0.   0.5  0.   0.   0.5 ]
 [0.5  0.   0.   0.   0.   0.5 ]]

2 Comments

Indexing twice (e.g. x[a][b] instead of x[a, b]) is usually a bad idea, as it may have unintended consequences (e.g. sometimes you can assign values this way, sometimes you can't).

@NilsWerner That is a good point that hadn't occurred to me when I was writing my answer (which did originally use x[a][b]). The issue is that while x[a][b] will return a view in many cases, sometimes it does return a copy instead, right?
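
The difference discussed in these comments can be checked directly. In this minimal sketch (mask, via_view, via_copy and combined are illustrative names), the order of the two indexing steps decides whether the assignment reaches the original array:

```python
import numpy as np

x = np.array([[0, 1, 1, 1, 1, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 1]])
mask = x.sum(axis=1) < 4

# Basic slice first: via_view[:, -1] is a view, so the assignment reaches via_view.
via_view = x.copy()
via_view[:, -1][mask] = 1

# Boolean index first: via_copy[mask] is a temporary copy, so via_copy is unchanged.
via_copy = x.copy()
via_copy[mask][:, -1] = 1

# Single combined index: the assignment is applied to the array itself.
combined = x.copy()
combined[mask, -1] = 1
```

Here the chained form works only when the first step is a basic slice, whereas the combined form x[mask, -1] = 1 does not depend on that detail.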
