I have a 2D NumPy array in which each row is a subarray of 3 integers. For example:

[ [2, 3, 4], [9, 8, 7], ... [15, 14, 16] ]

For each subarray I want to replace the lowest number with a 1 and all other numbers with a 0. So the desired output from the above example would be:

[ [1, 0, 0], [0, 0, 1], ... [0, 1, 0] ]

This is a large array, so I want to exploit NumPy's performance. I know how to use conditions to operate on array elements, but how do I do this when the condition differs for each subarray? In this instance the condition needs to be something like:

newarray = (a == min(a)).astype(int)

But how do I do this across each subarray?

2 Answers


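You can pass the axis parameter to min to compute the minimum of each row, and keepdims=True to keep the result two-dimensional (shape (n, 1)) so that it broadcasts against a. Comparing a with that result gives True at the minimum position of each subarray; if a row's minimum is tied, every tied position is marked: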

(a == a.min(1, keepdims=True)).astype(int)
#array([[1, 0, 0],
#       [0, 0, 1],
#       [0, 1, 0]])
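
For completeness, here is a small self-contained sketch of the same idea run on the sample rows from the question. The names row_min, mask, tied and tied_mask are only illustrative, and the extra tied row is an assumption added to show what happens when a minimum is not unique:

```python
import numpy as np

# The three sample rows from the question.
a = np.array([[2, 3, 4], [9, 8, 7], [15, 14, 16]])

# Row minima kept as a column of shape (3, 1) so they broadcast against a.
row_min = a.min(axis=1, keepdims=True)
mask = (a == row_min).astype(int)
print(mask)

# Hypothetical row with a tied minimum: every tied position becomes 1.
tied = np.array([[5, 5, 9]])
tied_mask = (tied == tied.min(axis=1, keepdims=True)).astype(int)
print(tied_mask)
```

If exactly one 1 per row is required even when there are ties, the comparison alone is not enough and the ties need to be broken some other way.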

How about this?

import numpy as np

a = np.random.random((4,3))

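i = np.argmin(a, axis=-1)              # column index of each row's minimum
out = np.zeros(a.shape, int)           # start from an all-zero integer array
out[np.arange(out.shape[0]), i] = 1    # set exactly one entry per row to 1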

print(a)
print(out)

Sample output:

# [[ 0.58321885  0.18757452  0.92700724]
#  [ 0.58082897  0.12929637  0.96686648]
#  [ 0.26037634  0.55997658  0.29486454]
#  [ 0.60398426  0.72253012  0.22812904]]
# [[0 1 0]
#  [0 1 0]
#  [1 0 0]
#  [0 0 1]]
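
The same steps can be tried on the integer rows from the question. This is just a sketch; the fourth row with a tied minimum is an assumption added for illustration. Since np.argmin returns the index of the first occurrence when the minimum is repeated, each row ends up with a single 1:

```python
import numpy as np

# Sample rows from the question, plus one hypothetical row with a tied minimum.
a = np.array([[2, 3, 4], [9, 8, 7], [15, 14, 16], [5, 5, 9]])

i = np.argmin(a, axis=-1)             # first index of the minimum in each row
out = np.zeros(a.shape, dtype=int)    # all zeros, same shape as a
out[np.arange(a.shape[0]), i] = 1     # one assignment per (row, column) pair
print(out)
```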

In the timings below it is roughly 1.3 to 1.8 times faster than the direct approach:

from timeit import timeit

def dense():
    return (a == a.min(1, keepdims=True)).astype(int)

def sparse():
    i = np.argmin(a, axis=-1)
    out = np.zeros(a.shape, int)
    out[np.arange(out.shape[0]), i] = 1
    return out

for shp in ((4,3), (10000,3), (100,10), (100000,1000)):
    a = np.random.random(shp)
    d = timeit(dense, number=40)/40
    s = timeit(sparse, number=40)/40
    print('shape, dense, sparse, ratio', '({:6d},{:6d}) {:9.6g} {:9.6g} {:9.6g}'.format(*shp, d, s, d/s))

Sample run:

# shape, dense, sparse, ratio (     4,     3) 4.22172e-06 3.1274e-06   1.34992
# shape, dense, sparse, ratio ( 10000,     3) 0.000332396 0.000245348   1.35479
# shape, dense, sparse, ratio (   100,    10) 9.8944e-06 5.63165e-06   1.75693
# shape, dense, sparse, ratio (100000,  1000)  0.344177  0.189913   1.81229

1 Comment

Many thanks - this is a good answer. Numpy sure is an interesting beast. Your solution works.
