2

I have a numpy array that has dimensions (x, y, z) = (5, 50, 4). For every (x, y) pair, I want to find the index of the maximum value along the z axis. This index is in range(4). I want to select all of these "maximum" elements and set them to 1. Then, I want to select all other elements and set them to zero.

To explain it another way, I want to look at all vectors in the "z" direction (there are x*y of these vectors total). I want to set the maximum element to 1 and all other elements to 0. For example, the vector (0.25, 0.1, 0.5, 0.15) will become (0, 0, 1, 0).

I've tried many different things. The argmax function seems like it should help. But how do I use it to select elements correctly? I have tried things like...

x = data
i = x.argmax(axis = 2)
x[i] # shape = (5, 50, 50, 4)
x[:,:,i] # shape = (5, 50, 5, 50)
x[np.unravel_index(i), x.shape] # shape = (5, 50)

The last one, which uses np.unravel_index, has the correct shape, but the selected indices are NOT the maximum values along the z axis. So I'm having some trouble. If anyone could help at all, it would be really awesome. Thanks!


Edit: Here is a way I have found to do this. But if anyone has anything that is faster, please let me know!

def fix_vector(a):
    i = a.argmax()
    a = a*0
    a[i] = 1
    return a

y = np.apply_along_axis(fix_vector, axis=2, arr=x)

I would really like to optimize this if possible, since I call this function MANY times.


Edit: Thanks DSM for a nice solution. Here is a small example dataset, as requested in the comments.

data = np.random.random((3,5,4))
desired_output = np.apply_along_axis(fix_vector, axis=2, arr=data)

This uses the fix_vector function I posted above, but DSM's solution is faster. Thanks again!

1
  • 1
    You should add a, small, example of the input array and the desired result after processing. Commented Aug 13, 2015 at 3:40

3 Answers 3

2

This isn't particularly elegant, but:

def faster(x):
    d = x.reshape(-1, x.shape[-1])
    d2 = np.zeros_like(d)
    d2[np.arange(len(d2)), d.argmax(1)] = 1
    d2 = d2.reshape(x.shape)
    return d2

seems a little faster than the fix_vector approach. For example:

>>> x,y,z = 5,50,4
>>> data = np.random.random((x,y,z))
>>> np.allclose(orig(data), faster(data))
True
>>> %timeit -n 1000 orig(data)
1000 loops, best of 3: 5.77 ms per loop
>>> %timeit -n 1000 faster(data)
1000 loops, best of 3: 36.6 µs per loop
Sign up to request clarification or add additional context in comments.

Comments

1

This can be done with numpy.where

import numpy as np
a = np.array([[[ 0.25,  0.10 ,  0.50 ,  0.15],
               [ 0.50,  0.60 ,  0.40 ,  0.30]],
              [[ 0.25,  0.50 ,  0.20 ,  0.70],
               [ 0.80,  0.10 ,  0.50 ,  0.15]]])

Find the max value along the last axis

b = a.max(-1)    #shape is (2,2)

Add an axis to b so that it will broadcast across a and create a boolean array.

condition = a == b[..., np.newaxis]

Use numpy.where for the substitution.

c = np.where(condition, 1, 0)

>>> c
array([[[0, 0, 1, 0],
        [0, 1, 0, 0]],

       [[0, 0, 0, 1],
        [1, 0, 0, 0]]])

def f(a):
    b = a.max(-1)
    condition = a == b[..., np.newaxis]
    return np.where(condition, 1, 0)

So I wasn't satisfied - I've seen cases where reshape is faster than adding a new axis so I played around a bit. Seems numpy.where itself is a tad slow. Using boolean indexing with an assignment gives pretty much the same performance as @DSM's.

def h(a):
    b = a.max(-1)
    condition = a == b[..., np.newaxis]
    a[condition] = 1
    a[np.logical_not(condition)] = 0
    return a

4 Comments

Comparable but a tad (50%) slower than @DSM's - looks like they both scale the same with the size of the array.
Note that conceivably this could result in multiple 1s in a row if the maximum isn't unique. But for all we know, this might be what the OP wants to happen in that case (the question is ambiguous on the point).
@DSM - yours chooses the lowest index for multiple maximums in a row?
thanks for the solution, wwii. this is exactly what i was looking for. just fyi, for my application, i never encounter two identical values, so both solutions should work.
0

Sample x, (2,3,4) in shape:

In [934]: x
Out[934]: 
array([[[ 5,  7,  4, 15],
        [ 9, 23, 14,  9],
        [17, 13,  2,  1]],

       [[ 2, 18,  3, 18],
        [10, 12, 23, 14],
        [17, 23, 19, 13]]])

The max values along the last axis - clear enough, a (2,3) array:

In [935]: x.max(2)
Out[935]: 
array([[15, 23, 17],
       [18, 23, 23]])

Cooresponding indexes along that axis

In [936]: I=x.argmax(2)

In [937]: I
Out[937]: 
array([[3, 1, 0],
       [1, 2, 1]], dtype=int32)

One way of indexing x with those indexes. At this point I'm constructing the 0 and 1 axis indexes manually. It needs to be automated

In [938]: x[np.arange(2)[:,None], np.arange(3)[None,:], I]
Out[938]: 
array([[15, 23, 17],
       [18, 23, 23]])

The same indexes can be used to set values in another array:

In [939]: y=np.zeros_like(x)   
In [940]: y[np.arange(2)[:,None],np.arange(3)[None,:],I]=x.max(2)
In [941]: y
Out[941]: 
array([[[ 0,  0,  0, 15],
        [ 0, 23,  0,  0],
        [17,  0,  0,  0]],

       [[ 0, 18,  0,  0],
        [ 0,  0, 23,  0],
        [ 0, 23,  0,  0]]])

The trick is to come up with a more streamlined way of generating the indexing tuple:

(np.arange(2)[:,None], np.arange(3)[None,:], I)

(array([[0],
    [1]]), 
 array([[0, 1, 2]]), 
 array([[3, 1, 0],
    [1, 2, 1]], dtype=int32))

np.ix_ helps, though concatenating 2 tuples is a little ugly:

J = np.ix_(range(x.shape[0]), range(x.shape[1])) + (I,)

Indexing on a flattened array can be faster:

In [998]: J1 = np.ravel_multi_index(J, x.shape)
In [999]: J1
Out[999]: 
array([[ 3,  5,  8],
     [13, 18, 21]], dtype=int32)

In [1000]: x.flat[J1]
Out[1000]: 
array([[15, 23, 17],
   [18, 23, 23]])

In [1001]: y.flat[J1]=2

In [1002]: y
Out[1002]: 
array([[[0, 0, 0, 2],
        [0, 2, 0, 0],
        [2, 0, 0, 0]],

       [[0, 2, 0, 0],
        [0, 0, 2, 0],
        [0, 2, 0, 0]]])

But I still need to construct the J indexing tuple.

This is similar to DSMs solution, but sticks with the 3 dimensions. Flattening 2 of the dimensions eliminates the need for the xi_ construction.


def max_index(x, n, fn=np.argmax):
    # return multidimensional indexing tuple
    # applying fn along axis n
    I = fn(x, axis=n)
    alist = list(np.ix_(*[range(i) for i in I.shape]))
    alist.insert(n, I)
    return tuple(alist)

is a general purpose function to generate the J like indexing tuple

e.g.

y = np.zeros_like(x)
I = max_index(x, 1, np.argmin)
y[I] = x[I]
I = max_index(x, 1, np.argmax)
y[I] = x[I]

inserts both the minimum and maximum values of x (along axis 1) into y. Speed is about half of DSMs faster.

1 Comment

thanks for the solution, hpaulj. i think DSM and wwii provided more elegant solutions, but it was interesting to see how this works too.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.