4

I have a 2-D numpy array that can be subdivided into 64 boxes (think of a chessboard). The goal is a function that returns the position and value of the maximum in each box. Something like:

FindRefs(array) --> [(argmaxX00, argmaxY00, Max00), ...,(argmaxX63, argmaxY63, Max63)]

where argmaxXnn and argmaxYnn are the indexes of the whole array (not of the box), and Maxnn is the max value in each box. In other words,

Maxnn = array[argmaxYnn,argmaxYnn]

I've tryed the obvious "nested-for" solution:

def FindRefs(array):
    Height, Width = array.shape
    plumx = []
    plumy = []
    lum = []
    w = int(Width/8)
    h = int(Height/8)
    for n in range(0,8):    # recorrer boxes
        x0 = n*w
        x1 = (n+1)*w
        for m in range(0,8):
            y0 = m*h
            y1 = (m+1)*h
            subflatind = a[y0:y1,x0:x1].argmax() # flatten index of box
            y, x = np.unravel_index(subflatind, (h, w))
            X = x0 + x
            Y = y0 + y
            lum.append(a[Y,X])
            plumx.append(X)
            plumy.append(Y)

    refs = []
    for pt in range(0,len(plumx)):
        ptx = plumx[pt]
        pty = plumy[pt]
        refs.append((ptx,pty,lum[pt]))
    return refs

It works, but is neither elegant nor eficient. So I've tryed this more pythonic version:

def FindRefs(a):
    box = [(n*w,m*h) for n in range(0,8) for m in range(0,8)]
    flatinds = [a[b[1]:h+b[1],b[0]:w+b[0]].argmax() for b in box]
    unravels = np.unravel_index(flatinds, (h, w))
    ur = [(unravels[1][n],unravels[0][n]) for n in range(0,len(box))]
    absinds = [map(sum,zip(box[n],ur[n])) for n in range(0,len(box))]
    refs = [(absinds[n][0],absinds[n][1],a[absinds[n][1],absinds[n][0]]) for n in range(0,len(box))] 
    return refs

It works fine but, to my surprise, is not more efficient than the previous version!

The question is: Is there a more clever way to do the task?

Note that efficiency matters, as I have many large arrays for processing.

Any clue is welcome. :)

1

1 Answer 1

2

Try this:

from numpy.lib.stride_tricks import as_strided as ast
import numpy as np
def FindRefs3(a):
    box = tuple(x/8 for x in a.shape)
    z=ast(a, \
          shape=(8,8)+box, \
          strides=(a.strides[0]*box[0],a.strides[1]*box[1])+a.strides)
    v3 = np.max(z,axis=-1)
    i3r = np.argmax(z,axis=-1)
    v2 = np.max(v3,axis=-1)
    i2 = np.argmax(v3,axis=-1)
    i2x = np.indices(i2.shape)
    i3 = i3r[np.ix_(*[np.arange(x) for x in i2.shape])+(i2,)]
    i3x = np.indices(i3.shape)
    ix0 = i2x[0]*box[0]+i2
    ix1 = i3x[1]*box[1]+i3
    return zip(np.ravel(ix0),np.ravel(ix1),np.ravel(v2))

Note that your first FindRefs reverses indices, so that for a tuple (i1,i2,v), a[i1,i2] won't return the right value, whereas a[i2,i1] will.

So here's what the code does:

  • It first calculates the dimensions that each box needs to have (box) given the size of your array. Note that this doesn't do any checking: you need to have an array that can be divided evenly into an 8 by 8 grid.
  • Then z with ast is the messiest bit. It takes the 2d array, and turns it into a 4d array. The 4d array has dimensions (8,8,box[0],box[1]), so it lets you choose which box you want (the first two axes) and then what position you want in the box (the next two). This lets us deal with all the boxes at once by doing operations on the last two axes.
  • v3 gives us the maximum values along the last axis: in other words, it contains the maximum of each column in each box. i3r contains the index of which row in the box contained that max value.
  • v2 takes the maximum of v3 along its own last axis, which is now dealing with rows in the box: it takes the column maxes, and finds the maximum of them, so that v2 is a 2d array containing the maximum value of each box. If all you wanted were the maximums, this is all you'd need.
  • i2 is the index of the column in the box that holds the maximum value.
  • Now we need to get the index of the row in the box... that's trickier. i3r contains the row index of the max of each column in the box, but we want the row for the specific column that's specified in i2. We do this by choosing an element from i3r using i2, which gives us i3.
  • At this point, i2 and i3 are 8 by 8 arrays containing the row and column indexes of the maximums relative to each box. We want the absolute indexes. So we create i2x and i3x (actually, this is pointless; we could just create one, as they are the same), which are just arrays of what the indexes for i2 and i3 are (0,1,2,...,8 etc in one dimension, and so on). We then multiply these by the box sizes, and add the relative max indexes, to get the absolute max indexes.
  • We then combine these to get the same output that you had. Note that if you keep them as arrays, though, instead of making tuples, it's much faster.
Sign up to request clarification or add additional context in comments.

7 Comments

Note that you don't need backslashes in the call to ast since it's surrounded by brackets.
True - for some reason I tend to obsessively place backslashes everywhere, perhaps worrying that if I don't, I'll forget when I switch to another language that needs them.
Wow, definitely strides stuff is the key ! I must study Numpy more thoroughly. Meanwhile, I'm trying to understand the logic of your code
, since the line " i3 = i2.choose(np.rollaxis(i3r,-1)) " reports: ValueError: Need between 2 and (32) array objects (inclusive).
Bah - I should have known that undocumented limitation would come up. See stackoverflow.com/q/28931900/599265 for more info on that point. I'll try to get that the NPY_MAXARGS limitation on np.choose documented, because it's rather annoying to have it suddenly appear. In any case, the modified version is more confusing, but should work.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.