Efficiently select subsection of numpy array

Question

I want to split a numpy array into three different arrays based on a logical comparison. The numpy array I want to split is called x. Its shape looks as follows, but its entries vary. (In response to Saullo Castro's comment, I included a slightly different array x.)

array([[ 0.46006547,  0.5580928 ,  0.70164242,  0.84519205,  1.4       ],
      [ 0.00912908,  0.00912908,  0.05      ,  0.05      ,  0.05      ]])

The values of this array are monotonically increasing along columns. I also have two other arrays called lowest_gridpoints and highest_gridpoints. The entries of these arrays also vary, but the shape is always identical to the following:

 array([ 0.633,  0.01 ]), array([ 1.325,  0.99 ])

The selection procedure I want to apply is as follows:

All columns containing values lower than any value in lowest_gridpoints should be removed from x and constitute the array temp1.
All columns containing values higher than any value in highest_gridpoints should be removed from x and constitute the array temp2.
All columns of x that are included in neither temp1 nor temp2 constitute the array x_new.

The following code I wrote achieves the task.

if np.any( x[:,-1] > highest_gridpoints ) or np.any( x[:,0] < lowest_gridpoints ):
    for idx, sample, in enumerate(x.T):
        if np.any( sample > highest_gridpoints):
            max_idx = idx
            break
        elif np.any( sample < lowest_gridpoints ):
            min_idx = idx 
    temp1, temp2 = np.array([[],[]]), np.array([[],[]])
    if 'min_idx' in locals():
        temp1 = x[:,0:min_idx+1]
    if 'max_idx' in locals():
        temp2 = x[:,max_idx:]
    if 'min_idx' in locals() or 'max_idx' in locals():
        if 'min_idx' not in locals():
            min_idx = -1
        if 'max_idx' not in locals():
            max_idx = x.shape[1]
        x_new = x[:,min_idx+1:max_idx]

However, I suspect that this code is very inefficient because of the heavy use of loops. Additionally, I think the syntax is bloated.

Does anyone have an idea for code that achieves the task outlined above more efficiently or more concisely?

Your example returns [] for me... it would be nice to have a different input that can be used for comparisons...
– Saullo G. P. Castro, Commented Oct 20, 2014 at 10:27
@SaulloCastro: Thank you for your comment. I slightly modified the array x. Do you have an idea on how to modify my code?
– fabian, Commented Oct 20, 2014 at 14:30
Do you expect temp1 and temp2 to be mutually exclusive, or can it happen that a column has both a value lower than the one in lowest_gridpoints and another value higher than the one in highest_gridpoints? Also, did you mean monotonically increasing along the rows?
– greschd, Commented Oct 20, 2014 at 16:50
Maybe you can use np.argsort(x[i] + [lowest_gridpoints[i]])[-1]. This will give you the index of the first element larger than lowest_gridpoints[i]. Do it for all i and get the maximum (minimum for the highest_gridpoints).
– greschd, Commented Oct 20, 2014 at 16:58
@greschd: That's a good point. I want temp1 and temp2 to be mutually exclusive. In my code, this is guaranteed by the break command after if np.any( sample > highest_gridpoints):. When in doubt, I classify columns of x to para2 instead of para1. I meant monotonically increasing along the second dimension of np.arrays, so that x[0,i] >= x[0,j] for i > j. I hope (and think) this refers to columns.
– fabian, Commented Oct 20, 2014 at 16:59

gboffi · Accepted Answer · 2014-10-21 16:52:48Z

Only the first part of your question

import numpy as np

x = np.array([[ 0.46006547,  0.5580928 ,  0.70164242,  0.84519205,  1.4       ],
              [ 0.00912908,  0.00912908,  0.05      ,  0.05      ,  0.05      ]])

low, high = np.array([ 0.633,  0.01 ]), np.array([ 1.325,  0.99 ])

# construct an array of two rows of bools expressing your conditions
indices1 = np.array((x[0,:]<low[0], x[1,:]<low[1]))
print(indices1)

# do an or of the values along the first axis
indices1 = np.any(indices1, axis=0)
# now it's a single row array
print(indices1)

# use indices1 to extract what you want,
# the double transposition because the elements
# of a 2d array are the rows
tmp1 = x.T[indices1].T
print(tmp1)

# [[ True  True False False False]
#  [ True  True False False False]]
# [ True  True False False False]
# [[0.46006547 0.5580928 ]
#  [0.00912908 0.00912908]]

Next, construct indices2 and tmp2 in the same way. The indices of the remainder are the negation of the OR of the first two index arrays (i.e., numpy.logical_not(numpy.logical_or(i1, i2))).
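The remaining steps can be sketched as follows. This is only one possible way to finish the mask approach, not necessarily what the OP needs: it uses broadcasting instead of building the rows by hand, and it assumes that a column violating both bounds should go to tmp2 only (the OP's comment suggests this priority). The names below, above and x_new are illustrative.

```python
import numpy as np

x = np.array([[0.46006547, 0.5580928, 0.70164242, 0.84519205, 1.4],
              [0.00912908, 0.00912908, 0.05, 0.05, 0.05]])
low, high = np.array([0.633, 0.01]), np.array([1.325, 0.99])

# Compare every column against the per-row bounds (low[:, None] has
# shape (2, 1) and broadcasts over the columns), then OR down the rows.
below = (x < low[:, None]).any(axis=0)
above = (x > high[:, None]).any(axis=0)

# Assumption: a column that violates both bounds belongs to tmp2 only.
below &= ~above

tmp1 = x[:, below]              # columns below the lower bounds
tmp2 = x[:, above]              # columns above the upper bounds
x_new = x[:, ~(below | above)]  # everything else

print(tmp1.shape, x_new.shape, tmp2.shape)
```

Boolean indexing on the second axis (x[:, mask]) avoids the double transposition and always returns copies, not views.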

Addendum

Another approach, possibly faster if you have thousands of entries, uses numpy.searchsorted

import numpy as np

x = np.array([[ 0.46006547,  0.5580928 ,  0.70164242,  0.84519205,  1.4       ],
              [ 0.00912908,  0.00912908,  0.05      ,  0.05      ,  0.05      ]])

low, high = np.array([ 0.633,  0.01 ]), np.array([ 1.325,  0.99 ])

# each row of x is sorted, so a binary search finds the cut points
l0r = np.searchsorted(x[0,:], low[0], side='right')
l1r = np.searchsorted(x[1,:], low[1], side='right')

h0l = np.searchsorted(x[0,:], high[0], side='left')
h1l = np.searchsorted(x[1,:], high[1], side='left')

lr = max(l0r, l1r)
hl = min(h0l, h1l)

print(lr, hl)
print(x[:,:lr])
print(x[:,lr:hl])
# hl: (with the colon) keeps every column from hl onwards
print(x[:,hl:])

# 2 4
# [[0.46006547 0.5580928 ]
#  [0.00912908 0.00912908]]
# [[0.70164242 0.84519205]
#  [0.05       0.05      ]]
# [[1.4 ]
#  [0.05]]

Overlaps can be excluded with hl = max(lr, hl). NB: in the previous approach the array slices are copied to new objects; here you get views on x, and you have to be explicit if you want new objects.
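A minimal sketch of the two remarks above, assuming the same x, low and high as before: the clamp hl = max(lr, hl) and the difference between a slice (a view) and an explicit copy. The names view and copy are illustrative; numpy.shares_memory is used only to make the difference visible.

```python
import numpy as np

x = np.array([[0.46006547, 0.5580928, 0.70164242, 0.84519205, 1.4],
              [0.00912908, 0.00912908, 0.05, 0.05, 0.05]])
low, high = np.array([0.633, 0.01]), np.array([1.325, 0.99])

lr = max(np.searchsorted(x[0], low[0], side='right'),
         np.searchsorted(x[1], low[1], side='right'))
hl = min(np.searchsorted(x[0], high[0], side='left'),
         np.searchsorted(x[1], high[1], side='left'))
hl = max(lr, hl)  # clamp so that the middle slice can never be negative-length

view = x[:, lr:hl]         # basic slicing: a view on x
copy = x[:, lr:hl].copy()  # explicit copy: an independent array

print(np.shares_memory(x, view))  # the view shares x's buffer
print(np.shares_memory(x, copy))  # the copy does not
```

Writing into view would change x as well, so take a copy if the pieces are going to be modified independently.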

Edit: An unnecessary optimization

If we use only the upper part of x in the second pair of searchsorted calls (if you look at the code you'll see what I mean...) we get two benefits: 1) a very small speedup of the searches (searchsorted is always fast enough) and 2) the case of overlap is handled automatically.

As a bonus, here is code for copying the segments of x into new arrays. NB: x was changed to force an overlap.

import numpy as np

# I changed x to force overlap
x = np.array([[ 0.46006547,  1.4 ,        1.4,   1.4,  1.4       ],
              [ 0.00912908,  0.00912908,  0.05,  0.05, 0.05      ]])

low, high = np.array([ 0.633,  0.01 ]), np.array([ 1.325,  0.99 ])

l0r = np.searchsorted(x[0,:], low[0], side='right')
l1r = np.searchsorted(x[1,:], low[1], side='right')
lr = max(l0r, l1r)

# search only the part of x to the right of lr
h0l = np.searchsorted(x[0,lr:], high[0], side='left')
h1l = np.searchsorted(x[1,lr:], high[1], side='left')

hl = min(h0l, h1l) + lr

# .copy() turns the slices (views on x) into independent arrays
t1 = x[:,:lr].copy()
xn = x[:,lr:hl].copy()
t2 = x[:,hl:].copy()

print(x)
del x
print()
print(t1)
print()
# note that xn is an empty array
print(xn)
print()
print(t2)

# [[0.46006547 1.4        1.4        1.4        1.4       ]
#  [0.00912908 0.00912908 0.05       0.05       0.05      ]]
# 
# [[0.46006547 1.4       ]
#  [0.00912908 0.00912908]]
# 
# []
# 
# [[1.4  1.4  1.4 ]
#  [0.05 0.05 0.05]]

I'm beginning to fear that I've not understood the OP requirements.
Thanks for your answer; your Addendum worked very well for me, except for one modification: to avoid overlaps I had to use if hl < lr: hl = hl + lr
