I want to split a numpy array into three different arrays based on a logical comparison. The numpy array I want to split is called x. Its shape looks as follows, but its entries vary: (In response to Saullo Castro's comment I included a slightly different array x.)

array([[ 0.46006547,  0.5580928 ,  0.70164242,  0.84519205,  1.4       ],
      [ 0.00912908,  0.00912908,  0.05      ,  0.05      ,  0.05      ]])

The values of this array are monotonically increasing along columns. I also have two other arrays called lowest_gridpoints and highest_gridpoints. The entries of these arrays also vary, but the shape is always identical to the following:

 array([ 0.633,  0.01 ]), array([ 1.325,  0.99 ])

The selection procedure I want to apply is as follows:

  • All columns containing values lower than any value in lowest_gridpoints should be removed from x and constitute the array temp1.
  • All columns containing values higher than any value in highest_gridpoints should be removed from x and constitute the array temp2.
  • All columns of x that are included in neither temp1 nor temp2 constitute the array x_new.

The following code I wrote achieves the task.

if np.any( x[:,-1] > highest_gridpoints ) or np.any( x[:,0] < lowest_gridpoints ):
    for idx, sample, in enumerate(x.T):
        if np.any( sample > highest_gridpoints):
            max_idx = idx
            break
        elif np.any( sample < lowest_gridpoints ):
            min_idx = idx 
    temp1, temp2 = np.array([[],[]]), np.array([[],[]])
    if 'min_idx' in locals():
        temp1 = x[:,0:min_idx+1]
    if 'max_idx' in locals():
        temp2 = x[:,max_idx:]
    if 'min_idx' in locals() or 'max_idx' in locals():
        if 'min_idx' not in locals():
            min_idx = -1
        if 'max_idx' not in locals():
            max_idx = x.shape[1]
        x_new = x[:,min_idx+1:max_idx]

However, I suspect that this code is very inefficient because of the heavy use of loops. Additionally, I think the syntax is bloated.

Does anyone have an idea for code that achieves the task outlined above more efficiently or more concisely?

  • your example returns [] for me... it would be nice to have a different input that can be used for comparisons... Commented Oct 20, 2014 at 10:27
  • @SaulloCastro: Thank you for your comment. I slightly modified the array x. Do you have an idea on how to modify my code? Commented Oct 20, 2014 at 14:30
  • Do you expect temp1 and temp2 to be mutually exclusive or can it happen that a column has both a value lower than the one in lowest_gridpoints and another value higher than the one in highest_gridpoints? Also, did you mean monotonically increasing along the rows? Commented Oct 20, 2014 at 16:50
  • Maybe you can use np.argsort(x[i] + [lowest_gridpoints[i]])[-1]. This will give you the index of the first element larger than lowest_gridpoints[i]. Do it for all i and get the maximum (minimum for the highest_gridpoints) Commented Oct 20, 2014 at 16:58
  • @greschd: That's a good point. I want temp1 and temp2 to be mutually exclusive. In my code, this is guaranteed by the break command after the test if np.any( sample > highest_gridpoints). In doubt, I classify columns of x to para2 instead of para1. I meant monotonically increasing along the second dimension of np.arrays, so that x[0,i] >= x[0,j] for i > j. I hope (and think) this refers to columns. Commented Oct 20, 2014 at 16:59

1 Answer

Only the first part of your question

import numpy as np

x = np.array([[ 0.46006547,  0.5580928 ,  0.70164242,  0.84519205,  1.4       ],
              [ 0.00912908,  0.00912908,  0.05      ,  0.05      ,  0.05      ]])

low, high = np.array([ 0.633,  0.01 ]), np.array([ 1.325,  0.99 ])

# construct an array of two rows of bools expressing your conditions
indices1 = np.array((x[0,:]<low[0], x[1,:]<low[1]))
print(indices1)

# do an or of the values along the first axis
indices1 = np.any(indices1, axis=0)
# now it's a single row array
print(indices1)

# use indices1 to extract what you want,
# the double transposition because the elements
# of a 2d array are the rows
tmp1 = x.T[indices1].T
print(tmp1)

# [[ True  True False False False]
#  [ True  True False False False]]
# [ True  True False False False]
# [[ 0.46006547  0.5580928 ]
#  [ 0.00912908  0.00912908]]

Next, construct indices2 and tmp2 in the same way. The indices of the remainder are the negation of the OR of the first two index arrays, i.e. numpy.logical_not(numpy.logical_or(i1, i2)).
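A minimal sketch of the full three-way split with boolean masks, under the assumption that no column violates both bounds at once (such a column would end up in both tmp1 and tmp2 here). The names i1, i2 and x_new are only illustrative, and the broadcast comparison against low[:, None] is a shorthand for the row-by-row comparisons used above.

```python
import numpy as np

x = np.array([[0.46006547, 0.5580928, 0.70164242, 0.84519205, 1.4],
              [0.00912908, 0.00912908, 0.05, 0.05, 0.05]])
low, high = np.array([0.633, 0.01]), np.array([1.325, 0.99])

# Compare every row against its own bound, then OR the results over the rows.
i1 = np.any(x < low[:, None], axis=0)    # columns with a value below low
i2 = np.any(x > high[:, None], axis=0)   # columns with a value above high

tmp1 = x[:, i1]
tmp2 = x[:, i2]
# The remainder: columns selected by neither mask.
x_new = x[:, np.logical_not(np.logical_or(i1, i2))]
```

Boolean indexing along the second axis avoids the double transposition, and each of the three results is a copy rather than a view on x.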

Addendum

Another approach, possibly faster if you have thousands of entries, uses numpy.searchsorted, which relies on each row of x being sorted.

import numpy as np

x = np.array([[ 0.46006547,  0.5580928 ,  0.70164242,  0.84519205,  1.4       ],
              [ 0.00912908,  0.00912908,  0.05      ,  0.05      ,  0.05      ]])

low, high = np.array([ 0.633,  0.01 ]), np.array([ 1.325,  0.99 ])

# per row: number of leading columns whose value is <= the lower bound
l0r = np.searchsorted(x[0,:], low[0], side='right')
l1r = np.searchsorted(x[1,:], low[1], side='right')

# per row: index of the first column whose value is >= the upper bound
h0l = np.searchsorted(x[0,:], high[0], side='left')
h1l = np.searchsorted(x[1,:], high[1], side='left')

lr = max(l0r, l1r)
hl = min(h0l, h1l)

print(lr, hl)
print(x[:,:lr])
print(x[:,lr:hl])
print(x[:,hl:])

# 2 4
# [[ 0.46006547  0.5580928 ]
#  [ 0.00912908  0.00912908]]
# [[ 0.70164242  0.84519205]
#  [ 0.05        0.05      ]]
# [[ 1.4 ]
#  [ 0.05]]

Overlaps can be excluded with hl = max(lr, hl). NB: in the previous approach the array slices are copied to new objects; here you get views on x, and you have to be explicit if you want new objects.
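To illustrate both remarks, here is a small sketch using an x chosen so that the low and high regions overlap. It assumes numpy.shares_memory is available (NumPy 1.11 or later); the names t1_view and t1_copy are only illustrative. Note that with this clamp an overlapping column lands in the lower group, which may or may not be the priority you want.

```python
import numpy as np

# Column 1 is above the upper bound in row 0 and below the lower bound in row 1.
x = np.array([[0.46006547, 1.4, 1.4, 1.4, 1.4],
              [0.00912908, 0.00912908, 0.05, 0.05, 0.05]])
low, high = np.array([0.633, 0.01]), np.array([1.325, 0.99])

lr = max(np.searchsorted(x[0], low[0], side='right'),
         np.searchsorted(x[1], low[1], side='right'))
hl = min(np.searchsorted(x[0], high[0], side='left'),
         np.searchsorted(x[1], high[1], side='left'))
hl = max(lr, hl)              # clamp so the middle slice cannot go negative

t1_view = x[:, :lr]           # basic slicing: a view that shares memory with x
t1_copy = x[:, :lr].copy()    # explicit copy: an independent array

print(lr, hl)
print(np.shares_memory(t1_view, x), np.shares_memory(t1_copy, x))
```

Writing into t1_view would change x as well, whereas t1_copy can be modified freely.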

Edit An unnecessary optimization

If we use only the upper part of x in the second pair of searchsorted calls (if you look at the code you'll see what I mean...) we get two benefits: 1) a very small speedup of the searches (searchsorted is always fast enough) and 2) the case of overlap is automatically managed.

As a bonus, the code below copies the segments of x into new arrays. NB: x was changed to force overlap.

import numpy as np

# I changed x to force overlap
x = np.array([[ 0.46006547,  1.4 ,        1.4,   1.4,  1.4       ],
              [ 0.00912908,  0.00912908,  0.05,  0.05, 0.05      ]])

low, high = np.array([ 0.633,  0.01 ]), np.array([ 1.325,  0.99 ])

l0r = np.searchsorted(x[0,:], low[0], side='right')
l1r = np.searchsorted(x[1,:], low[1], side='right')
lr = max(l0r, l1r)

# search only the columns from lr onwards, so hl can never fall below lr
h0l = np.searchsorted(x[0,lr:], high[0], side='left')
h1l = np.searchsorted(x[1,lr:], high[1], side='left')

hl = min(h0l, h1l) + lr

# .copy() turns the slices (views on x) into independent arrays
t1 = x[:,:lr].copy()
xn = x[:,lr:hl].copy()
t2 = x[:,hl:].copy()

print(x)
del x
print()
print(t1)
print()
# note that xn is an empty array
print(xn)
print()
print(t2)

# [[ 0.46006547  1.4         1.4         1.4         1.4       ]
#  [ 0.00912908  0.00912908  0.05        0.05        0.05      ]]
# 
# [[ 0.46006547  1.4       ]
#  [ 0.00912908  0.00912908]]
# 
# []
# 
# [[ 1.4   1.4   1.4 ]
#  [ 0.05  0.05  0.05]]
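
The same steps could be wrapped in a small function that works for any number of rows. This is only a sketch: split_columns is a hypothetical name, it assumes every row of x is sorted in non-decreasing order, and, following the side='right' and side='left' choices above, a value exactly equal to a bound is treated as out of range.

```python
import numpy as np

def split_columns(x, low, high):
    """Hypothetical helper: split x (sorted rows) into copies (t1, xn, t2)."""
    # Leading columns where some row has a value <= its lower bound.
    lr = max(np.searchsorted(row, lo, side='right')
             for row, lo in zip(x, low))
    # First column at or after lr where some row has a value >= its upper bound.
    hl = lr + min(np.searchsorted(row[lr:], hi, side='left')
                  for row, hi in zip(x, high))
    return x[:, :lr].copy(), x[:, lr:hl].copy(), x[:, hl:].copy()

x = np.array([[0.46006547, 0.5580928, 0.70164242, 0.84519205, 1.4],
              [0.00912908, 0.00912908, 0.05, 0.05, 0.05]])
low, high = np.array([0.633, 0.01]), np.array([1.325, 0.99])

t1, xn, t2 = split_columns(x, low, high)
print(t1.shape, xn.shape, t2.shape)
```

Because the second search starts at lr, the three pieces always partition the columns of x, even when the low and high regions overlap.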
2 Comments

I'm beginning to fear that I've not understood the OP requirements.
Thanks for your answer; your Addendum worked very well for me, except for one modification: to avoid overlaps I had to use if hl < lr: hl = hl + lr
