Numpy: get array where index greater than value and condition is true

Question

I have the following array:

a = np.array([6,5,4,3,4,5,6])

Now I want to get all elements which are greater than 4 and also have an index of 2 or greater. The way that I have found to do that is the following:

a[2:][a[2:]>4]

Is there a better or more readable way to accomplish this?

UPDATE: This is a simplified version. In reality the indexing is done with arithmetic operations over several variables, like this:

a[len(trainPredict)+(look_back*2)+1:][a[len(trainPredict)+(look_back*2)+1:]>4]

trainPredict is a numpy array and look_back is an integer.
I wanted to see whether there is an established way to do this, or how others do it.

Are you looking for the elements, the indices of the elements (in the original array, presumably), or a mask for the elements?
– Mad Physicist, Commented Oct 9, 2019 at 18:29
@MadPhysicist I am looking for the elements in part of the array, as shown in the sample: a[2:][a[2:]>4]
– Code Pope, Commented Oct 9, 2019 at 23:08
You should select the posted answer. It's about as concise and accurate as you can be.
– Mad Physicist, Commented Oct 10, 2019 at 0:18
@MadPhysicist it is the same as what I have written in the question: a[2:][a[2:]>4], just in three lines instead of one. If there is no other way, then I will have my answer and will select it.
– Code Pope, Commented Oct 10, 2019 at 0:35
The other ways I can think of are all much less efficient. I'll write an answer to prove it. The existing answer is a much cleaner way than the one-liner because it avoids redundant temp arrays.
– Mad Physicist, Commented Oct 10, 2019 at 2:37

AMC · Accepted Answer · 2019-10-09 18:23:58Z

2

If you're worried about the complexity of the slice and/or the number of conditions, you can always separate them:

a = np.array([6,5,4,3,4,5,6])

a_slice = a[2:]

cond_1 = a_slice > 4

res = a_slice[cond_1]

Is your example very simplified? There might be better solutions for more complex manipulations.
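For the longer index expression from the question's update, the same split might look roughly like the sketch below. The values given to trainPredict and look_back are made up purely so the snippet runs (they are chosen so the offset comes out as 2, matching the simplified example); only the names come from the question.

```python
import numpy as np

# Hypothetical stand-ins for the variables named in the question's update.
trainPredict = np.zeros(1)   # assumed length of 1, for illustration only
look_back = 0                # assumed value, for illustration only

a = np.array([6, 5, 4, 3, 4, 5, 6])

# Compute the start offset once instead of repeating the arithmetic.
start = len(trainPredict) + (look_back * 2) + 1   # 1 + 0 + 1 = 2 here

a_slice = a[start:]          # basic slice: a view onto a
res = a_slice[a_slice > 4]   # boolean mask applied to the slice

print(res)
```

Naming the offset keeps the arithmetic in one place, so a change to the formula only has to be made once.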


Mad Physicist · Accepted Answer · 2019-10-10 03:11:27Z

1

@AlexanderCécile's answer is not only more legible than the one-liner you posted, but also removes the redundant computation of a temp array. Despite that, it does not appear to be any faster than your original approach.

The timings below are all run with a preliminary setup of

import numpy as np
np.random.seed(0xDEADBEEF)
a = np.random.randint(8, size=N)

N varies from 1e3 to 1e8 in factors of 10. I tried four variants of the code:

CodePope: result = a[2:][a[2:] > 4]
AlexanderCécile: s = a[2:]; result = s[s > 4]
MadPhysicist1: result = a[np.flatnonzero(a[2:] > 4) + 2]
MadPhysicist2: result = a[(a > 4) & (np.arange(a.size) >= 2)]
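As a sanity check, a short script along these lines can confirm that the four variants select the same elements. It is only a sketch: the array size of 1000 is arbitrary, and the third variant is written here with the `> 4` comparison inside `np.flatnonzero`, which is what it needs in order to pick the same elements as the others.

```python
import numpy as np

np.random.seed(0xDEADBEEF)
a = np.random.randint(8, size=1000)  # size chosen arbitrarily for the check

# Variant 1: slice twice, mask once.
r1 = a[2:][a[2:] > 4]

# Variant 2: slice once, reuse the view.
s = a[2:]
r2 = s[s > 4]

# Variant 3: integer (fancy) indexing into the original array.
# flatnonzero gives positions within a[2:], so shift them back by 2.
r3 = a[np.flatnonzero(a[2:] > 4) + 2]

# Variant 4: one full-length mask combining the value and index conditions.
r4 = a[(a > 4) & (np.arange(a.size) >= 2)]

same = all(np.array_equal(r1, r) for r in (r2, r3, r4))
print(same)
```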

In all cases, the timing was obtained on the command line by running

python -m timeit -s 'import numpy as np; np.random.seed(0xDEADBEEF); a = np.random.randint(8, size=N)' '<X>'

Here, N was a power of 10 from 1e3 to 1e8, and <X> was one of the expressions above. Timings are as follows:

Methods #1 and #2 are virtually indistinguishable. What is surprising is that in the range between ~5e3 and ~1e6 elements, method #3 seems to be slightly but noticeably faster. I would not normally expect that from fancy indexing. Method #4 is, as expected, the slowest, since it builds a full-length index array and two full-length masks before combining them.

Here is the data, for completeness (times appear to be in seconds per loop):

N          CodePope  AlexanderCécile  MadPhysicist1  MadPhysicist2
1000       3.77e-06         3.69e-06       5.48e-06       6.52e-06
10000       4.6e-05         4.59e-05       3.97e-05       5.93e-05
100000     0.000484         0.000483         0.0004       0.000592
1000000     0.00513          0.00515        0.00503        0.00675
10000000     0.0529           0.0525         0.0617          0.102
100000000     0.657            0.658          0.782           1.09
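For anyone who wants to repeat the comparison from inside Python rather than from the shell, a rough sketch using the standard timeit module follows. The helper name time_variants and its repeat/number defaults are my own choices, not part of the original measurements, and wrapping each expression in a function adds a little call overhead, so the absolute numbers will not match the table exactly. The third variant is written with the `> 4` comparison inside `np.flatnonzero`.

```python
import timeit
import numpy as np

def time_variants(n, repeat=3, number=10):
    """Return the best seconds-per-call for each variant on an array of size n."""
    np.random.seed(0xDEADBEEF)
    a = np.random.randint(8, size=n)

    def code_pope():
        return a[2:][a[2:] > 4]

    def alexander_cecile():
        s = a[2:]
        return s[s > 4]

    def mad_physicist_1():
        return a[np.flatnonzero(a[2:] > 4) + 2]

    def mad_physicist_2():
        return a[(a > 4) & (np.arange(a.size) >= 2)]

    variants = {
        "CodePope": code_pope,
        "AlexanderCécile": alexander_cecile,
        "MadPhysicist1": mad_physicist_1,
        "MadPhysicist2": mad_physicist_2,
    }
    # timeit.repeat returns total time per run of `number` calls;
    # take the fastest run and divide to get time per call.
    return {
        name: min(timeit.repeat(fn, repeat=repeat, number=number)) / number
        for name, fn in variants.items()
    }

# Small demonstration size so the script finishes quickly.
results = time_variants(10_000)
for name, seconds in results.items():
    print(f"{name:>16}: {seconds:.3g} s per call")
```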


2 Comments

AMC Over a year ago

Indeed, my answer only improves legibility. Since numpy array slices are views, the overhead of creating a new variable probably outweighs the small performance gain from not slicing twice.

AMC Over a year ago

Edit: In his updated code, however, separating the parts as in my answer may lead to an increase in performance since the numerical operations create new arrays.
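The point about views can be checked directly. The snippet below is a small illustration using the question's sample array: the basic slice shares memory with the original, while the boolean-masked result is a fresh copy.

```python
import numpy as np

a = np.array([6, 5, 4, 3, 4, 5, 6])

s = a[2:]                                  # basic slice: a view, no data copied
slice_shares = np.shares_memory(a, s)      # True
slice_base_is_a = s.base is a              # True: the view refers back to a

res = s[s > 4]                             # boolean indexing: allocates a new array
result_shares = np.shares_memory(a, res)   # False

print(slice_shares, slice_base_is_a, result_shares)
```

This is why slicing twice costs very little: each a[2:] only creates a new array header, not a new data buffer.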
