0

My question is probably very simple, but I can't figure out a way to make this operation faster:

  print a[(b==c[i]) for i in arange(0,len(c))]

where a, b and c are three numpy arrays. I'm dealing with arrays with millions of entries, and the piece of code above is the bottleneck of my program.

2
  • 1
    To answer this with anything better than a guess we'd need at least the shape of a,b,c -- vectors, matrices, etc. Commented Mar 29, 2013 at 17:41
  • 1
    Your code results in a syntax error. Could you show a small working example of the code that is slow? Commented Mar 29, 2013 at 19:46

3 Answers

4

Are you trying to get the values of a where b==c?

If so, you can just do a[b==c]:

import numpy as np

a = np.arange(11)
b = 11*a
c = b[::-1]

print(a)        # [  0   1   2   3   4   5   6   7   8   9  10]
print(b)        # [  0  11  22  33  44  55  66  77  88  99 110]
print(c)        # [110  99  88  77  66  55  44  33  22  11   0]
print(a[b==c])  # [5]

4 Comments

Thanks for the answer, but that's not exactly what I'm looking for. In your example I would like the result to be [10 9 8 7 6 5 4 3 2 1 0], because these are the values of a where b=c.
@Matteo: What if b[j]==c[i] for multiple values of i or j?
Let's assume there are no repetitions. Sorry, my question wasn't very detailed.
@Matteo: then edit your question and say exactly what you are trying to do, besides showing us the code.
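
For the case described in the comments above (each c[i] matches exactly one entry of b, no repetitions), one possible way to get the values of a in the order of c is a sorted lookup. This is only a sketch under those assumptions; the function name is made up for illustration, and it will give wrong results if some c[i] is missing from b.

```python
import numpy as np

def values_of_a_in_c_order(a, b, c):
    """For each c[i], return a[j] where b[j] == c[i].

    Assumes b has no duplicates and every c[i] occurs in b.
    """
    order = np.argsort(b)                      # indices that sort b
    pos = np.searchsorted(b, c, sorter=order)  # position of each c[i] in sorted b
    return a[order[pos]]

a = np.arange(11)
b = 11 * a
c = b[::-1]

result = values_of_a_in_c_order(a, b, c)
print(result)
```

The sort costs O(n log n) and each lookup O(log n), so no len(b) x len(c) intermediate array is built.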
2

You should probably look into broadcasting. I assume you are looking for something like the following?

>>> b=np.arange(5)
>>> c=np.arange(6).reshape(-1,1)
>>> b
array([0, 1, 2, 3, 4])
>>> c
array([[0],
       [1],
       [2],
       [3],
       [4],
       [5]])
>>> b==c
array([[ True, False, False, False, False],
       [False,  True, False, False, False],
       [False, False,  True, False, False],
       [False, False, False,  True, False],
       [False, False, False, False,  True],
       [False, False, False, False, False]], dtype=bool)
>>> np.any(b==c,axis=1)
array([ True,  True,  True,  True,  True, False], dtype=bool)

For large arrays you can try np.in1d, timed here against the broadcasting approach:

import timeit

s="""
import numpy as np
array_size=500
a=np.random.randint(500, size=(array_size))
b=np.random.randint(500, size=(array_size))
c=np.random.randint(500, size=(array_size))
"""

ex1="""
a[np.any(b==c.reshape(-1,1),axis=0)]
"""

ex2="""
a[np.in1d(b,c)]
"""

print('Example 1 took', timeit.timeit(ex1, setup=s, number=100), 'seconds.')
print('Example 2 took', timeit.timeit(ex2, setup=s, number=100), 'seconds.')

When array_size is 50:

Example 1 took 0.00323104858398 seconds.
Example 2 took 0.0125901699066 seconds.

When array_size is 500:

Example 1 took 0.142632007599 seconds.
Example 2 took 0.0283041000366 seconds.

When array_size is 5,000:

Example 1 took 16.2110910416 seconds.
Example 2 took 0.170011043549 seconds.

When array_size is 50,000 (number=5):

Example 1 took 33.0327301025 seconds.
Example 2 took 0.0996031761169 seconds.

Note that I had to change the axis passed to np.any() so that both examples give the same result. Reverse the argument order of np.in1d, or switch the axis of np.any, to get the effect you want. You can take reshape out of example 1, but reshape is really quite fast. Really interesting; I will have to use this in the future.
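
To make the axis remark concrete, here is a small check of which np.any axis corresponds to which argument order of the membership test. It is a sketch on made-up random data, and it uses np.isin, the newer name NumPy recommends in place of np.in1d; for 1-D inputs the two give the same mask.

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.integers(0, 500, size=200)
c = rng.integers(0, 500, size=300)

eq = b == c.reshape(-1, 1)   # shape (len(c), len(b))

mask_b = np.any(eq, axis=0)  # for each b[j]: does it occur anywhere in c?
mask_c = np.any(eq, axis=1)  # for each c[i]: does it occur anywhere in b?

same_b = np.array_equal(mask_b, np.isin(b, c))
same_c = np.array_equal(mask_c, np.isin(c, b))
print(same_b, same_c)
```

Only mask_b has the length of b (and therefore of a, when a and b have the same length), so it is the one that can be used directly as a[mask_b].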

6 Comments

Thanks, this is what I'm looking for, but it's still rather slow for very large arrays.
@Matteo: For a 1 million size array of ints, b==c is 1 trillion bytes, so it will probably be a bit slow. (Not to put down this answer, though, kudos and +1 to Ophion for correctly guessing what you're looking for!)
Added a faster method. @tom10 I was 90% sure what he wanted was your solution after you posted it :).
@Ophion: Great, in1d seems to be exactly the right thing. It's odd that 50,000 takes half the time of 5,000 though?
@tom10 For the 50,000 I had reduced the number of trials by a factor of 20.
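
The memory figure mentioned in the comments can be reproduced with a quick back-of-the-envelope calculation. This sketch assumes b and c both hold one million elements and that the broadcast comparison produces a full boolean array.

```python
import numpy as np

n = 1_000_000                        # assumed length of both b and c
itemsize = np.dtype(bool).itemsize   # bytes per element of a boolean array
mask_bytes = n * n * itemsize        # size of the (n, n) result of b == c[:, None]

print(mask_bytes)
print(mask_bytes / 1e12, "TB")
```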
0

How about np.where() :

>>> a  = np.array([2,4,8,16])
>>> b  = np.array([0,0,0,0])
>>> c  = np.array([1,0,0,1])
>>> bc = np.where(b==c)[0] #indices where b == c
>>> a[bc]
array([4, 8])

This should do the trick. I'm not sure whether the timing is good enough for your purposes:

>>> a = np.random.randint(0,10000,1000000)
>>> b = np.random.randint(0,10000,1000000)
>>> c = np.random.randint(0,10000,1000000)
>>> %timeit( a[ np.where( b == c )[0] ]   )
100 loops, best of 3: 11.3 ms per loop
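
As a side note, here is a sketch showing that indexing with np.where(...)[0] selects the same elements as indexing with the boolean mask directly. Both compare b and c position by position, so they require b and c to have the same shape.

```python
import numpy as np

a = np.array([2, 4, 8, 16])
b = np.array([0, 0, 0, 0])
c = np.array([1, 0, 0, 1])

idx = np.where(b == c)[0]   # integer positions where b[i] == c[i]
via_where = a[idx]
via_mask = a[b == c]        # boolean mask used directly as the index

print(via_where, via_mask)
```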
