
Assume three NumPy arrays x, y and z, where z is computed element-wise from x and y:

    z = (x**2)/ y          for each  x > 2 y
    z = (x**2)/y**(3/2)    for each  x > 3 y
    z = (1/x)*sin(x)       for each  x > 4 y

The arrays x, y and z are of course made up, but they illustrate the point: applying multiple if statements across multiple arrays. Each array has about 500,000 elements.

One possible way (much like FORTRAN) is to create an index variable i and use it to test whether x[i] > 2*y[i], x[i] > 3*y[i], and so on. I assume that would be slow.

I need a fast, elegant and more Pythonic way to compute the array z.

UPDATE: I have tried the two methods and here are the results:

    # Fortran way of loops:
    import numpy as np

    x = np.random.rand(40000, 1)
    y = np.random.rand(40000, 1)

    z = np.zeros(x.shape)
    for i in range(len(x)):
        # The checks are independent, so a later match overwrites an earlier one
        if x[i] > 2*y[i]:
            z[i] = x[i]**2/y[i]
        if x[i] > 3*y[i]:
            z[i] = x[i]**2/y[i]**(1.5)
        if x[i] > 4*y[i]:
            z[i] = (1/x[i])*np.sin(x[i])

    print(z)
    # end of code -----

The timing results are as follows:

    real    0m0.920s
    user    0m0.900s
     sys    0m0.016s

The other piece of code used is:

    # Pythonic way
    import numpy as np

    x=np.random.rand(40000,1)
    y=np.random.rand(40000,1)

    indices1 = np.where(x > 2*y)
    indices2 = np.where(x > 3*y)
    indices3 = np.where(x > 4*y)

    z = np.zeros(x.shape)
    z[indices1] = x[indices1]**2/y[indices1]
    z[indices2] = x[indices2]**2/y[indices2]**(1.5)
    z[indices3] = (1/x[indices3])*np.sin(x[indices3])
    print(z)
    # end of code -----

The timing results are as follows:

    real    0m0.110s
    user    0m0.076s
     sys    0m0.028s

So there is a large difference in execution times. Both scripts were run on an Ubuntu virtual machine with Python 2.7.5.

UPDATE: I did another test using boolean masks in place of np.where:

    indices1 = x > 2*y
    indices2 = x > 3*y
    indices3 = x > 4*y

The timing results were:

     real   0m0.105s
     user   0m0.084s
      sys   0m0.016s

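SUMMARY: Method 3 (boolean masks) is the most elegant and slightly faster than using np.where. The explicit loop is by far the slowest, taking roughly eight times longer in wall-clock time (0.920 s versus about 0.11 s).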
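
The `time` figures above include interpreter startup and the NumPy import, so a rough way to compare only the array work is the standard-library `timeit` module. The sketch below is an illustration, not the benchmark that produced the numbers above: the helper names `with_where` and `with_masks` are made up here, the fixed seed is an assumption added for repeatability, and absolute timings will vary by machine and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.RandomState(0)  # fixed seed: an assumption, for repeatability
x = rng.rand(40000, 1)
y = rng.rand(40000, 1)


def with_where(x, y):
    """Method 2: index with the tuples returned by np.where."""
    i1 = np.where(x > 2 * y)
    i2 = np.where(x > 3 * y)
    i3 = np.where(x > 4 * y)
    z = np.zeros(x.shape)
    z[i1] = x[i1] ** 2 / y[i1]
    z[i2] = x[i2] ** 2 / y[i2] ** 1.5
    z[i3] = (1 / x[i3]) * np.sin(x[i3])
    return z


def with_masks(x, y):
    """Method 3: index directly with boolean masks."""
    m1 = x > 2 * y
    m2 = x > 3 * y
    m3 = x > 4 * y
    z = np.zeros(x.shape)
    z[m1] = x[m1] ** 2 / y[m1]
    z[m2] = x[m2] ** 2 / y[m2] ** 1.5
    z[m3] = (1 / x[m3]) * np.sin(x[m3])
    return z


# Time 20 calls of each variant; only the array work is measured.
t_where = timeit.timeit(lambda: with_where(x, y), number=20)
t_masks = timeit.timeit(lambda: with_masks(x, y), number=20)

# Both variants should produce identical arrays.
same = np.array_equal(with_where(x, y), with_masks(x, y))
print("np.where: %.4f s, masks: %.4f s, same result: %s"
      % (t_where, t_masks, same))
```

Timing inside the process this way should make the gap between the two vectorized variants easier to judge, since the fixed startup cost no longer dominates.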

  • Have you tried to benchmark your idea? Show us what you have done. Commented Jan 24, 2015 at 5:11
  • Did you try using boolean indexing? Commented Jan 24, 2015 at 5:15
  • I have updated my answer by benchmarking Commented Jan 24, 2015 at 17:23

1 Answer

I'm not quite sure whether you want your z array to be the same size as x and y, but I will assume so.

NumPy has a function, np.where, that finds the indices of the elements satisfying a condition. In the example below I do a calculation similar to your first line.

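    import numpy as np

    x = np.arange(4)
    x[2:] += 10
    print(x)

    y = np.arange(4)
    print(y)

    # Indices of the elements where the condition holds
    indices = np.where(x > 2*y)
    print(indices)

    # Only the selected elements are computed; the rest stay 0
    z = np.zeros(x.shape)
    z[indices] = x[indices]**2/y[indices]
    print(z)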

The print statements yield the following (shown schematically; under Python 2, dividing integer arrays truncates, so 169/3 gives 56):

    x:       [0 1 12 13]
    y:       [0 1 2 3]
    indices: [2, 3]
    z:       [0 0 72 56]

Edit: Upon further testing, it turns out that you don't even need the np.where function. You can simply set indices = x > 2*y, which gives a boolean mask that indexes the arrays directly.
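
Applied to all three rules from the question, the boolean-mask idea might look like the sketch below. The function name `compute_z` and the small input arrays are made up for illustration. The masks are applied from least to most restrictive so that later rules overwrite earlier ones, matching the order of the `if` statements in the question's loop.

```python
import numpy as np


def compute_z(x, y):
    """Piecewise z from the question; elements matching no rule stay 0."""
    # Convert to float so that / is true division even for integer input.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.zeros(x.shape)

    m = x > 2 * y                      # least restrictive rule first
    z[m] = x[m] ** 2 / y[m]

    m = x > 3 * y                      # overwrites rule 1 where it also holds
    z[m] = x[m] ** 2 / y[m] ** 1.5

    m = x > 4 * y                      # most restrictive rule wins last
    z[m] = (1 / x[m]) * np.sin(x[m])
    return z


# Hypothetical inputs chosen so each element hits a different rule:
# no rule, rule 1, rule 2, rule 3.
x = np.array([1.0, 2.5, 3.5, 5.0])
y = np.array([1.0, 1.0, 1.0, 1.0])
z = compute_z(x, y)
print(z)
```

Note that the sketch does not guard against zeros in y: an element with y equal to 0 and a positive x satisfies every condition and would trigger a division by zero in the first two rules before the third overwrites it.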
