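Assume three numpy arrays x, y and z, where z is computed element-wise from x and y:
z = (x**2)/y for each x > 2*y
z = (x**2)/y**1.5 for each x > 3*y
z = (1/x)*sin(x) for each x > 4*y
Where the conditions overlap, the later rule overrides the earlier one (as in the loop code below).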
The arrays x, y and z are of course made up, but they illustrate the point of applying multiple if statements across multiple arrays. The arrays x, y and z have about 500,000 elements each.
One possible way (much like FORTRAN) is to create a variable i to index the arrays and use it to test whether x[i] > 2*y[i], x[i] > 3*y[i], and so on. I assume it would be slow.
I need a fast, elegant and more pythonic way to compute the array z.
UPDATE: I have tried the two methods and here are the results:
# Fortran way of loops:
import numpy as np
x = np.random.rand(40000, 1)
y = np.random.rand(40000, 1)
z = np.zeros(x.shape)
for i in range(len(x)):
    # Plain ifs (not elif): a later matching rule overwrites an earlier one
    if x[i] > 2*y[i]:
        z[i] = x[i]**2 / y[i]
    if x[i] > 3*y[i]:
        z[i] = x[i]**2 / y[i]**(1.5)
    if x[i] > 4*y[i]:
        z[i] = (1/x[i]) * np.sin(x[i])
print(z)
#end----
The timing results are as follows:
real 0m0.920s
user 0m0.900s
sys 0m0.016s
The other piece of code used is:
# Pythonic way
import numpy as np
x = np.random.rand(40000, 1)
y = np.random.rand(40000, 1)
# np.where(condition) returns the indices where the condition holds
indices1 = np.where(x > 2*y)
indices2 = np.where(x > 3*y)
indices3 = np.where(x > 4*y)
z = np.zeros(x.shape)
# Assign in order, so a later rule overwrites an earlier one
z[indices1] = x[indices1]**2 / y[indices1]
z[indices2] = x[indices2]**2 / y[indices2]**(1.5)
z[indices3] = (1/x[indices3]) * np.sin(x[indices3])
print(z)
# end of code -----
The timing results are as follows:
real 0m0.110s
user 0m0.076s
sys 0m0.028s
So there is a large difference in execution times. Both scripts were run on an Ubuntu virtual machine with Python 2.7.5.
UPDATE: I did another test, using the boolean arrays directly as indices instead of np.where:
indices1 = x > 2*y
indices2 = x > 3*y
indices3 = x > 4*y
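For reference, here is a sketch of what the complete script for this variant would presumably look like, assuming the remaining lines are the same as in the np.where version. Wrapping it in a function named compute_z_masked is my own choice for illustration and was not part of the timed script.

```python
import numpy as np

def compute_z_masked(x, y):
    """Piecewise z via boolean masks (hypothetical helper name)."""
    z = np.zeros(x.shape)
    # Boolean arrays with the same shape as x
    indices1 = x > 2*y
    indices2 = x > 3*y
    indices3 = x > 4*y
    # Assign in order so that a later rule overwrites an earlier one
    z[indices1] = x[indices1]**2 / y[indices1]
    z[indices2] = x[indices2]**2 / y[indices2]**1.5
    z[indices3] = (1/x[indices3]) * np.sin(x[indices3])
    return z

x = np.random.rand(40000, 1)
y = np.random.rand(40000, 1)
z = compute_z_masked(x, y)
print(z.shape)
```

Elements that satisfy none of the conditions keep the initial value 0.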
The timing results were:
real 0m0.105s
user 0m0.084s
sys 0m0.016s
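The np.where variant and the boolean-array variant select the same elements. As far as I understand it, np.where called with only a condition returns a tuple of index arrays (one per dimension), while the comparison itself yields a boolean array. A tiny sketch with made-up values:

```python
import numpy as np

# Made-up 3x1 arrays, same layout as the (40000, 1) arrays above
x = np.array([[0.9], [0.2], [0.5]])
y = np.array([[0.1], [0.3], [0.2]])

mask = x > 2*y         # boolean array, same shape as x
idx = np.where(mask)   # tuple of index arrays, one per dimension

print(mask.ravel())
print(idx)

# Both forms pick out the same elements of x
same = np.array_equal(x[idx], x[mask])
print(same)
```

So the np.where call is an extra step here: the boolean array can be used as the index directly.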
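SUMMARY: Method 3 (indexing with the boolean arrays directly) is the most elegant and marginally faster than using np.where. Using explicit loops is very slow: roughly 8 to 9 times slower in wall-clock time in these runs.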
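The timings above measure whole script runs, so they include interpreter startup and the numpy import. A rough sketch of how the three variants could be compared inside one process with timeit follows. The function names z_loop, z_where and z_mask and the fixed seed are my own assumptions for illustration, and the numbers will differ from machine to machine.

```python
import timeit
import numpy as np

def z_loop(x, y):
    # Explicit element-by-element loop
    z = np.zeros(x.shape)
    for i in range(len(x)):
        if x[i] > 2*y[i]:
            z[i] = x[i]**2 / y[i]
        if x[i] > 3*y[i]:
            z[i] = x[i]**2 / y[i]**1.5
        if x[i] > 4*y[i]:
            z[i] = (1/x[i]) * np.sin(x[i])
    return z

def z_where(x, y):
    # Index arrays obtained from np.where
    z = np.zeros(x.shape)
    i1, i2, i3 = np.where(x > 2*y), np.where(x > 3*y), np.where(x > 4*y)
    z[i1] = x[i1]**2 / y[i1]
    z[i2] = x[i2]**2 / y[i2]**1.5
    z[i3] = (1/x[i3]) * np.sin(x[i3])
    return z

def z_mask(x, y):
    # Boolean arrays used directly as indices
    z = np.zeros(x.shape)
    m1, m2, m3 = x > 2*y, x > 3*y, x > 4*y
    z[m1] = x[m1]**2 / y[m1]
    z[m2] = x[m2]**2 / y[m2]**1.5
    z[m3] = (1/x[m3]) * np.sin(x[m3])
    return z

rng = np.random.RandomState(0)  # fixed seed so every variant sees the same data
x = rng.rand(40000, 1)
y = rng.rand(40000, 1)

# Sanity check: all three variants should agree
assert np.allclose(z_loop(x, y), z_mask(x, y))
assert np.allclose(z_where(x, y), z_mask(x, y))

timings = {}
for name, fn in [("loop", z_loop), ("where", z_where), ("mask", z_mask)]:
    # Best of 3 single runs, in seconds
    timings[name] = min(timeit.repeat(lambda: fn(x, y), number=1, repeat=3))
    print("%-6s %.4f s" % (name, timings[name]))
```

Measured this way, only the computation itself is timed, which should make the gap between the loop and the vectorized variants easier to see.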