Main goal: Suppose I have a two-dimensional array, along with a 0-1 index set that has one entry for each element of each row. For example, if my array is [[3,6,7,8], [1,32,45,7]], the index set could be [[1,0,1,1], [0,0,1,1]]. I would like to copy each row of my array n times. Then, in every copy, I'd like to randomly increase each element whose corresponding index is equal to 1.
import time
import random
import numpy as np

def foo(arr, upper_bound, index_set, first_set_size, sec_set_size, limit):
    iter = 0
    # each row is copied |sec_set_size| times
    my_array = np.zeros((first_set_size * sec_set_size, limit))
    it = 0
    for i in range(first_set_size):
        for j in range(sec_set_size):
            my_array[it] = arr[i]  # copy the elements from the corresponding row
            for k in range(limit):
                if index_set[i][k] == 1:  # update the elements whose indices are one
                    temp = arr[i][k]  # get the current value
                    # Update the value. I use fastrand.pcg32bounded here.
                    my_array[it][k] = temp + random.randint(1, upper_bound - temp)
            it += 1
    return my_array

upper_bound = 50
limit = 1000
first_set_size = 100
sec_set_size = 50

# create an array containing integer numbers
arr = np.random.randint(25, size=(first_set_size, limit))
# each element has an index which is either 1 or 0
index_set = np.array([[random.randint(0, 1) for j in range(limit)] for i in range(first_set_size)])

start_time = time.time()  # measure the time taken by the function
result = foo(arr, upper_bound, index_set, first_set_size, sec_set_size, limit)
print("time taken: %s " % (time.time() - start_time))
Once I increase the limit and the set sizes, the code takes several minutes. Is there any way to perform this operation faster or more efficiently? I've spent quite a bit of time on this, but could not improve the speed of my implementation.
EDIT: Suppose my initial array is:
[[11 23 24 17 0]
[ 1 23 12 19 5]
[20 15 1 17 17]
[ 3 8 7 0 24]]
My index set is given as:
[[1 0 0 0 1]
[1 0 1 0 0]
[1 1 1 1 0]
[0 1 0 1 1]]
If sec_set_size=5, I would like to make five copies of each row and, in every copy, randomly increase the value of each element whose index is one.
The final result should look like this:
[[39. 23. 24. 17. 44.]
[50. 23. 24. 17. 27.]
[42. 23. 24. 17. 24.]
[45. 23. 24. 17. 11.]
[49. 23. 24. 17. 43.]
[23. 23. 44. 19. 5.]
[10. 23. 37. 19. 5.]
[14. 23. 29. 19. 5.]
[12. 23. 22. 19. 5.]
[ 5. 23. 15. 19. 5.]
[36. 45. 26. 37. 17.]
[24. 40. 35. 38. 17.]
[34. 20. 24. 31. 17.]
[27. 16. 9. 20. 17.]
[37. 37. 6. 37. 17.]
[ 3. 50. 7. 46. 47.]
[ 3. 13. 7. 37. 44.]
[ 3. 23. 7. 32. 29.]
[ 3. 10. 7. 22. 41.]
[ 3. 22. 7. 32. 41.]]
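One direction that might help is replacing the three nested loops with whole-array NumPy operations. The following is only a minimal sketch, not a tested drop-in replacement: `foo_vectorized` and its `rng` parameter are hypothetical names, it uses NumPy's own generator instead of fastrand, and it assumes NumPy 1.17 or newer and that every element selected by the index set is strictly below `upper_bound` (the loop version makes the same assumption, since `random.randint` fails on an empty range).

```python
import numpy as np

def foo_vectorized(arr, upper_bound, index_set, sec_set_size, rng=None):
    """Repeat each row sec_set_size times and randomly raise the selected entries."""
    rng = np.random.default_rng() if rng is None else rng
    arr = np.asarray(arr)
    mask = np.asarray(index_set).astype(bool)

    # Repeat every row (and its mask) sec_set_size times; copies stay adjacent,
    # matching the row order produced by the loop version.
    rows = np.repeat(arr, sec_set_size, axis=0)
    row_mask = np.repeat(mask, sec_set_size, axis=0)

    # Per-element upper limit for the increment: upper_bound - value where the
    # index is 1, and a harmless 1 elsewhere so the range is never empty.
    high = np.where(row_mask, upper_bound - rows, 1)

    # endpoint=True makes the upper limit inclusive, like random.randint.
    increments = rng.integers(1, high, endpoint=True)

    # Apply the increment only where the index is 1; return floats like np.zeros did.
    return np.where(row_mask, rows + increments, rows).astype(float)

# Usage with the small example from the EDIT above
arr = np.array([[11, 23, 24, 17, 0],
                [1, 23, 12, 19, 5],
                [20, 15, 1, 17, 17],
                [3, 8, 7, 0, 24]])
index_set = np.array([[1, 0, 0, 0, 1],
                      [1, 0, 1, 0, 0],
                      [1, 1, 1, 1, 0],
                      [0, 1, 0, 1, 1]])
result = foo_vectorized(arr, upper_bound=50, index_set=index_set, sec_set_size=5)
print(result)
```

The random values will differ from run to run, but the shape and the untouched columns should match the expected result shown above. Because the work happens in a handful of array operations, the cost should grow far more slowly than the pure-Python triple loop as the limit and set sizes increase, at the price of a few temporary arrays of the full output size.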
Don't use iter as a variable name; it's a built-in function.