I have the following function:
def function(arguments, process_number):
calclulations that build vector_1 and vector_2 of size 100000 each
return vector_1, vector_2
I need to run this function in parallel, for 200 different values of process_number. I am not interested in having an output ordered by process_number, so I am using apply_async(), in the following fashion:
def parallelizing():
number_of_process
pool = mp.Pool(mp.cpu_count())
size = 100000
vector_1 = np.zeros(size)
vector_2 = np.zeros(size)
returned_object = [pool.apply_async(function,args=(args, procnum)) for procnum in range(number_of_process)]
pool.close()
pool.join()
for r in returned_object:
vector_1 += r.get()[0]
vector_2 += r.get()[1]
So at the end of the day, I just have to add together all the vectors that I get as results from the function I am parallelizing.
The problem is that a big portion of the time used by the process is actually storing memory to build the [returned_object] list, which I think it is really not necessary as I simply need the return of the function to be added and then forgotten.
So I am trying to avoid building a huge list of objects, each containing at least two vecotrs of floats of size 100000, and get directly into the "addition" step.
Is there a way? Should I define a global variable and write on it inside the function? I fear it might lead to concurrency and screw things up. As I said, I really don't care about getting an ordered result, given that I simply need to add things up.
Edit from Cireo answer down below:
Ok just to confirm if I understood. Instead of using the apply_async method, I would do something like the following:
def function(args, queue_1, queue_2):
#do calculation
queue.put(vector_1)
queue.put(vector_2)
and inside the function that calls the parallelization I simply do
def parallelizing():
queue1 = Queue()
queue2=Queue()
processid = np.arange(200)
p = Process(target=f, args=(args,queue1,queue2,prcoessid))
p.start()
p.join()
What I don't really understand is how, instead of putting things in a Queue which to me seems as computationally intensive as creating a list, I can instead add the returns of my functions. if I do Queue.put() don't I end up with the same list as before?
pool = mp.Pool(60)and check.