I am learning multiprocessing and threading in Python to process and create a large number of files. The workflow is shown in the diagram.
Each output file depends on the analysis of all input files.
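Because every output depends on all inputs, the work can be split into three phases: analyse each input independently (parallelisable), merge the partial results (serial), then write each output independently (parallelisable again). The following is a minimal sketch under the assumption that each input's analysis can be reduced to a small, picklable result. `analyse_input`, `merge`, `write_output` and `run_pipeline` are hypothetical placeholders, not the poster's functions.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Dict, List, Tuple


def analyse_input(path: str) -> Dict[str, int]:
    """Hypothetical per-input analysis: here just the length of the file name."""
    return {path: len(path)}


def merge(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Combine the per-input results into the summary every output needs."""
    combined: Dict[str, int] = {}
    for part in partials:
        combined.update(part)
    return combined


def write_output(job: Tuple[int, Dict[str, int]]) -> str:
    """Hypothetical writer: builds output k from the merged summary."""
    k, summary = job
    total = sum(summary.values())
    return f"output_{k}: total={total}"  # a real version would write a file here


def run_pipeline(executor: Executor, inputs: List[str], n_outputs: int) -> List[str]:
    # Phase 1: inputs are independent, so analyse them in parallel.
    partials = list(executor.map(analyse_input, inputs))
    # Phase 2: serial merge, because every output depends on all inputs.
    summary = merge(partials)
    # Phase 3: once the summary exists, the outputs are independent again.
    jobs = [(k, summary) for k in range(n_outputs)]
    return list(executor.map(write_output, jobs))


if __name__ == "__main__":  # needed on Windows, where workers re-import this module
    with ProcessPoolExecutor() as ex:
        for line in run_pipeline(ex, ["a.txt", "bb.txt", "ccc.txt"], 2):
            print(line)
```

Because `run_pipeline` accepts any `concurrent.futures` executor, you can pass a `ThreadPoolExecutor` instead to compare threads and processes on the same code path.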
Running the program in a single process takes quite a long time, so I tried the following code:
(a) multiprocessing
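```python
import time
from multiprocessing import Pool, cpu_count

if __name__ == "__main__":  # required on Windows: each worker re-imports this module
    start = time.time()
    process_count = cpu_count()
    p = Pool(process_count)
    results = [p.apply_async(my_read_process_and_write_func, args=(i, w))
               for i in range(process_count)]
    p.close()
    p.join()
    for r in results:
        r.get()  # re-raises any exception that occurred in a worker
    end = time.time()
```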
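If the `AsyncResult` objects returned by `Pool.apply_async` are never read (and no `error_callback` is given), any exception raised in a worker is silently discarded, so a task that crashed early can look like a fast one. The following is a minimal sketch of the same fan-out using `concurrent.futures`, where `future.result()` re-raises worker errors in the parent. `work` and `run_all` are hypothetical stand-ins for `my_read_process_and_write_func` and the driver loop.

```python
import os
from concurrent.futures import Executor, ProcessPoolExecutor, as_completed
from typing import Dict


def work(i: int) -> int:
    """Hypothetical stand-in for my_read_process_and_write_func(i)."""
    if i < 0:
        raise ValueError(f"bad index {i}")
    return i * i


def run_all(executor: Executor, n_tasks: int) -> Dict[int, int]:
    """Submit one task per index and collect the results keyed by index."""
    futures = {executor.submit(work, i): i for i in range(n_tasks)}
    results: Dict[int, int] = {}
    for fut in as_completed(futures):
        # result() re-raises any exception raised inside the worker
        results[futures[fut]] = fut.result()
    return results


if __name__ == "__main__":  # needed on Windows, where workers re-import this module
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as ex:
        print(run_all(ex, 8))
```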
(b) threading
```python
import threading
import time
from multiprocessing import cpu_count

start = time.time()
thread_count = cpu_count()
thread_list = []
for i in range(thread_count):
    t = threading.Thread(target=my_read_process_and_write_func, args=(i,))
    thread_list.append(t)
for t in thread_list:
    t.start()
for t in thread_list:
    t.join()  # wait for every thread to finish
end = time.time()
```
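With a plain `threading.Thread`, an uncaught exception in the target is printed to stderr, but `join()` does not raise it, so a failing thread is easy to miss. The following is a minimal sketch of the same start-then-join pattern written with `concurrent.futures.ThreadPoolExecutor`. `work` and `run_threads` are hypothetical names used for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List


def work(i: int) -> str:
    """Hypothetical stand-in for my_read_process_and_write_func(i)."""
    return f"chunk {i} done"


def run_threads(n_tasks: int) -> List[str]:
    # Leaving the with-block waits for every thread, like the explicit join() loop.
    with ThreadPoolExecutor(max_workers=max(1, n_tasks)) as ex:
        # map() yields results in submission order and re-raises worker exceptions.
        return list(ex.map(work, range(n_tasks)))


if __name__ == "__main__":
    print(run_threads(os.cpu_count() or 1))
```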
I am running this code with Python 3.6 on a Windows PC with 8 cores. However, the multiprocessing version takes about the same time as the single-process version, and the threading version takes about 75% of the single-process time.
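To compare the variants fairly, you can time each one on the same workload with `time.perf_counter()`, a clock intended for measuring short durations. The following is a sketch with a hypothetical `fake_io_task` that sleeps to imitate waiting on disk. A sleeping thread releases the GIL, as CPython usually does during blocking file I/O, so threads can overlap those waits. This is only one possible explanation for the speed-up observed with threading; the post does not confirm that the real work is I/O-bound.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fake_io_task(i: int) -> int:
    """Hypothetical task that mostly waits, like reading or writing a file."""
    time.sleep(0.1)  # sleeping releases the GIL, as blocking file I/O usually does
    return i


def sequential(n: int) -> None:
    for i in range(n):
        fake_io_task(i)


def threaded(n: int) -> None:
    with ThreadPoolExecutor(max_workers=n) as ex:
        list(ex.map(fake_io_task, range(n)))


def elapsed(fn: Callable[[int], None], n: int) -> float:
    """Wall-clock seconds taken by fn(n)."""
    start = time.perf_counter()
    fn(n)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 4
    print(f"sequential: {elapsed(sequential, n):.2f} s")
    print(f"threaded:   {elapsed(threaded, n):.2f} s")
```

If the real task is CPU-bound instead, the GIL prevents threads from running Python code in parallel. In that case, processes are the tool that can help, as long as the work is split so that each process has a substantial, independent share.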
My questions are:
Is my code correct?
Is there a better way to write it so that it runs more efficiently? Thanks!
