
I wrote code like this:

def process(data):
    # create a file using data
    pass

all = ["data1", "data2", "data3"]

I want to execute the process function on every item of my all list in parallel. The files created are small, so I am not concerned about disk writes, but the processing takes a long time, and I want to use all of my cores.

How can I do this using only default modules in Python 2.7?

To help you search next time, keywords: multiprocessing python map. Then you can easily find the solution on Google. Commented Aug 13, 2018 at 3:44

3 Answers


Assuming CPython and the GIL here.

If your task is I/O bound, in general, threading may be more efficient since the threads are simply dumping work on the operating system and idling until the I/O operation finishes. Spawning processes is a heavy way to babysit I/O.

However, most file systems don't handle truly concurrent writes, so using multithreading or multiprocessing for the writes themselves may not be any faster than writing synchronously.

Nonetheless, here's a contrived example of multiprocessing.Pool.map which may help with your CPU-bound work:

from multiprocessing import cpu_count, Pool

def process(data):
    # best to do heavy CPU-bound work here...

    # file write for demonstration
    with open("%s.txt" % data, "w") as f:
        f.write(data)

    # example of returning a result to the map
    return data.upper()
      
tasks = ["data1", "data2", "data3"]

if __name__ == "__main__":  # guard needed on platforms that spawn worker processes
    pool = Pool(cpu_count() - 1)
    print(pool.map(process, tasks))

A similar setup for threading can be found in concurrent.futures.ThreadPoolExecutor (standard library in Python 3; on Python 2.7 it requires the futures backport from PyPI).
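For instance, a minimal sketch of that route, assuming the futures backport is installed on 2.7 (max_workers=4 is an arbitrary choice here):

from concurrent.futures import ThreadPoolExecutor

def process(data):
    # I/O-bound work: write a small file, as in the question
    with open("%s.txt" % data, "w") as f:
        f.write(data)
    return data.upper()

tasks = ["data1", "data2", "data3"]
with ThreadPoolExecutor(max_workers=4) as executor:
    # map blocks until all results are in, preserving input order
    print(list(executor.map(process, tasks)))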

As an aside, all is a builtin function, so rebinding it shadows the builtin; it isn't a great variable name choice.
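A contrived illustration (not from the original post) of why that shadowing bites:

all = ["data1", "data2", "data3"]
# the builtin is now shadowed, so this raises TypeError: 'list' object is not callable
all(x.startswith("data") for x in all)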


Or:

from threading import Thread

def process(data):
    print("processing {}".format(data))

l = ["data1", "data2", "data3"]

for task in l:
    t = Thread(target=process, args=(task,))
    t.start()

Or (Python 3.6 or later only, for the f-string):

from threading import Thread

def process(data):
    print(f"processing {data}")

l = ["data1", "data2", "data3"]

for task in l:
    t = Thread(target=process, args=(task,))
    t.start()
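Note that both loops above start the threads but never wait on them. A small extension of the same pattern (my addition, not part of the original answer) keeps references and joins, so the caller blocks until all work is done:

from threading import Thread

def process(data):
    print("processing {}".format(data))

tasks = ["data1", "data2", "data3"]
threads = [Thread(target=process, args=(task,)) for task in tasks]
for t in threads:
    t.start()
for t in threads:
    t.join()  # block until every worker thread has finished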


Here is a template using multiprocessing; I hope it's helpful.

from multiprocessing.dummy import Pool as ThreadPool

def process(data):
    print("processing {}".format(data))
alldata = ["data1", "data2", "data3"]

pool = ThreadPool()

results = pool.map(process, alldata)

pool.close()
pool.join()

1 Comment

OP says "I want to use all of my cores". ThreadPool won't use more than one core.
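If using every core really is the goal, the same template can be pointed at the process-backed pool instead; a sketch with multiprocessing.Pool swapped in (my change, not the original answer's code):

from multiprocessing import Pool

def process(data):
    print("processing {}".format(data))

if __name__ == "__main__":  # required on platforms that spawn worker processes
    alldata = ["data1", "data2", "data3"]
    pool = Pool()  # defaults to cpu_count() worker processes
    results = pool.map(process, alldata)
    pool.close()
    pool.join()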
