Python multithreading but using object instance

Question

I hope you can help me.

I have a msgList, containing msg objects, each one having the pos and content attributes. Then I have a function posClassify, that creates a SentimentClassifier object, that iterates thru this msgList and does msgList[i].pos = clf.predict(msgList[i].content), being clf an instance of SentimentClassifier.

def posClassify(msgList):
    clf = SentimentClassifier()
    for i in tqdm(range(len(msgList))):
        if msgList[i].content.find("omitted") == -1:
            msgList[i].pos = clf.predict(msgList[i].content)

And what I wanted is to compute this using multiprocessing. I have read that you create a pool, and call a function with a list of the arguments you want to pass this function, and thats it. I imagine that that function must be something like saving an image or working on different memory spaces, and not like mine, where you want to modify that same msg object, and also, having to use that SentimentClassifier object (which takes about 10 seconds or so to initialize).

My thoughts where creating cpu_cores-1 processes, each one using an instance of SentimentClassifier, and then each process starts consuming that msg list with its own classifier, but I can't work out how to approach this. I also thought of creating threads with binary semaphores, each one calling its own classifier, and then waiting the semaphore to update the pos value in the msg object, but still cant figure it out.

Have you actually tried what you think you should be doing? What problems did you encounter? — MisterMiyagi
– MisterMiyagi, Commented Feb 21, 2020 at 8:30

halfer · Accepted Answer · 2021-06-23 09:30:55Z

1

You can use ProcessPoolExecutor from futures module in Python.

The `ProcessPoolExecutor` is

An Executor subclass that executes calls asynchronously using a pool of at most max_workers processes. If max_workers is None or not given, it will default to the number of processors on the machine

You can find more at Python docs.

Here, is the sample code of achieving the concurrency assuming that each msgList[i] is independent of msgList[j] when i != j,

from concurrent import futures

def posClassify(msg, idx, clf):
    return idx, clf.predict(msg.content)

def classify(msgList):
    clf = SentimentClassifier()

    calls = []

    executor = futures.ProcessPoolExecutor(max_workers=4)
    for i in tqdm(range(len(msgList))):
        if msgList[i].content.find("omitted") == -1:
            call = executor.submit(posClassify, msgList[i], i, clf)
            calls.append(call)

    # wait for all processes to finish
    executor.shutdown()

    # assign the result of individual calls to msgList[i].pos
    for call in calls:
        result = call.result()
        msgList[result[0]].pos = result[1]

In order to execute the code, just call the classify(msgList) function.

edited Jun 23, 2021 at 9:30

halfer

20.2k20 gold badges110 silver badges207 bronze badges

answered Feb 21, 2020 at 8:18

Shubham Sharma

71.8k6 gold badges26 silver badges58 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

MisterMiyagi Over a year ago

Since msg in the subprocess is likely a copy (unless explicitly made multiprocess aware), the results don't affect the original msgList in the main process. It will discard all results.

Francisco Aguilera Over a year ago

The following error is thrown: TypeError: posClassify() missing 1 required positional argument: 'idx'

Francisco Aguilera Over a year ago

Thank you for the answer! but, the problem comes with clf = SentimentClassifier, that takes around 10 seconds each time its instanced. The solution would be creating 4 (n of processes) of this clf, and then each process work with its own instance of the classifier, but I cant manage how to reach this solution, as I dont know how to work correctly with processes and those instances.

Shubham Sharma Over a year ago

Well in that case you could just create one instance of classifier and pass it to the posClassify method. What do you think?

Shubham Sharma Over a year ago

What's the average running time of clf.predict method?

|

Collectives™ on Stack Overflow

Python multithreading but using object instance

1 Answer 1

The `ProcessPoolExecutor` is

6 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

The ProcessPoolExecutor is

6 Comments

Your Answer

Sign up or log in

Post as a guest

Related

The `ProcessPoolExecutor` is