
I want to do parallel processing to speed up the task in Python.

I used apply_async, but CPU utilization is only about 30%. How can I fully utilize the CPU?

Below is my code.

import numpy as np
import pandas as pd
import multiprocessing

def calc_score(df, i, j, score):
    score[i,j] = df.loc[i, 'data'] + df.loc[j, 'data']

if __name__ == '__main__':
    df = pd.read_csv('data.csv')
    score = np.zeros([100, 100])
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    for i in range(100):
        for j in range(100):
            pool.apply_async(calc_score, (df, i, j, score))
    pool.close()
    pool.join()

Thank you very much.

  • Be aware of the GIL Commented May 18, 2018 at 4:48
  • How to avoid it? Commented May 18, 2018 at 4:49
  • You can't avoid the GIL in Python. Consider switching to a genuinely multi-thread capable programming language, e.g. Go. Commented May 18, 2018 at 4:49
  • Then what are your recommendations? Commented May 18, 2018 at 4:50
  • Spend several months learning another programming language, more suited to your needs. Python (or any single other language) is not a panacea. BTW, parallel program design is always difficult Commented May 18, 2018 at 4:52

2 Answers


You can't reach 100% CPU utilization with pool = multiprocessing.Pool(multiprocessing.cpu_count()). The pool starts your worker function on the number of processes you give it, but it also waits for a free core before dispatching more work. If you want to maximize CPU utilization with multiprocessing, you can use the multiprocessing Process class directly, which keeps spawning new processes. Be aware, though, that this can bring the system down if there isn't enough memory to spawn another process.
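A minimal sketch of the Process approach described above, shrunk to a tiny data list (the per-cell worker mirrors the question's code, but the function and variable names here are illustrative). Note one pitfall the question's code also has: a plain NumPy array or dict is not shared between processes, so each child must send its result back, for example through a multiprocessing.Queue. Spawning one process per cell is also very expensive; this is a sketch of the mechanism, not a recommended design for a 100x100 matrix.

```python
import multiprocessing

def calc_score(i, j, data, queue):
    # Each child computes one cell and returns it via the queue;
    # child processes cannot write into the parent's memory directly.
    queue.put((i, j, data[i] + data[j]))

def compute_pairs(data):
    queue = multiprocessing.Queue()
    procs = []
    for i in range(len(data)):
        for j in range(len(data)):
            p = multiprocessing.Process(target=calc_score,
                                        args=(i, j, data, queue))
            p.start()
            procs.append(p)
    # Drain the queue before joining, so children are never blocked
    # on a full pipe while the parent waits for them to exit.
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    score = {}
    for i, j, value in results:
        score[(i, j)] = value
    return score

if __name__ == '__main__':
    score = compute_pairs([1.0, 2.0, 3.0])
    print(score[(0, 2)])  # prints 4.0 (= 1.0 + 3.0)
```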


3 Comments

  • I used the multiprocessing Process class, but the CPU utilization is still 30%.
  • Any other methods?
  • Hi, can you share your code showing what exactly you are trying to do? You can't reach 100% CPU utilization if your task is largely I/O-bound; in that case the OS will leave the CPU idle.

"CPU utilization" should be about performance, i.e. you want to do the job in as little time as possible. There is no generic way to do that. If there were a generic way to optimize software, there would be no slow software, right?

You seem to be looking for a different thing: spending as much CPU time as possible, so that the CPU does not sit idle. That may seem like the same thing, but it is absolutely not.

Anyway, if you want to spend 100% of CPU time, this script will do that for you:

import time
import multiprocessing

def loop_until_t(t):
    # Busy-wait until the deadline: pure CPU work, no I/O or sleeping.
    while time.time() < t:
        pass

def waste_cpu_for_n_seconds(num_seconds, num_processes=multiprocessing.cpu_count()):
    t0 = time.time()
    t = t0 + num_seconds
    print("Begin spending CPU time (in {} processes)...".format(num_processes))
    with multiprocessing.Pool(num_processes) as pool:
        # One busy-loop task per process, all running until the same deadline.
        pool.map(loop_until_t, num_processes*[t])
    print("Done.")

if __name__ == '__main__':
    waste_cpu_for_n_seconds(15)

If, instead, you want your program to run faster, you will not get there with an "illustration for parallel processing", as you call it; you need an actual problem to be solved.
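To illustrate that point with the question's concrete task: score[i, j] = data[i] + data[j] is just an outer sum, which NumPy's broadcasting computes in one vectorized call, with no multiprocessing at all. A sketch (the small array here stands in for the question's df['data'] column):

```python
import numpy as np

# score[i, j] = data[i] + data[j] is an outer sum: broadcasting a
# column vector against a row vector fills the whole matrix at once.
data = np.array([1.0, 2.0, 3.0, 4.0])  # stands in for df['data'].to_numpy()
score = data[:, None] + data[None, :]

print(score.shape)  # prints (4, 4)
print(score[1, 3])  # prints 6.0 (= 2.0 + 4.0)
```

For a 100-element column this runs in microseconds, far faster than dispatching 10,000 tiny tasks to a process pool, each of which must pickle the whole DataFrame.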
