
I want to do parallel processing to speed up the task in Python.

I used apply_async, but CPU utilization is only about 30%. How can I fully utilize the CPU?

Below is my code.

import numpy as np
import pandas as pd
import multiprocessing

def calc_score(df, i, j, score):
    score[i,j] = df.loc[i, 'data'] + df.loc[j, 'data']

if __name__ == '__main__':
    df = pd.read_csv('data.csv')
    score = np.zeros([100, 100])
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    for i in range(100):
        for j in range(100):
            pool.apply_async(calc_score, (df, i, j, score))
    pool.close()
    pool.join()

Thank you very much.

  • Be aware of the GIL Commented May 18, 2018 at 4:48
  • How to avoid it? Commented May 18, 2018 at 4:49
  • You can't avoid the GIL in Python. Consider switching to a genuinely multi-thread capable programming language, e.g. Go. Commented May 18, 2018 at 4:49
  • Then what are your recommendations? Commented May 18, 2018 at 4:50
  • Spend several months learning another programming language, more suited to your needs. Python (or any single other language) is not a panacea. BTW, parallel program design is always difficult Commented May 18, 2018 at 4:52

2 Answers


You can't reach 100% CPU utilization with pool = multiprocessing.Pool(multiprocessing.cpu_count()). The pool starts your worker function on the number of processes you give it, but it also waits for a free core before dispatching more work. If you want to maximize CPU utilization with multiprocessing, you can use the multiprocessing Process class directly, which keeps spawning new processes. Be aware, though, that this can bring the system down if there isn't enough memory to spawn another process.
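A minimal sketch of the Process approach described above, shrunk to a tiny data list (the per-cell worker mirrors the question's code, but the function and variable names here are illustrative). Note one pitfall the question's code also has: a plain NumPy array or dict is not shared between processes, so each child must send its result back, for example through a multiprocessing.Queue. Spawning one process per cell is also very expensive; this is a sketch of the mechanism, not a recommended design for a 100x100 matrix.

```python
import multiprocessing

def calc_score(i, j, data, queue):
    # Each child computes one cell and returns it via the queue;
    # child processes cannot write into the parent's memory directly.
    queue.put((i, j, data[i] + data[j]))

def compute_pairs(data):
    queue = multiprocessing.Queue()
    procs = []
    for i in range(len(data)):
        for j in range(len(data)):
            p = multiprocessing.Process(target=calc_score,
                                        args=(i, j, data, queue))
            p.start()
            procs.append(p)
    # Drain the queue before joining, so children are never blocked
    # on a full pipe while the parent waits for them to exit.
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    score = {}
    for i, j, value in results:
        score[(i, j)] = value
    return score

if __name__ == '__main__':
    score = compute_pairs([1.0, 2.0, 3.0])
    print(score[(0, 2)])  # prints 4.0 (= 1.0 + 3.0)
```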


3 Comments

  • I used the multiprocessing Process class, but the CPU utilization is still 30%.
  • Any other methods?
  • Hi, can you share your code showing what exactly you are trying to do? You can't reach 100% CPU utilization if your task is largely I/O-bound; in that case the OS will leave the CPU idle.

"CPU utilization" should be about performance, i.e. you want to do the job in as little time as possible. There is no generic way to do that. If there were a generic way to optimize software, there would be no slow software, right?

You seem to be looking for a different thing: spending as much CPU time as possible, so that the CPU does not sit idle. That may seem like the same thing, but it is absolutely not.

Anyway, if you want to spend 100% of CPU time, this script will do that for you:

import time
import multiprocessing

def loop_until_t(t):
    # Busy-wait until the deadline: pure CPU work, no I/O or sleeping.
    while time.time() < t:
        pass

def waste_cpu_for_n_seconds(num_seconds, num_processes=multiprocessing.cpu_count()):
    t0 = time.time()
    t = t0 + num_seconds
    print("Begin spending CPU time (in {} processes)...".format(num_processes))
    with multiprocessing.Pool(num_processes) as pool:
        # One busy-loop task per process, all running until the same deadline.
        pool.map(loop_until_t, num_processes*[t])
    print("Done.")

if __name__ == '__main__':
    waste_cpu_for_n_seconds(15)

If, instead, you want your program to run faster, you will not get there with an "illustration for parallel processing", as you call it; you need an actual problem to be solved.
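To illustrate that point with the question's concrete task: score[i, j] = data[i] + data[j] is just an outer sum, which NumPy's broadcasting computes in one vectorized call, with no multiprocessing at all. A sketch (the small array here stands in for the question's df['data'] column):

```python
import numpy as np

# score[i, j] = data[i] + data[j] is an outer sum: broadcasting a
# column vector against a row vector fills the whole matrix at once.
data = np.array([1.0, 2.0, 3.0, 4.0])  # stands in for df['data'].to_numpy()
score = data[:, None] + data[None, :]

print(score.shape)  # prints (4, 4)
print(score[1, 3])  # prints 6.0 (= 2.0 + 4.0)
```

For a 100-element column this runs in microseconds, far faster than dispatching 10,000 tiny tasks to a process pool, each of which must pickle the whole DataFrame.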
