2

My requirement is to run a shell function or script in parallel with multiprocessing. Currently I do it with the script below, which doesn't use multiprocessing. Also, when I start 10 jobs in parallel, one job might complete early and then has to wait for the other 9 jobs to finish before the next batch starts. I want to eliminate this wait with the help of multiprocessing in Python.

i=1
total=`cat details.txt | wc -l`
while [ $i -le $total ]
do
    name=`cat details.txt | head -$i | tail -1 | awk '{print $1}'`
    age=`cat details.txt | head -$i | tail -1 | awk '{print $2}'`
    ./new.sh $name $age &
    if (( $i % 10 == 0 )); then wait; fi
    i=$((i + 1))
done
wait

I want to run ./new.sh $name $age inside a Python script with multiprocessing enabled (taking into account the number of CPUs). As you can see, the values of $name and $age change on each execution. Kindly share your thoughts.

2
  • 1
    Your code is already using multiprocessing: Each invocation of ./new.sh is run in a separate process and will be scheduled by the OS. Commented Apr 18, 2015 at 10:02
  • @MichaelJaros: OK, but my requirement is also to start the 11th job as soon as one of the first 10 jobs ends. Commented Apr 20, 2015 at 5:22

1 Answer 1

4

First, your whole shell script could be replaced with:

awk '{ print $1; print $2; }' details.txt | xargs -d'\n' -n 2 -P 10 ./new.sh
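Since the question also asks that the number of CPUs be taken into account, the same one-liner can be sized dynamically. This is a sketch assuming GNU xargs and coreutils' nproc; the stub new.sh here is a stand-in for the real script so the pipeline can run end to end:

```shell
# Work in a scratch directory with sample data and a stand-in ./new.sh.
cd "$(mktemp -d)"
printf 'alice 30\nbob 25\ncarol 41\n' > details.txt
cat > new.sh <<'EOF'
#!/bin/sh
echo "name=$1 age=$2"
EOF
chmod +x new.sh

# Same pipeline as above, but with -P set to the CPU count via nproc
# (GNU coreutils) instead of a fixed 10 parallel jobs.
awk '{ print $1; print $2; }' details.txt | xargs -d'\n' -n 2 -P "$(nproc)" ./new.sh
```

As with -P 10, xargs starts the next invocation as soon as any running one finishes, so a fast job never blocks the rest of the batch.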

A simple python solution would be:

from subprocess import check_call
from multiprocessing.dummy import Pool

def call_script(args):
    name, age = args  # unpack arguments
    check_call(["./new.sh", name, age])

def main():
    with open('details.txt') as inputfile:
        args = [line.split()[:2] for line in inputfile]
    pool = Pool(10)
    # pool = Pool()  would use the number of available processors instead
    pool.map(call_script, args)
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()

Note that this uses multiprocessing.dummy.Pool (a thread pool) to call the external script, which in this case is preferable to a process pool: all call_script does is invoke the script and wait for it to return. Doing that in a worker process instead of a worker thread wouldn't increase performance, since this is an I/O-bound operation; it would only add overhead for process creation and inter-process communication.
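For the record, on Python 3.2+ the same thread-pool pattern is available via concurrent.futures. This is a runnable sketch of that approach; the stand-in call_script function (an assumption for illustration) replaces the actual subprocess call so the example is self-contained:

```python
# Thread-pool variant using concurrent.futures (Python 3.2+).
from concurrent.futures import ThreadPoolExecutor

def call_script(name, age):
    # Stand-in for: subprocess.check_call(["./new.sh", name, age])
    return "name=%s age=%s" % (name, age)

def main(lines):
    # max_workers=10 mirrors the fixed pool of 10 above; omit it to let
    # the executor size itself from the machine's CPU count.
    with ThreadPoolExecutor(max_workers=10) as pool:
        # Executor.map yields results in input order, like pool.map above.
        results = list(pool.map(lambda a: call_script(*a),
                                (line.split()[:2] for line in lines)))
    return results

if __name__ == '__main__':
    print(main(["alice 30", "bob 25"]))
```

The with block joins the pool on exit, replacing the explicit close()/join() calls.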


3 Comments

The shell script you suggested worked and I wanted to try out the Python code too. I am getting this error: "AttributeError: 'ThreadPool' object has no attribute 'starmap'". My Python version is 2.6.
You're right, starmap was introduced in Python 3.3 - changed the example to use map instead.
Thanks a lot @mata. As you said the shell script you shared is better in performance.
