Multithreading/Parallel Bash Scripts in Unix Environment

Question

I have multiple bash scripts that I have tried to "parallelize" within a master bash script.

Bash Script:

#!/bin/bash
SHELL=/bin/bash

bash /home/.../a.sh &
bash /home/.../b.sh &
wait
bash /home/.../c.sh &
bash /home/.../d.sh &
bash /home/.../e.sh &
wait
echo "Done paralleling!"
exit 0

I have run the script both sequentially (without the ampersands) and with the ampersands, and I am not seeing any appreciable difference in processing time. That leads me to believe that something is either not coded correctly or not coded in the most efficient way.

Quite possibly, parallel instances can take longer because they're all necessarily using the same physical resource: disk drive(s). A single process instance might be able to make I/O requests much more efficiently since Linux (et al.) is able to use very efficient buffering behind the scenes, while it cannot do so nearly so well if it is "buried" by many processes doing more-or-less the same thing. Simply benchmark it, and, if the ampersands don't actually make things appreciably faster on your machine, abandon the idea. "Oh well, it seemed like a nice idea, but ..."
– Mike Robinson, Commented Jun 28, 2016 at 14:31
If you are looking into working more with parallel jobs, have a look at GNU Parallel. It will do a lot of the parallelizing work for you and is production quality.
– Ole Tange, Commented Jul 4, 2016 at 6:25

Mike Robinson · Accepted Answer · 2016-06-28 14:40:31Z

In classic computer-science theory, resource-contention is referred to as "thrashing."

(In the good ol' days, when a 5-megabyte disk drive might be the size of a small washing machine, we used to call it "Maytag Mode," since the poor thing looked like a Maytag washing-machine on the "spin" cycle!)

If you graph the performance curve caused by contention, it slopes upward, then abruptly has an "elbow" shape: it goes straight up, exponentially. We call that, "hitting the wall."

An interesting thing to fiddle around with on this script (if you're just curious ...) is to put wait statements at several places. (Be sure you're doing this correctly ...) Allow, say, two instances to run at once, wait for all of them to complete, then start the next two, and so on. See if that's usefully faster, and, if it is, try three at a time. And so on. You may find a "sweet spot."

Or ... not. (Don't spend too much time with this. It doesn't look like it's going to be worth it.)
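The batching experiment described above could be sketched roughly as follows. This is only an illustration under stated assumptions: job and run_in_batches are hypothetical names, and job is a stand-in that merely sleeps, so swap in the real script invocations before drawing any conclusions.

```bash
#!/bin/bash
# Hypothetical stand-in for the real scripts; replace the body with
# the actual command (for example, a call to one of your own scripts).
job() {
    sleep 0.1   # fractional sleep assumes GNU coreutils or similar
}

# Run the named jobs at most $1 at a time, waiting for each batch to finish.
run_in_batches() {
    local batch_size=$1
    shift
    local running=0 name
    for name in "$@"; do
        job "$name" &
        running=$((running + 1))
        if (( running >= batch_size )); then
            wait            # block until every job in this batch has exited
            running=0
        fi
    done
    wait                    # catch a final, partially filled batch
}

# Time the same five jobs at several batch sizes.
# The bash `time` keyword writes its report to stderr.
for size in 1 2 3 5; do
    TIMEFORMAT="batch size $size: %R s"
    time run_in_batches "$size" a b c d e
done
```

With a sleeping stand-in the larger batches will always look faster, because nothing is contended. With real, disk-heavy scripts the timings may flatten out or get worse as the batch size grows, which is the "sweet spot" (or lack of one) this experiment is meant to reveal.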

Sobrique · Accepted Answer · 2016-06-28 14:32:24Z

You're likely correct. The thing with parallelism is that it allows you to grab multiple resources to use in parallel. That improves your speed if - and only if - that resource is your limiting factor.

For example, if you're reading from a disk, odds are good that the disk read itself is what's limiting you, and doing more in parallel doesn't help. Indeed, contention can slow the process down. (The disk has to seek to service multiple processes, rather than just 'getting on' with a single serialised read.)

So it really does boil down to what your script actually does and why it's slow. And the best way of checking that is by profiling it.

At a basic level, something like truss or strace might help.

e.g.

strace -fTtc /home/../e.sh

And see what types of system calls are being made, and how much of the total time they're consuming.
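As a coarser first check before reaching for strace, one could compare wall-clock time with CPU time using the bash time keyword. The sketch below is an assumption-laden illustration rather than a definitive profiling method: workload is a hypothetical stand-in that only sleeps, and cpu_share is a name introduced here for the ratio of CPU time to elapsed time.

```bash
#!/bin/bash
export LC_ALL=C   # keep a "." decimal point in the timings so awk can parse them

# Hypothetical workload that mostly waits; replace with one of the real scripts.
workload() {
    sleep 0.3
}

# %R = wall-clock seconds, %U = user CPU seconds, %S = system CPU seconds.
TIMEFORMAT='%R %U %S'

# `time` reports on the shell's stderr, so redirect the group's stderr
# into the pipe and silence the workload's own output.
read -r real user sys < <( { time workload >/dev/null 2>&1; } 2>&1 )

# Share of wall-clock time actually spent on the CPU, as a whole percentage.
cpu_share=$(awk -v r="$real" -v u="$user" -v s="$sys" \
    'BEGIN { printf "%.0f", (r > 0 ? 100 * (u + s) / r : 0) }')

echo "real=${real}s user=${user}s sys=${sys}s cpu_share=${cpu_share}%"
```

A share close to 100% suggests the script is CPU-bound, which is the case where running on more cores is most likely to help. A low share suggests most of the time goes to waiting, often on disk or network, and that is where a system-call profile shows what is actually being waited for.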
