How do I run many SSH remote commands, on multiple machines, in batch?

Question

I use SSH to run some commands on multiple remote machines in a for loop. It executes the same command(s) for a list of IP addresses. Some of the IP addresses might be unreachable, so I used the ConnectTimeout option.

However, my script didn't work the way I wanted: it got stuck at the first unreachable IP instead of giving up and moving on to the next address in my list.

Here is the relevant part of my script:

for ip in ${IP} ; do
    ssh  -o BatchMode=yes \
         -o StrictHostKeyChecking=no \
         -o ConnectTimeout=10 \
         -l ${USERNAME} \
         ${SCRIPT_HOST} \
         "${COMMAND} -i $ip || echo timeout" \
         >> ./myscript.out
done

It is working fine for reachable IPs, but if a specific IP is down, it waits for a while (much more than 10s, maybe 35-40 seconds) and displays an error message to my terminal:

ERROR connecting : Connection timed out

So I'm wondering which option I didn't use correctly.

Can't it run in the background? You could also ignore errors with <your command> 2>/dev/null. – Dinesh Reddy, Mar 20, 2014 at 14:47
Have you tried executing ssh in debugging mode (i.e. verbose mode)? – GoofyBall, Apr 17, 2014 at 0:37
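Along the lines of GoofyBall's suggestion, here is a sketch for timing a single connection in verbose mode. In the question's loop, ssh always connects to ${SCRIPT_HOST}, and $ip is only passed as an argument to ${COMMAND}; running `true` as the remote command therefore separates the time ssh spends connecting from the time the remote command takes. probe_host is a hypothetical helper name; the BatchMode and ConnectTimeout options mirror the question.

```shell
#!/usr/bin/env bash
# Sketch: time a bare connection (remote command `true`) with ssh -v, so the
# connection phase can be told apart from the remote ${COMMAND}.
# probe_host is a hypothetical helper name.

probe_host() {
    local host="$1" start=$SECONDS
    # -v prints connection progress to stderr; keep only the last few lines.
    # ssh's exit status is ignored: we only want the log and the timing.
    ssh -v -o BatchMode=yes -o ConnectTimeout=10 "$host" true 2>&1 | tail -n 5 || true
    echo "elapsed: $((SECONDS - start))s"
}

# Usage:
# probe_host "${SCRIPT_HOST}"
```

If this returns within about 10 seconds while the loop still hangs for 35-40 seconds per address, the delay is more likely in the remote command than in ssh's own connection.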

Sigi · Accepted Answer · 2014-04-28 05:13:06Z

15

Your use of ConnectTimeout is correct, so it is not obvious why it only times out after 30 or more seconds.

Here's how I would change your script so that a slow timeout no longer stalls the whole loop:

- Use GNU parallel to connect to more than one destination host at the same time.
- Use the -f option to ssh, which sends it to the background just before the remote command runs.
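A minimal sketch of the background idea using plain shell job control instead of ssh -f: appending `&` backgrounds the entire ssh call, including the connection attempt, and `wait` then blocks until every job has finished. It assumes bash; run_on_host, run_all_in_background and the per-address output files (./myscript.<ip>.out, chosen so concurrent jobs do not interleave in one file) are hypothetical, while USERNAME, SCRIPT_HOST, COMMAND and IP are the question's variables.

```shell
#!/usr/bin/env bash
# Sketch: start one background job per IP with '&', then wait for all of them.
# USERNAME, SCRIPT_HOST, COMMAND and IP are the question's variables;
# run_on_host, run_all_in_background and the per-IP output files are hypothetical.

run_on_host() {
    local ip="$1"
    ssh -o BatchMode=yes \
        -o StrictHostKeyChecking=no \
        -o ConnectTimeout=10 \
        -l "${USERNAME}" \
        "${SCRIPT_HOST}" \
        "${COMMAND} -i ${ip} || echo timeout"
}

run_all_in_background() {
    local ip
    for ip in "$@"; do
        # '&' backgrounds the whole ssh invocation, connection attempt included,
        # so one slow attempt no longer holds up the rest of the loop.
        run_on_host "$ip" > "./myscript.${ip}.out" 2>&1 &
    done
    wait    # return only after every background job has exited
}

# Usage (unquoted on purpose, so the list splits into words as in the question):
# run_all_in_background ${IP}
```

The trade-off is that every address is contacted at once, with no upper limit on simultaneous connections.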

Here is a solution with GNU parallel, running at most 50 connections at the same time:

parallel --gnu --bg --jobs 50 \
ssh -o BatchMode=yes \
    -o StrictHostKeyChecking=no \
    -o ConnectTimeout=10 \
    -l ${USERNAME} \
    {} \
    "${COMMAND} -i {} || echo timeout" \
::: ${IP}

parallel <command> ::: <arguments> runs <command> once for each item in the <arguments> list, several at a time, substituting the current item wherever the {} placeholder appears.

Use parallel --jobs n to limit the number of parallel connections.
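Without GNU parallel, a similar cap on concurrent connections can be sketched in bash with a counter and `wait -n` (bash 4.3 or newer), which returns as soon as any one background job exits. This is a swapped-in technique, not how parallel works internally; run_limited is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: a bounded job pool in plain bash, roughly what `parallel --jobs n`
# gives you. Needs bash 4.3 or newer for `wait -n`.
# run_limited is a hypothetical helper name.

# run_limited MAX_JOBS CMD ARG...
# Runs "CMD ARG" for every ARG, with at most MAX_JOBS running at once.
run_limited() {
    local max_jobs="$1" cmd="$2" running=0 arg
    shift 2
    for arg in "$@"; do
        if [ "$running" -ge "$max_jobs" ]; then
            # Pool is full: block until any one job exits. A failing host
            # must not stop the loop, hence '|| true'.
            wait -n || true
            running=$((running - 1))
        fi
        "$cmd" "$arg" &
        running=$((running + 1))
    done
    wait    # wait for the jobs still running
}
```

With a wrapper function around the question's ssh command (called, hypothetically, ssh_one), `run_limited 50 ssh_one ${IP} >> ./myscript.out` plays the role of `--jobs 50`. Output lines from different hosts may interleave in the shared file.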

answered Apr 27, 2014 at 17:59 by Sigi, edited Apr 28, 2014 at 5:13


tsezane · Answer · 2015-11-25 20:05:08Z

1

The connection timeout applies once you have already established a connection: if the connection stays idle for that many seconds, it is dropped (unless you have also enabled the KEEP_ALIVE ssh parameter, which prevents a connection from ever being idle).

The reason it takes 30+ seconds before you get a timeout is the TCP protocol's internal timer: it keeps trying to connect for that long and then returns the error message saying it cannot connect to the sftp server. The message does not come from ssh.

answered Nov 25, 2015 at 18:49 by tsezane, edited Nov 25, 2015 at 20:05 by Cleb

1 Comment

This answer contradicts the SSH documentation. (Also, even if the OS does not let you shorten the timeout on the socket itself, you could still run your own timer and abandon the attempt after any amount of time.) Here's the relevant part of the ssh_config(5) manual page about the ConnectTimeout option: "Specifies the timeout (in seconds) used when connecting to the SSH server, instead of using the default system TCP timeout. This value is used only when the target is down or really unreachable, not when it refuses the connection." – Sigi
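Running your own timer is straightforward with the `timeout` utility from GNU coreutils, which stops the command after a fixed number of seconds and exits with status 124 when that happens. A minimal sketch, assuming GNU coreutils is installed; with_deadline and HOST_DEADLINE are hypothetical names, and the 15-second default is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: enforce a hard wall-clock limit on one attempt with GNU coreutils
# `timeout`, independent of how long the TCP connect (or remote command) takes.
# with_deadline and HOST_DEADLINE are hypothetical names.

# with_deadline SECONDS CMD ARG...
# Runs CMD; prints "timeout" and returns 124 if it overruns the limit.
with_deadline() {
    local limit="$1" status=0
    shift
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timeout"
    fi
    return "$status"
}

# Usage with the question's variables:
# with_deadline "${HOST_DEADLINE:-15}" ssh -o BatchMode=yes -o ConnectTimeout=10 \
#     -l "${USERNAME}" "${SCRIPT_HOST}" "${COMMAND} -i $ip" >> ./myscript.out
```

Because `timeout` runs an external program, CMD must be a real command such as ssh, not a shell function.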
