Commit 272cb767 authored by Mike Hibler

Stop relying on return code of $ssh as an indicator of success/failure/timeout.

What ssh returns in the case of a timeout depends on timing, and since we
started using sshtb, even the check for a timeout return code wasn't working.
parent 564d958d
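The idea in the message is to decide "timed out" from a flag that the parent's own SIGALRM handler sets, not from whatever exit status the child reports. The following is a minimal sketch of that pattern, assuming a POSIX system. `run_with_timeout` is a hypothetical helper written for illustration. It is not part of the original code, and it does not use ssh or sshtb.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper (not from the original code): run @cmd in a child,
# kill it after $timeout seconds, and report whether *we* killed it.
sub run_with_timeout {
    my ($timeout, @cmd) = @_;
    my $timedout = 0;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: become the command; _exit avoids running parent cleanup.
        exec(@cmd) or POSIX::_exit(127);
    }

    # Parent: the handler both kills the child and records the timeout.
    local $SIG{ALRM} = sub { kill('TERM', $pid); $timedout = 1; };
    alarm($timeout);
    waitpid($pid, 0);    # Perl restarts waitpid after the handler runs
    alarm(0);

    # $? is returned for logging only; $timedout alone means "hung".
    return ($timedout, $?);
}
```

With this helper, a command that fails quickly, such as `false`, returns `(0, 256)`, and the caller does not mistake it for a hang. `sleep 5` under a 1-second limit returns a true flag. Like the original fork/alarm/waitpid sequence, the sketch still has a small window: the alarm can fire just after the child exits but before `alarm(0)`.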
@@ -658,7 +658,8 @@ sub RebootNode {
 $syspid = fork();
 if ($syspid) {
-local $SIG{ALRM} = sub { kill("TERM", $syspid); };
+my $timedout = 0;
+local $SIG{ALRM} = sub { kill("TERM", $syspid); $timedout = 1; };
 alarm 20;
 waitpid($syspid, 0);
 alarm 0;
@@ -670,16 +671,17 @@ sub RebootNode {
 print STDERR "reboot ($pc): reboot returned $?.\n" if $debug;
 #
-# If either ssh is not running or it timed out,
-# send it a ping of death.
+# We used to special case $?==256 here as meaning "ssh is not running"
+# but relying on any return code here is dubious. Too much depends on
+# the timing of the reboot operation on the client. So we just check
+# for a self-induced timeout here and immediately send a PoD in that
+# case. Otherwise, we assume the reboot happened and we will catch
+# our error below if the node does not stop pinging within a couple
+# of seconds.
 #
-if ($? == 256 || $? == 15) {
-if ($? == 256) {
-print STDERR "*** reboot ($pc): not running sshd.\n" if $debug;
-} else {
-print STDERR "*** reboot ($pc): wedged.\n" if $debug;
-}
-info("$pc: ssh reboot failed ... sending ipod");
+if ($timedout) {
+print STDERR "*** reboot ($pc): wedged.\n" if $debug;
+info("$pc: ssh reboot failed (hung) ... sending ipod");
 print STDERR "*** reboot ($pc): Trying Ping-of-Death.\n" if $debug;
 system("$ipod $pc");
@@ -702,9 +704,9 @@ sub RebootNode {
 $UID = $oldUID;
 #
-# Okay, before we power cycle lets really make sure. We wait a while
-# for it to stop responding to pings, and if it never goes silent,
-# punch the power button.
+# Okay, before we try IPoD or power cycle lets really make sure we need to.
+# We wait a while for the node to stop responding to pings, and if it never
+# goes silent, whack it with a bigger stick.
 #
 if (WaitTillDead($pc) == 0) {
 my $state = TBDB_NODESTATE_SHUTDOWN;
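The revised comment describes the fallback path. After a non-timed-out ssh, the code waits for the node to stop answering pings and escalates only if it never goes silent. `WaitTillDead` itself is not shown in this commit, so the following is a hypothetical stand-in that illustrates the polling loop. The probe callback `$is_alive`, the `$tries`/`$interval` parameters, and the "0 means it went silent" return convention are all assumptions. The convention is inferred from the `WaitTillDead($pc) == 0` check leading into the shutdown state.

```perl
use strict;
use warnings;

# Hypothetical stand-in for WaitTillDead (the real one is not in this diff).
# $is_alive: caller-supplied probe, e.g. a ping wrapper, true while the node
# still answers. Returns 0 once the node goes silent, 1 if it is still
# answering after $tries probes (assumed to match the "== 0" check above).
sub wait_till_dead {
    my ($is_alive, $tries, $interval) = @_;
    for my $i (1 .. $tries) {
        return 0 unless $is_alive->();
        # Pause between probes, but not after the last one.
        sleep($interval) if $i < $tries;
    }
    return 1;
}
```

A caller would treat a return of 1 as the signal to try IPoD or a power cycle. A return of 0 means the ssh-initiated reboot appears to have taken effect.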