Commit 31f8c5f4 authored by Leigh B Stoller

Reduce the RPC timeout to 60 seconds in the sliverstatus loop, and
say something more informative than "read timeout" if we lose contact
with the backend cluster.

I still need to figure out what to do when this happens. At the moment we
set the status of the new instance to failed, even though it can't be
terminated until the network partition clears up.
parent 9207001b
```diff
@@ -721,6 +721,8 @@ my $interval = 15;
 my $ready = 0;
 my $failed = 0;
 my $public_url;
+# Shorten default timeout now.
+Genixmlrpc->SetTimeout(60);
 while ($seconds > 0) {
     sleep($interval);
```
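The hunk above caps each RPC call made while polling in the sliverstatus loop at 60 seconds, so a single hung call cannot consume most of the polling budget. `Genixmlrpc->SetTimeout` is internal to this codebase and its implementation is not shown here, so the sketch below does not use it. As an assumption for illustration only, it shows the generic Perl `alarm`/`eval` idiom (from perlipc) for putting an upper bound on a blocking call. The helper name `call_with_timeout` is hypothetical, and the `"read timeout"` text is chosen only to mirror the string the second hunk matches on.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Genixmlrpc): run $code, giving up
# after $timeout seconds. Returns ($result, undef) on success or
# (undef, $error) on failure. alarm() has one-second granularity and
# only one alarm can be pending per process.
sub call_with_timeout {
    my ($timeout, $code) = @_;
    my $result;
    my $ok = eval {
        # Turn SIGALRM into an exception that unwinds out of $code.
        local $SIG{ALRM} = sub { die "read timeout\n" };
        alarm($timeout);
        $result = $code->();
        alarm(0);    # call finished in time; cancel the alarm
        1;
    };
    alarm(0);        # also cancel it if $code itself died
    return $ok ? ($result, undef) : (undef, $@ || "unknown error\n");
}

# A call that returns in time, and one that would block for 10 seconds.
my ($r1, $e1) = call_with_timeout(5, sub { "ready" });
my ($r2, $e2) = call_with_timeout(1, sub { sleep(10); "never" });
print "r1=$r1\n";    # r1=ready
print "e2=$e2";      # e2=read timeout
```

The second `alarm(0)` matters: without it, an exception thrown by `$code` would leave the alarm armed and the default SIGALRM action would later kill the process.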
```diff
@@ -741,7 +743,14 @@ while ($seconds > 0) {
     if (defined($response)) {
         print STDERR ": " . $response->output();
         if (defined($webtask)) {
-            $webtask->output($response->output());
+            if ($response->output() =~ /read timeout/) {
+                $webtask->output("Lost contact with the aggregate. " .
+                                 "Possibly a network failure, ".
+                                 "please try again later.");
+            }
+            else {
+                $webtask->output($response->output());
+            }
         }
     }
     print STDERR "\n";
```
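The hunk above replaces the raw transport error with a user-facing message before it is written to the web task. As a sketch, the same decision can be pulled out into a small pure function so it can be tested without a live aggregate; the name `friendly_status_message` is hypothetical and is not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the hunk above: if the RPC output
# mentions a read timeout, replace it with a message explaining the
# likely cause; otherwise pass the output through unchanged.
sub friendly_status_message {
    my ($output) = @_;
    return $output unless defined $output;
    if ($output =~ /read timeout/) {
        return "Lost contact with the aggregate. " .
               "Possibly a network failure, " .
               "please try again later.";
    }
    return $output;
}

print friendly_status_message("500 read timeout"), "\n";
print friendly_status_message("sliver is booting"), "\n";
```

With a helper like this, the body of the `if (defined($webtask))` block would reduce to `$webtask->output(friendly_status_message($response->output()));`.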