
Commit 56f6d601 authored by Leigh B Stoller

A lot of work on the RPC code, among other things.

I spent a fair amount of time improving error handling along the RPC path,
as well as making the code more consistent across the various files. Also
be more consistent in how the web interface invokes the backend and gets
errors back, specifically for errors that are generated when talking to a
remote cluster.
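Much of the diff below replaces `return undef` with `return ContextError()` or `return CredentialError()`, so every RPC helper hands back a response object and callers can always inspect `->code()`. The sketch below illustrates that pattern under stated assumptions: `MiniResponse` is a hypothetical stand-in for `GeniResponse`, its numeric codes are placeholders, and `DoRPC` is an invented wrapper. Only the `(code, value, output)` argument order mirrors the `GeniResponse->Create()` calls in the diff.

```perl
use strict;
use warnings;

# Hypothetical stand-in for GeniResponse. The numeric codes are
# placeholders, not the real GENIRESPONSE_* values.
package MiniResponse;

use constant SUCCESS => 0;
use constant ERROR   => 2;

sub new {
    my ($class, $code, $value, $output) = @_;
    return bless({ 'code' => $code, 'value' => $value,
                   'output' => $output }, $class);
}
sub code   { return $_[0]->{'code'}; }
sub output { return $_[0]->{'output'}; }

package main;

# Same shape as ContextError() in the diff: build an error response
# instead of returning undef.
sub ContextError {
    return MiniResponse->new(MiniResponse::ERROR, undef,
                             "Could not generate context for RPC");
}

# Hypothetical RPC wrapper: every exit path returns a response object,
# so the caller never needs a separate defined() check.
sub DoRPC {
    my ($context) = @_;
    return ContextError()
        if (!defined($context));
    return MiniResponse->new(MiniResponse::SUCCESS, "ok", "");
}

my $response = DoRPC(undef);
if ($response->code() != MiniResponse::SUCCESS) {
    print "RPC failed: ", $response->output(), "\n";
}
```

The caller's error path is the same whether the failure happened locally (no context, no credential) or on the remote cluster. This is what lets the web interface report a precise message instead of a generic failure.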

Add checks before every RPC to make sure the cluster is not disabled in
the database. Also check that we can actually reach the cluster, and
that the cluster is not offline (NoLogins()) before we try to do
anything. I might have to relax this a bit, but in general it takes a
couple of seconds to check, which is a small fraction of what most RPCs
take. Return precise errors to the web interface for clusters that are
not available, and show them to the user.
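The three pre-RPC checks described above (disabled in the database, reachable, not offline via `NoLogins()`) can be sketched as a single gate that returns a user-facing message or `undef`. This is an illustration only. The cluster is modeled as a plain hashref with hypothetical `disabled`, `reachable` and `nologins` entries, and the probes are callbacks rather than real Emulab calls.

```perl
use strict;
use warnings;

# Hypothetical pre-RPC gate. $cluster stands in for the database row plus
# two probe callbacks; none of these key names come from the Emulab code.
sub CheckClusterBeforeRPC {
    my ($cluster) = @_;
    my $name = $cluster->{'name'};

    # 1. Administratively disabled in the database.
    return "The $name cluster is currently disabled."
        if ($cluster->{'disabled'});

    # 2. Can we reach it at all? (a short probe, a couple of seconds)
    return "The $name cluster cannot be reached right now."
        if (!$cluster->{'reachable'}->());

    # 3. Reachable, but offline (the NoLogins() case).
    return "The $name cluster is offline."
        if ($cluster->{'nologins'}->());

    # All clear: undef means go ahead with the RPC.
    return undef;
}

my %cluster = (
    'name'      => "Example",
    'disabled'  => 0,
    'reachable' => sub { return 1; },
    'nologins'  => sub { return 1; },
);
my $error = CheckClusterBeforeRPC(\%cluster);
print(defined($error) ? "$error\n" : "OK to proceed\n");
```

The checks run cheapest first, so a disabled cluster is rejected without any network probe.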

Use webtasks more consistently between the web interface and backend
scripts. Watch specifically for scripts that exit abnormally (exit
before setting the exitcode in the webtask), which always means an
internal failure; do not show those to users.
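The rule above can be expressed as a small classifier: a failed script that never recorded an exit code in its webtask is an internal error, and a failed script that did record one is reporting something meant for the user. The sketch assumes a plain hashref with hypothetical `exitcode` and `output` keys in place of the real `WebTask` object, whose API is not shown here.

```perl
use strict;
use warnings;

# Hypothetical classifier for a backend script result. $status is the
# wait status ($?) and $webtask is a hashref standing in for WebTask.
sub ClassifyScriptResult {
    my ($status, $webtask) = @_;

    return ("success", undef)
        if ($status == 0);

    # Died before recording an exit code in the webtask: always an
    # internal failure, so hide the details from the user.
    return ("internal", "Internal error; the operators have been notified.")
        if (!defined($webtask->{'exitcode'}));

    # The script set an exit code on purpose, so its message was
    # written for the user.
    return ("user", $webtask->{'output'} // "Operation failed.");
}

my ($kind, $msg) = ClassifyScriptResult(256, {});
print "$kind: $msg\n";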

Show users only those RPC errors that make sense to them. Stop spewing
script output to the user; send it only to tbops, via the email that is
already generated when a backend script fails fatally.

But do not spew email for clusters that are not reachable or are
offline. Ditto for several other cases that were generating mail to
tbops instead of just showing the user a meaningful error message.
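The routing described in the last two paragraphs (short message to the user, full script output to tbops, no mail for expected conditions such as an offline cluster) might look like the sketch below. The category names and the mail callback are hypothetical; the real code relies on the email that the backend already sends on fatal failure.

```perl
use strict;
use warnings;

# Hypothetical categories for expected, user-facing conditions.
my %USER_ONLY = ('unreachable' => 1, 'offline' => 1, 'disabled' => 1);

sub RouteError {
    my ($category, $usermsg, $scriptoutput, $sendmail) = @_;

    # Expected conditions: show the user a meaningful message, no email.
    return $usermsg
        if (exists($USER_ONLY{$category}));

    # Real failures: raw script output goes only to tbops; the user
    # still sees just the short message.
    $sendmail->("tbops", $scriptoutput);
    return $usermsg;
}

my $shown = RouteError("offline", "The cluster is offline.", "trace",
                       sub { print "mail to $_[0]\n"; });
print "$shown\n";
```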

Stop using ParRun for single-site experiments (99% of experiments).

For create_instance, a new "async" mode that tells CreateSliver() to
return before the first mapper run, which typically happens very quickly.
Then use Resolve to watch for errors, for the manifest, or for the slice
to disappear. I expect this wait to be bounded, so we do not need to
worry as much about timing it out (which is a problem on very big
topologies). When we see the manifest, the RedeemTicket() part of
CreateSliver() is done and we are into the StartSliver() phase.
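The async wait can be sketched as a polling loop with three exits: an error, the slice disappearing, or a manifest appearing. This is a sketch under assumptions, not the code in this commit. `$resolve` stands in for the Resolve RPC and returns a hashref with hypothetical `error`, `gone` and `manifest` keys, and `$sleep` is injected so the loop can be exercised without waiting. Following the reasoning above, there is deliberately no timeout.

```perl
use strict;
use warnings;

# Hypothetical wait loop for the async create_instance path.
sub WaitForManifest {
    my ($resolve, $sleep, $interval) = @_;

    while (1) {
        my $status = $resolve->();

        # Errors reported by the cluster go straight back to the caller.
        return ("error", $status->{'error'})
            if (defined($status->{'error'}));

        # The slice vanished on the cluster side.
        return ("gone", undef)
            if ($status->{'gone'});

        # A manifest means RedeemTicket() is done; StartSliver() is next.
        return ("manifest", $status->{'manifest'})
            if (defined($status->{'manifest'}));

        # Still mapping. No timeout, assuming this phase is bounded.
        $sleep->($interval);
    }
}

my @replies = ({}, {}, { 'manifest' => "<rspec/>" });
my ($what, $data) = WaitForManifest(sub { shift(@replies) }, sub { }, 10);
print "$what: $data\n";
```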

For the StartSliver phase, watch for errors and show them to users;
previously we mostly lost those errors and just sent the experiment into
the failed state. I am still working on this.
parent 6b6fdafa
#!/usr/bin/perl -wT
#
# Copyright (c) 2007-2017 University of Utah and the Flux Group.
# Copyright (c) 2007-2018 University of Utah and the Flux Group.
#
# {{{EMULAB-LICENSE
#
@@ -574,6 +574,18 @@ sub GetCertificate($)
return $cert;
}
# Helper functions for below.
sub ContextError()
{
return GeniResponse->Create(GENIRESPONSE_ERROR(), undef,
"Could not generate context for RPC");
}
sub CredentialError()
{
return GeniResponse->Create(GENIRESPONSE_ERROR(), undef,
"Could not generate credentials for RPC");
}
#
# Create a dataset on the remote aggregate.
#
@@ -584,13 +596,14 @@ sub CreateDataset($)
my $geniuser = $self->GetGeniUser();
my $context = APT_Geni::GeniContext();
my $cert = $self->GetCertificate();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($context) && defined($cert)));
my ($credential, $speaksfor_credential) =
APT_Geni::GenCredentials($cert, $geniuser, ["blockstores"]);
return undef
return CredentialError
if (! (defined($speaksfor_credential) &&
defined($credential)));
@@ -624,13 +637,13 @@ sub DeleteDataset($)
my $geniuser = $self->GetGeniUser();
my $context = APT_Geni::GeniContext();
my $cert = $self->GetCertificate();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($context) && defined($cert)));
my ($credential, $speaksfor_credential) =
APT_Geni::GenCredentials($cert, $geniuser, ["blockstores"], 1);
return undef
return CredentialError()
if (!defined($credential));
my $credentials = [$credential->asString()];
@@ -657,13 +670,13 @@ sub ModifyDataset($)
my $geniuser = $self->GetGeniUser();
my $context = APT_Geni::GeniContext();
my $cert = $self->GetCertificate();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($context) && defined($cert)));
my ($credential, $speaksfor_credential) =
APT_Geni::GenCredentials($cert, $geniuser, ["blockstores"], 1);
return undef
return CredentialError()
if (!defined($credential));
my $credentials = [$credential->asString()];
@@ -692,13 +705,13 @@ sub ExtendDataset($)
my $geniuser = $self->GetGeniUser();
my $context = APT_Geni::GeniContext();
my $cert = $self->GetCertificate();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($context) && defined($cert)));
my ($credential, $speaksfor_credential) =
APT_Geni::GenCredentials($cert, $geniuser, ["blockstores"], 1);
return undef
return CredentialError()
if (!defined($credential));
my $credentials = [$credential->asString()];
@@ -726,13 +739,13 @@ sub DescribeDataset($)
my $geniuser = $self->GetGeniUser();
my $context = APT_Geni::GeniContext();
my $cert = $self->GetCertificate();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($context) && defined($cert)));
my ($credential, $speaksfor_credential) =
APT_Geni::GenCredentials($cert, $geniuser, ["blockstores"], 1);
return undef
return CredentialError()
if (!defined($credential));
my $credentials = [$credential->asString()];
@@ -759,13 +772,13 @@ sub GetCredential($)
my $geniuser = $self->GetGeniUser();
my $context = APT_Geni::GeniContext();
my $cert = $self->GetCertificate();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($context) && defined($cert)));
my ($credential) =
APT_Geni::GenAuthCredential($cert, ["blockstores"]);
return undef
return CredentialError()
if (!defined($credential));
my $args = {
@@ -789,13 +802,13 @@ sub ApproveDataset($)
my $geniuser = $self->GetGeniUser();
my $context = APT_Geni::GeniContext();
my $cert = $self->GetCertificate();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($context) && defined($cert)));
my ($credential) =
APT_Geni::GenAuthCredential($cert, ["admin"]);
return undef
return CredentialError()
if (!defined($credential));
my $args = {
......
@@ -28,7 +28,7 @@ use Carp;
use English;
use Data::Dumper;
use Date::Parse;
use POSIX qw(tmpnam);
use File::Temp qw(tempfile tmpnam);
use JSON;
use Exporter;
use vars qw(@ISA @EXPORT $AUTOLOAD
@@ -420,7 +420,8 @@ sub Delete($)
$agg->Delete() == 0
or return -1;
}
$self->webtask()->Delete();
# We do not "own" the webtask until create_sliver exits successfully.
$self->webtask()->Delete() if (defined($self->{'WEBTASK'}));
DBQueryWarn("delete from apt_instances where uuid='$uuid'") or
return -1;
@@ -448,6 +449,15 @@ sub Unlock($)
}
return $slice->UnLock();
}
sub Locked($)
{
my ($self) = @_;
my $slice = $self->GetGeniSlice();
if (!defined($slice)) {
return 0;
}
return $slice->locked();
}
sub SetStatus($$)
{
@@ -1511,7 +1521,7 @@ use EmulabConstants;
use WebTask;
use libtestbed;
use Carp;
use POSIX qw(tmpnam);
use File::Temp qw(tempfile tmpnam);
use JSON;
use English;
use GeniResponse;
@@ -1549,6 +1559,7 @@ sub Lookup($$$)
$self->{'INSTANCE'} = $instance;
$self->{'WEBTASK'} = undef;
$self->{'STATUS'} = undef;
$self->{'AGGREGATE'}= undef;
bless($self, $class);
# Handy;
@@ -1716,6 +1727,7 @@ sub DESTROY {
$self->{'DBROW'} = undef;
$self->{'WEBTASK'} = undef;
$self->{'HASH'} = undef;
$self->{'AGGREGATE'}= undef;
}
#
@@ -1917,6 +1929,23 @@ sub GetGeniAuthority($)
return APT_Geni::GetAuthority($self->aggregate_urn());
}
sub GetAptAggregate($)
{
my ($self) = @_;
return $self->{"AGGREGATE"}
if (defined($self->{"AGGREGATE"}));
$self->{"AGGREGATE"} = APT_Aggregate->Lookup($self->aggregate_urn());
return $self->{"AGGREGATE"};
}
sub AptAggregateName($)
{
my ($self) = @_;
return $self->GetAptAggregate()->name();
}
#
# Update the sliverstatus in the webtask.
#
@@ -2224,6 +2253,18 @@ sub UpdateSliverStatusAll($$)
return 0;
}
# Helper functions for below.
sub ContextError()
{
return GeniResponse->Create(GENIRESPONSE_ERROR(), undef,
"Could not generate context for RPC");
}
sub CredentialError()
{
return GeniResponse->Create(GENIRESPONSE_ERROR(), undef,
"Could not generate credentials for RPC");
}
#
# Ask aggregate to terminate a sliver.
#
@@ -2237,7 +2278,7 @@ sub Terminate($)
my $slice = $self->instance()->GetGeniSlice();
my $context = APT_Geni::GeniContext();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($slice) && defined($context)));
@@ -2251,7 +2292,7 @@ sub Terminate($)
my $slice_credential = APT_Geni::GenAuthCredential($slice);
if (!defined($slice_credential)) {
print STDERR "Could not generate slice credential\n";
return undef;
return CredentialError()
}
$method = "DeleteSliver";
@params = ($slice->urn(), [$slice_credential->asString()], {});
@@ -2260,7 +2301,7 @@ sub Terminate($)
my $credentials;
my ($slice_credential, $speaksfor_credential) =
APT_Geni::GenCredentials($slice, $geniuser, undef, 1);
return undef
return CredentialError()
if (!defined($slice_credential));
$credentials = [$slice_credential->asString()];
@@ -2285,34 +2326,7 @@ sub Terminate($)
#
Genixmlrpc->SetTimeout(900);
#
# We have to watch for resource busy errors, and retry. For a while
# at least. Eventually give up cause it might be a permanently locked
# slice cause of earlier error.
#
my $response;
my $tries = 10;
while ($tries) {
$response =
Genixmlrpc::CallMethod($cmurl, $context, $method, @params);
# SEARCHFAILED is success.
return $response
if ($response->code() == GENIRESPONSE_SUCCESS ||
$response->code() == GENIRESPONSE_SEARCHFAILED);
return $response
if ($response->code() != GENIRESPONSE_BUSY);
#
# Wait for a while and try again.
#
$tries--;
if ($tries) {
print STDERR "Slice is busy, will retry again in a bit ...\n";
sleep(30);
}
}
my $response = Genixmlrpc::CallMethod($cmurl, $context, $method, @params);
return $response;
}
@@ -2330,7 +2344,7 @@ sub Extend($$$$)
my $geniuser = $self->instance()->GetGeniUser();
my $slice = $self->instance()->GetGeniSlice();
my $context = APT_Geni::GeniContext();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($slice) && defined($context)));
@@ -2338,7 +2352,7 @@ sub Extend($$$$)
my $slice_credential = APT_Geni::GenAuthCredential($slice);
if (!defined($slice_credential)) {
print STDERR "Could not generate slice credential\n";
return undef;
return CredentialError()
}
$method = "RenewSliver";
@params = ($slice->urn(), [$slice_credential->asString()],
@@ -2351,7 +2365,7 @@ sub Extend($$$$)
}
my ($slice_credential, $speaksfor_credential) =
APT_Geni::GenCredentials($slice, $geniuser, \@privs, 1);
return undef
return CredentialError()
if (!defined($slice_credential));
$credentials = [$slice_credential->asString()];
@@ -2381,14 +2395,14 @@ sub Extend($$$$)
$geniuser->emulab_user()->GetStoredCredential();
if (! defined($certificate_string)) {
print STDERR "Could not get stored certificate for $geniuser\n";
return undef;
return CredentialError();
}
my $certificate =
GeniCertificate->LoadFromString($certificate_string);
if (!defined($certificate)) {
print STDERR
"Could not load stored certificate for $geniuser\n";
return undef;
return CredentialError();
}
# This file will be auto deleted.
$certfile = $certificate->WriteToFile();
@@ -2402,11 +2416,11 @@ sub Extend($$$$)
" -a -o $credname -s $slice_urn -t $days $userarg");
if ($?) {
print STDERR "Could not create extended credential\n";
return undef;
return CredentialError();
}
if (!open(EXT, $credname)) {
print STDERR "Could not open ext credfile $credname\n";
return undef;
return CredentialError();
}
while (<EXT>) {
$extcred .= $_;
@@ -2432,10 +2446,6 @@ sub Extend($$$$)
my $response;
while ($tries) {
$response = Genixmlrpc::CallMethod($cmurl, $context, $method, @params);
return undef
if (!defined($response));
if ($response->code() != GENIRESPONSE_SUCCESS) {
if (($response->code() == GENIRESPONSE_SERVER_UNAVAILABLE ||
$response->code() == GENIRESPONSE_BUSY) &&
@@ -2467,7 +2477,7 @@ sub SliceStatus($)
# Shorten default timeout
Genixmlrpc->SetTimeout(30);
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($slice) && defined($context)));
@@ -2477,7 +2487,7 @@ sub SliceStatus($)
my $slice_credential = APT_Geni::GenAuthCredential($slice);
if (!defined($slice_credential)) {
print STDERR "Could not generate slice credential\n";
return undef;
return CredentialError();
}
if ($self->isAL2S()) {
@params = ($slice->urn(), [$slice_credential->asString()], {});
@@ -2523,7 +2533,7 @@ sub GetManifest($)
my $urn = $self->aggregate_urn();
my $slice = $self->instance()->GetGeniSlice();
my $context = APT_Geni::GeniContext();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($slice) && defined($context)));
@@ -2531,7 +2541,7 @@ sub GetManifest($)
my $slice_credential = APT_Geni::GenAuthCredential($slice);
if (!defined($slice_credential)) {
print STDERR "Could not generate slice credential\n";
return undef;
return CredentialError();
}
$method = "ListResources";
@@ -2548,7 +2558,7 @@ sub GetManifest($)
my ($slice_credential, $speaksfor_credential) =
APT_Geni::GenCredentials($slice, $geniuser, undef, 1);
return undef
return CredentialError()
if (!defined($slice_credential));
$credentials = [$slice_credential->asString()];
@@ -2579,8 +2589,7 @@ sub GetManifest($)
$tries--;
next;
}
print STDERR "Resolve failed on $urn: ".
(defined($response) ? $response->output() : "") . "\n";
print STDERR "Resolve failed on $urn: ". $response->error() . "\n";
return undef;
}
last;
@@ -2607,7 +2616,7 @@ sub SliceResolve($)
my $urn = $self->aggregate_urn();
my $slice = $self->instance()->GetGeniSlice();
my $context = APT_Geni::GeniContext();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($slice) && defined($context)));
@@ -2615,7 +2624,7 @@ sub SliceResolve($)
my $slice_credential = APT_Geni::GenAuthCredential($slice);
if (!defined($slice_credential)) {
print STDERR "Could not generate slice credential\n";
return undef;
return CredentialError();
}
$method = "ListResources";
@@ -2632,7 +2641,7 @@ sub SliceResolve($)
my ($slice_credential, $speaksfor_credential) =
APT_Geni::GenCredentials($slice, $geniuser, undef, 1);
return undef
return CredentialError()
if (!defined($slice_credential));
$credentials = [$slice_credential->asString()];
@@ -2711,8 +2720,7 @@ sub Provision($$$$)
$tries--;
next;
}
$$perrmsg = $response->output()
if (defined($response));
$$perrmsg = $response->error();
return -1;
}
last;
@@ -2730,13 +2738,13 @@ sub ConsoleInfo($$)
my $slice = $self->instance()->GetGeniSlice();
my $context = APT_Geni::GeniContext();
my $geniuser = $self->instance()->GetGeniUser();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($slice) && defined($context)));
my ($slice_credential, $speaksfor_credential) =
APT_Geni::GenCredentials($slice, $geniuser, undef, 1);
return undef
return CredentialError()
if (! defined($slice_credential));
my $credentials = [$slice_credential->asString()];
@@ -2764,13 +2772,13 @@ sub ConsoleURL($$)
my $geniuser = $self->instance()->GetGeniUser();
my $slice = $self->instance()->GetGeniSlice();
my $context = APT_Geni::GeniContext();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($slice) && defined($context)));
my ($slice_credential, $speaksfor_credential) =
APT_Geni::GenCredentials($slice, $geniuser, undef, 1);
return undef
return CredentialError()
if (!defined($slice_credential));
my $credentials = [$slice_credential->asString()];
@@ -2799,7 +2807,7 @@ sub CreateImage($$$$;$$$$$)
my $geniuser = $self->instance()->GetGeniUser();
my $slice = $self->instance()->GetGeniSlice();
my $context = APT_Geni::GeniContext();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($slice) && defined($context)));
@@ -2809,7 +2817,7 @@ sub CreateImage($$$$;$$$$$)
#
my ($slice_credential, $speaksfor_credential) =
APT_Geni::GenCredentials($slice, $geniuser, undef, 0);
return undef
return CredentialError()
if (! (defined($speaksfor_credential) &&
defined($slice_credential)));
@@ -2843,9 +2851,9 @@ sub CreateImage($$$$;$$$$$)
#
# Reboot some nodes
#
sub SliverAction($$$;@)
sub SliverAction($$;@)
{
my ($self, $perrmsg, $which, @slivers) = @_;
my ($self, $which, @slivers) = @_;
my $method = ($which eq "reboot" ? "RestartSliver" :
($which eq "start" ? "StartSliver" : "ReloadSliver"));
my $authority = $self->GetGeniAuthority();
@@ -2853,13 +2861,13 @@ sub SliverAction($$$;@)
my $urn = $self->aggregate_urn();
my $slice = $self->instance()->GetGeniSlice();
my $context = APT_Geni::GeniContext();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($slice) && defined($context)));
my ($slice_credential, $speaksfor_credential) =
APT_Geni::GenCredentials($slice, $geniuser, undef, 1);
return undef
return CredentialError()
if (!defined($slice_credential));
my $credentials = [$slice_credential->asString()];
@@ -2882,29 +2890,7 @@ sub SliverAction($$$;@)
my $cmurl = $authority->url();
$cmurl = devurl($cmurl) if ($usemydevtree);
my $response;
my $tries = 5;
while ($tries) {
$response = Genixmlrpc::CallMethod($cmurl, $context, $method, $args);
if (!defined($response) || $response->code() != GENIRESPONSE_SUCCESS) {
if (defined($response) &&
($response->code() == GENIRESPONSE_SERVER_UNAVAILABLE ||
$response->code() == GENIRESPONSE_BUSY) &&
$tries >= 0) {
print STDERR "Server for $urn reports too busy or slice busy, ".
"waiting a while ...\n";
sleep(int(rand(20)) + 10);
$tries--;
next;
}
$$perrmsg = $response->output()
if (defined($response));
return $response;
}
last;
}
return $response;
return Genixmlrpc::CallMethod($cmurl, $context, $method, $args);
}
#
@@ -2916,13 +2902,12 @@ sub Lockdown($$$)
my $authority = $self->GetGeniAuthority();
my $slice = $self->instance()->GetGeniSlice();
my $context = APT_Geni::GeniContext();
my $oldexpires;
return undef
return ContextError()
if (! (defined($authority) &&
defined($slice) && defined($context)));
my $slice_credential = APT_Geni::GenAuthCredential($slice);
goto bad
return CredentialError()
if (! defined($slice_credential));
my $args = {
@@ -2936,14 +2921,7 @@ sub Lockdown($$$)
my $cmurl = $authority->url();
$cmurl = devurl($cmurl) if ($usemydevtree);
my $response = Genixmlrpc::CallMethod($cmurl, $context, "Lockdown", $args);
$slice->SetExpiration($oldexpires)
if (defined($oldexpires));
return $response;
bad:
$slice->SetExpiration($oldexpires)
if (defined($oldexpires));
return undef;
return Genixmlrpc::CallMethod($cmurl, $context, "Lockdown", $args);
}
#
@@ -2955,12 +2933,12 @@ sub Panic($$)
my $authority = $self->GetGeniAuthority();
my $slice = $self->instance()->GetGeniSlice();
my $context = APT_Geni::GeniContext();
return undef
return ContextError()
if (! (defined($authority) &&
defined($slice) && defined($context)));
my $slice_credential = APT_Geni::GenAuthCredential($slice);
return undef
return CredentialError()
if (! defined($slice_credential));
my $args = {
@@ -2972,8 +2950,7 @@ sub Panic($$)
my $cmurl = $authority->url();
$cmurl = devurl($cmurl) if ($usemydevtree);
my $response = Genixmlrpc::CallMethod($cmurl, $context, "Panic", $args);
return $response;
return Genixmlrpc::CallMethod($cmurl, $context, "Panic", $args);
}
#
@@ -2987,13 +2964,13 @@ sub RunLinktest($$$)
my $urn = $self->aggregate_urn();
my $slice = $self->instance()->GetGeniSlice();
my $context = APT_Geni::GeniContext();
return undef
return ContextError()
if (! (defined($geniuser) && defined($authority) &&
defined($slice) && defined($context)));
my ($slice_credential, $speaksfor_credential) =
APT_Geni::GenCredentials($slice, $geniuser, undef, 1);
return undef
return CredentialError()
if (!defined($slice_credential));
my $credentials = [$slice_credential->asString()];
@@ -3018,11 +2995,10 @@ sub RunLinktest($$$)
my $cmurl = $authority->url();
$cmurl = devurl($cmurl) if ($usemydevtree);
my $response = Genixmlrpc::CallMethod($cmurl,
$context, "RunLinktest", $args);
return $response;
bad:
return undef;
# Shorten default timeout
Genixmlrpc->SetTimeout(30);
return Genixmlrpc::CallMethod($cmurl, $context, "RunLinktest", $args);
}