Commit 1c4a613c authored by Leigh B. Stoller

Changes to exit status stuff to reflect recent changes by Rob to how assign exits (exit codes).

* in assign_wrapper, no longer return any status from assign to the
  caller. This was pointless. Instead, return 0 on success, 1 on
  controlled error, and -1 on uncontrolled error (die() called
  someplace). Add in CANRECOVER bit whenever the wrapper exits, even
  if uncontrolled, by putting in an END block to catch the die. This
  should prevent certain cases where a swapmod error would be flagged
  as not recoverable.

* Remove most of the assign output processing since we no longer
  return its codes. Still print a portion of it to the log though.

* Change call to fatal() in assign_wrapper; do not pass an exitcode
  since in every case it was the same damn thing!

* Change tbswap to no longer carry assign_wrapper exit code to its
  exit.

* Change the batch daemon to treat all errors as continuable (keep
  batch queued) unless exit code is -1. We will need to revisit this a
  bit perhaps, when Rob adds precheck code.
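
The wrapper's exit convention described in the list above can be sketched roughly as follows. This is a minimal illustration, not the actual assign_wrapper code: the `CANRECOVER` value of 64 is assumed from the `$exitcode & 64` test in the tbswap hunk, `with_canrecover` is a hypothetical helper, and the choice to set the bit only on failure is an assumption, since the collapsed diff does not show which exit paths set it.

```perl
use strict;
use warnings;

# Assumed value: tbswap tests bit 64 of the wrapper's exit code.
use constant CANRECOVER => 64;

# Hypothetical helper: fold the CANRECOVER bit into a failing exit status.
# Success (0) is passed through untouched.
sub with_canrecover {
    my ($status) = @_;
    return 0 if $status == 0;
    return (($status | CANRECOVER) & 0xff);
}

END {
    # Inside an END block, $? holds the status about to be handed to
    # exit(), including the one Perl picks after an uncaught die().
    # Assigning to $? here changes the status the caller will see.
    $? = with_canrecover($?);
}
```

Under these assumptions a controlled error (1) leaves the wrapper as 65, while an uncontrolled exit of 255 is unchanged because bit 64 is already set in it.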
parent e1e6d268
@@ -473,12 +473,14 @@ sub startexp($)
"where eid='$eid' and pid='$pid'");
#
# The exit value is important. If its -1 or 1, thats bad. Anything
# else implies an assign violation that is (hopefully) transient.
# The exit status does not tell us if the experiment can ever be
# mapped. In fact, that is really hard to know at this level;
# it depends on what resources the testbed actually has. So,
# unless status is -1 (really really fatal) just keep going.
# We leave it up the user to kill the batch if it looks like its
# never going to work.
#
-if ($exit_status == 1 || $exit_status == -1) {
+if ($exit_status == 255) {
TBBatchUnLockExp($pid, $eid, EXPTSTATE_SWAPPED());
email_status("Experiment startup has failed with a fatal error!\n".
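
The switch from `-1` to `255` in the test above follows from how a parent process sees a child's exit code: only the low 8 bits survive, so `exit(-1)` arrives as 255. A minimal sketch of the resulting policy, assuming `$exit_status` was derived from `$? >> 8` (the hunk does not show that) and using a hypothetical helper `batch_action` that is not part of the batch daemon:

```perl
use strict;
use warnings;

# Hypothetical helper: decide what to do with a batch experiment given
# the raw wait status that system() leaves in $?.
sub batch_action {
    my ($wait_status) = @_;
    my $exit_status = $wait_status >> 8;      # high byte is the exit code
    return "running" if $exit_status == 0;    # startup succeeded
    return "dequeue" if $exit_status == 255;  # child called exit(-1)
    return "requeue";                         # anything else: keep it queued
}

# Demonstration: run a child Perl process that exits with -1 and look at
# what the parent receives.
system($^X, "-e", "exit(-1)");
print "child exit code: ", $? >> 8, "\n";
print "action: ", batch_action($?), "\n";
```

The action names are placeholders; in the hunk itself the fatal branch unlocks the experiment and sends mail.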
@@ -70,7 +70,6 @@ my $updateReconfig = 1;
my $updateEventsys_restart = 0;
my $force = 0;
my $errors = 0;
-my $assignWrapperErrorCode = 0;
my $updatehosed = 0;
my $state;
my $canceled;
@@ -269,15 +268,13 @@ elsif ($swapop eq "in") {
# Write appropriate message and exit.
#
if ($errors) {
-print "Failingly finished swap-$swapop for $pid/$eid. " .TBTimeStamp(). "\n";
+print "Failingly finished swap-$swapop for $pid/$eid. ".TBTimeStamp()."\n";
TBDebugTimeStamp("tbswap $swapop finished (failed)");
-# pass assign wrapper info along.
-# other codes in 'errors' (3 or 7) are meaningless and
-# should just be reported as 1's.
-exit(1 | $assignWrapperErrorCode | ($updatehosed ? 0x40 : 0));
+# Pass out magic value to indicate that update failed!
+exit(1 | ($updatehosed ? 0x40 : 0));
}
-print "Successfully finished swap-$swapop for $pid/$eid. " . TBTimeStamp() ."\n";
+print "Successfully finished swap-$swapop for $pid/$eid. " .TBTimeStamp()."\n";
TBDebugTimeStamp("tbswap $swapop finished (succeeded)");
exit(0);
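
After this change tbswap's exit status depends only on whether anything failed and whether the update was hosed. A minimal sketch of that composition, where `swap_exit_status` and the constant name `UPDATE_FAILED` are hypothetical (the source uses the literal `0x40`):

```perl
use strict;
use warnings;

# Hypothetical name for the literal 0x40 used in the exit() call.
use constant UPDATE_FAILED => 0x40;

# Hypothetical helper mirroring the exit logic: any nonzero error count
# collapses to 1, optionally OR'ed with the update-failed bit.
sub swap_exit_status {
    my ($errors, $updatehosed) = @_;
    return 0 unless $errors;
    return 1 | ($updatehosed ? UPDATE_FAILED : 0);
}

printf "errors=3, update ok:    %d\n", swap_exit_status(3, 0);
printf "errors=1, update hosed: %d\n", swap_exit_status(1, 1);
```

The internal error values 3 and 7 never reach the caller; they are reported as 1 either way.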
@@ -552,24 +549,17 @@ sub doSwapin($) {
if ($type == RETRY);
if (system("$wrapper $pid $eid")) {
#
-# save this off so it will get passed back later.
# Note that -1 is an uncontrolled error. No recovery.
#
-$assignWrapperErrorCode = $? >> 8;
$exitcode = $? >> 8;
print STDERR "*** Failed ($exitcode) to map to reality.\n";
-if (($exitcode & 64) && ($exitcode != 255)) {
-# so batchexp doesn't choke.
-$assignWrapperErrorCode -= 64
-if ($exitcode != 255);
+# Wrapper sets this bit when recovery is possible.
+if ($exitcode & 64) {
# We can recover.
return 7;
}
else {
-# No recovery.
+# No recovery, no retry.
return 1;
}
}
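
The simplified recovery test in this hunk can be sketched as a small function. `swapin_result` is a hypothetical name; the return values 7 and 1 and the bit-64 test are taken from the hunk. Note that with the `!= 255` check gone, an exit code of 255 also has bit 64 set and so lands in the recoverable branch.

```perl
use strict;
use warnings;

# Hypothetical helper: map the wrapper's exit code onto the values that
# doSwapin returns in the hunk above.
sub swapin_result {
    my ($exitcode) = @_;
    return 0 if $exitcode == 0;   # wrapper succeeded
    return 7 if $exitcode & 64;   # recovery bit set: we can recover
    return 1;                     # no recovery, no retry
}

for my $code (0, 1, 65, 255) {
    printf "wrapper exit %3d -> %d\n", $code, swapin_result($code);
}
```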