Commit ed0d25b4 authored by Mike Hibler

Phase II in disk state saving for swapout.

Exec summary: after this checkin, the infrastructure exists (once enabled)
to create swapout-time "delta" images for all machines in experiments.
There is only a single, cumulative swap image per node (i.e., all diffs
are from the base image, not from the previous swap).

What doesn't yet exist is the mechanism for reloading the delta at
swapin time.  That is Phase III.

The nitty-gritty:

1. Keep disk image signature files for all nodes in an experiment.

   New fields in the DB to track, for each disk partition, what image the
   partition was loaded from.  This enables us at swapin or os_load time to
   create signature files in /proj/<pid>/exp/<eid>/swapinfo for the current
   contents of a node disk/partition.  All nodes with the same image loaded
   will share (via symlink) the same signature file.  TODO: signature
   files that are no longer referenced should be removed.

   Signature info is only collected in the swapinfo directory if the
   experiment is set to have disk state saving enabled (see #5 below).
   Info consists of the <vname>.sig file, which is the file created
   by imagehash, and <vname>.part which says what the root disk is
   for the node and whether to look at the whole disk or just a single
   partition when crafting the delta image.

2. Swapout-time hook for creating swapout image.

   If the experiment is marked as allowing disk state saving, tbswap
   will arrange to run and then monitor the create-swapimage command
   on each node.  This script will run the modified version of imagezip
   which uses the signature file to create a delta image.

   The command to run and maximum timeout are specified via sitevars
   (previously checked in).  Note that the tbswap script currently has
   special knowledge of /usr/local/bin/create-swapimage as a swapout
   time script.  If the swap/swapout_command sitevar is set to that,
   Magic Stuff shall occur (i.e. it will monitor the command and make
   periodic reports of progress).  The sitevars are a total hack and
   will disappear at some point.

3. Client-side script for creating swapout image.

   os/create-swapimage, very similar to create-image.  Uses the info
   stashed in /proj/..blahblah../swapinfo to create a delta image.

   XXX fer now hack: the script first looks in /proj/<pid>/bin for an
   imagezip binary to use.  Failing that, it uses the one in the MFS.
   This allows for easier development of the imagezip changes (i.e.,
   we don't have to update the MFS every time).

4. Auto creation of signature files for new images.

   The create_image script (the one that runs on boss when creating images
   for users) has been modified to automatically create a signature via
   imagehash.  The .sig file winds up in /usr/testbed/images/sigs or
   in /proj/<pid>/images/sigs.  From there it will be copied at swapin/os_load
   time to the per-expt swapinfo directory for any node that uses the image.

   The process for creating standard system images (aka, "Mike") has not
   yet been modified.  When the image creation/installation procedure
   is formalized into a script, this will be done.

5. Web changes to set/clear saving of disk state at swapout time.

   Add a checkbox to the experiment create page to allow setting "save
   swap state".  Also added to the experiment modify page, but currently
   "if (0)"ed out as it will need some additional support.  The showstuff
   page will show it.

   Taking a page from Leigh's hack book, if EXPOSESTATESAVE in defs.php3
   is set to zero (as it is now), then the checkbox doesn't appear in the
   create experiment page except for STUDLY users.
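
   For illustration only, here is a minimal sketch of how the <vname>.part
   info from #1 turns into imagezip arguments in #3.  The helper name
   swapimage_args and the sample values (disk "ad0", vname "nodeA") are
   made up for this example and are not part of the checkin; the regex and
   the -H/-s handling mirror what create-swapimage does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in this commit): given the one-line contents of
# a <vname>.part file, return the device, the boot partition, and the
# imagezip arguments that create-swapimage would build from it.
sub swapimage_args {
    my ($partinfo, $vname, $fullimage) = @_;

    # Same pattern create-swapimage matches against the .part contents.
    # Returns an empty list if the info is malformed.
    my ($disk, $lpart, $bpart) =
        $partinfo =~ /DISK=(\w+) LOADPART=([0-4]) BOOTPART=([1-4])/
        or return;

    my @args;
    # Delta image: compare against the base image signature unless a
    # full image was requested.
    push(@args, "-H", "$vname.sig") if (!$fullimage);
    # LOADPART=0 means a whole-disk image; otherwise save just that slice.
    push(@args, "-s", $lpart) if ($lpart != 0);

    return ("/dev/$disk", $bpart, @args);
}

# Sample values below are assumptions chosen for the example.
my ($device, $bootpart, @args) =
    swapimage_args("DISK=ad0 LOADPART=2 BOOTPART=2\n", "nodeA", 0);
print "$device @args nodeA-swap.ndz\n";
# prints "/dev/ad0 -H nodeA.sig -s 2 nodeA-swap.ndz"
```

   With LOADPART=0 and a full image requested, the argument list comes
   back empty and imagezip would be handed just the device and the output
   file.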
parent 8c89fa8f
......@@ -156,7 +156,8 @@ use vars qw(@ISA @EXPORT);
SetNodeBootStatus OSFeatureSupported IsShelved NodeidToExp NodeidToExpOldReserved
UserDBInfo DBQuery DBQueryFatal DBQueryWarn DBWarn DBFatal DBErr
DBQuoteSpecial UNIX2DBUID ExpState SetExpState ProjLeader
ExpNodes ExpNodesOldReserved DBDateTime DefaultImageID GroupLeader TBGroupUnixInfo
ExpNodes ExpNodeVnames ExpNodesOldReserved
DBDateTime DefaultImageID GroupLeader TBGroupUnixInfo
TBValidNodeLogType TBValidNodeName TBSetNodeLogEntry
TBSetSchedReload MapNodeOSID TBLockExp TBUnLockExp TBSetExpSwapTime
TBUnixGroupList TBOSID TBOSMaxConcurrent TBOSCountInstances
......@@ -174,7 +175,7 @@ use vars qw(@ISA @EXPORT);
TBNodeAllocCheck TBPlabNodeUsername MarkPhysNodeDown TBExptIsElabInElab
TBExptFirewall TBNodeFirewall TBExptFirewallAndPort
TBSetExptFirewallVlan TBClearExptFirewallVlan
TBNodeConsoleTail TBExptGetSwapoutAction
TBNodeConsoleTail TBExptGetSwapoutAction TBExptGetSwapState
TBNodeSubNodes
TBNodeAdminOSID TBNodeDiskloadOSID
......@@ -1665,9 +1666,10 @@ sub ExpNodesOldReserved($$)
#
# Return a list of all the nodes in an experiment.
#
# usage: ExpNodes(char *pid, char *eid)
# usage: ExpNodes(char *pid, char *eid, [bool islocal])
# returns the list if a valid pid/eid.
# returns 0 if an invalid pid/eid or if an error.
# If the optional flag is set, returns only local nodes.
# Returns 0 if an invalid pid/eid or if an error.
#
sub ExpNodes($$;$)
{
......@@ -1708,6 +1710,57 @@ sub ExpNodes($$;$)
return @nodes;
}
#
# Return a hash of all the nodes in an experiment. The hash maps pnames
# to vnames.
#
# usage: ExpNodeVnames(char *pid, char *eid, [bool islocal])
# returns the hash if a valid pid/eid.
# If the optional flag is set, returns only local nodes.
# Returns an empty hash if an invalid pid/eid or if an error.
#
sub ExpNodeVnames($$;$)
{
my($pid, $eid, $flag) = @_;
my(@row);
my(%nodes);
my $clause = "";
if (defined($flag)) {
$clause = "and nt.isremotenode=0";
}
my $query_result =
DBQueryWarn("select r.node_id,r.vname from reserved as r ".
"left join nodes as n on n.node_id=r.node_id ".
"left join node_types as nt on nt.type=n.type ".
"where r.pid='$pid' and r.eid='$eid' $clause");
if (!$query_result || $query_result->numrows == 0) {
return ();
}
while (@row = $query_result->fetchrow_array()) {
my $node = $row[0];
my $vname = $row[1];
#
# Taint check. I do not understand this silliness, but if I
# taint check these node names, I avoid warnings throughout.
#
if ($node =~ /^([-\w]+)$/) {
$node = $1;
if ($vname =~ /^([-\w]+)$/) {
$vname = $1;
} else {
$vname = $node;
}
$nodes{$node} = $vname;
} else {
print "*** $0: WARNING: Bad node name: $node.\n";
}
}
return %nodes;
}
#
# Mark a node as down. We schedule a next reservation for it so that it
# remains in the user's experiment through the termination so that there
......@@ -4186,6 +4239,26 @@ sub TBExptGetPanicBit($$$) {
return 1;
}
#
# Get the value of the swapout state.
# Right now this is just the savedisk field.
# Returns 1 if there is swap state, 0 otherwise.
#
sub TBExptGetSwapState($$$) {
my ($pid, $eid, $statep) = @_;
my $query_result =
DBQueryWarn("select savedisk from experiments ".
"where pid='$pid' and eid='$eid'");
if (!$query_result || $query_result->num_rows == 0) {
return 0;
}
my @row = $query_result->fetchrow_array();
$$statep = $row[0];
return 1;
}
#
# See if there is an admin MFS swapout action associated with the experiment.
# For now we just look at a globally defined action via sitevar.
......@@ -4195,16 +4268,35 @@ sub TBExptGetPanicBit($$$) {
#
sub TBExptGetSwapoutAction($$$) {
my ($pid, $eid, $ref) = @_;
my ($action, $faction);
my ($action, $faction, $timeout);
if (TBGetSiteVar("swap/swapout_command", \$action)) {
my $failisfatal = 1;
#
# Swapout-time state saving.
# Only performed if the experiment has requested state saving.
#
if ($action =~ /create-swapimage/) {
my $doit;
my $query_result =
DBQueryWarn("select savedisk from experiments ".
"where pid='$pid' and eid='$eid'");
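# N.B. a list assignment in scalar context yields the number of
# values fetched, not the value itself, so test $doit explicitly.
if (!$query_result || $query_result->num_rows == 0 ||
!(($doit) = $query_result->fetchrow_array()) || !$doit) {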
%$ref = ();
return 0;
}
}
if (TBGetSiteVar("swap/swapout_command_failaction", \$faction)) {
$failisfatal = ($faction eq "fail");
}
TBGetSiteVar("swap/swapout_command_timeout", \$timeout);
%$ref = ('command' => $action, 'isfatal' => $failisfatal);
%$ref = ('command' => $action,
'isfatal' => $failisfatal,
'timeout' => $timeout);
return 1;
}
......@@ -4414,7 +4506,7 @@ sub MapNodeOSID($$)
my ($node, $osid) = @_;
#
# See id this this OSID is actually loaded on the machine.
# See if this OSID is actually loaded on the machine.
#
my $p_result =
DBQueryWarn("select * from partitions ".
......
......@@ -59,7 +59,8 @@ mfs:
$(MAKE) -C zapdisk mfs
mfs-install: mfs
$(INSTALL_PROGRAM) $(SRCDIR)/create-image $(LBINDIR)/create-image
$(INSTALL_PROGRAM) $(SRCDIR)/create-image $(LBINDIR)/
$(INSTALL_PROGRAM) $(SRCDIR)/create-swapimage $(LBINDIR)/
$(MAKE) -C imagezip client-install
$(MAKE) -C zapdisk mfs-install
......
#!/usr/bin/perl -wT
#
# EMULAB-COPYRIGHT
# Copyright (c) 2000-2005 University of Utah and the Flux Group.
# All rights reserved.
#
use English;
use Getopt::Std;
#
# Create a swapout-time disk image. By default, we save an incremental
# image based on the image signature. Use -f to create a full image.
# Caller must have sudo permission!
#
# XXX for now, all the arguments are intuited (instead of using tmcc).
# XXX we should probably save the old swap image in case of failure.
#
sub usage()
{
print STDOUT "Usage: create-swapimage [-f]\n";
exit(-1);
}
my $optlist = "f";
#
# Turn off line buffering on output
#
$| = 1;
# Drag in path stuff so we can find emulab stuff.
BEGIN { require "/etc/emulab/paths.pm"; import emulabpaths; }
#
# Load the OS independent support library. It will load the OS dependent
# library and initialize itself.
#
use libsetup;
my $debug = 1;
my $me = "create-swapimage";
#
# No configure vars.
#
my $sudo = "/usr/local/bin/sudo";
my $zipperdir = "/usr/local/bin";
my $zipperbin = "imagezip";
my $zipper = "$zipperdir/$zipperbin";
my $device;
my $filename;
my $fullimage = 0;
my $args = "";
#
# Parse command arguments. Once we return from getopts, all that should be
# left are the required arguments.
#
%options = ();
if (! getopts($optlist, \%options)) {
usage();
}
if (@ARGV > 0) {
usage();
}
if ($options{"f"}) {
$fullimage = 1;
}
my ($pid, $eid, $vname) = check_nickname();
if (!defined($eid)) {
die("Node is not allocated!?");
}
if (!chdir("/proj/$pid/exp/$eid/swapinfo")) {
die("Swapinfo directory for $pid/$eid does not exist!");
}
if (! -r "$vname.part" || (! $fullimage && ! -r "$vname.sig")) {
die("Swapinfo signature/partition info for $pid/$eid does not exist!");
}
$args = "-H $vname.sig"
if (!$fullimage);
my $info = `cat $vname.part`;
if ($info !~ /DISK=(\w+) LOADPART=([0-4]) BOOTPART=([1-4])/) {
die("Swapinfo partition info for $pid/$eid is malformed!");
}
$device = "/dev/$1";
$lpart = $2;
$bpart = $3;
$filename = "$vname-swap.ndz";
print STDERR "$me: device=$device, loadpart=$lpart, bootpart=$bpart\n"
if ($debug);
#
# XXX For now we just use the load partition to dictate what we save.
#
# In the case where LOADPART=0, meaning a whole-disk image, we are almost
# certainly saving more than we care about. Chances are that when swapping
# in, the user specified one of the standard OSes which is part of the whole
# disk image that is loaded on the disk by default. In this case we will be
# saving the entire disk, even though they probably only care about the
# partition they are running from. Technically, this is the correct thing
# to do, since they could have (re)used the other partitions and we will
# want to pick up those changes. However, most of the time they probably
# haven't done anything to the rest of the disk and we are just wasting time
# scanning the entire disk (though the resulting image will not be any larger).
#
# So, the boot partition is passed in just in case we someday want to
# distinguish this case. What we could (should?) do, is add an OTHERPARTS=
# field to the file to give us a list of partitions that are active. Then
# we would always do a full-disk image but construct a list of -I options to
# ignore the inactive partitions.
#
if ($lpart != 0) {
$args .= " -s $lpart";
}
#
# Save the old swap image if it exists, both as a backup and so that the
# imagefile size starts at zero for the benefit of monitoring processes.
#
my $ofilename = "$filename.OLD";
if (-e $filename) {
unlink($ofilename);
if (!rename($filename, $ofilename)) {
warn("$me: could not back up old image, clobbering it!");
unlink($filename);
$ofilename = "";
}
}
#
# XXX tmp hack: see if there is a newer version of the image zipper.
# This way we do not have to update the admin MFS every time we want to
# try a new debugger, making it easier in the debugging phase.
#
if (-x "/proj/$pid/bin/$zipperbin") {
$zipper = "/proj/$pid/bin/$zipperbin";
warn("$me: using alternate zipper $zipper\n");
}
#
# Run the command using sudo, since by definition only testbed users
# with proper trust should be able to zip up a disk. sudo will fail
# if the user is not in the proper group.
#
print STDERR "$me: doing '$sudo $zipper $args $device $filename'\n"
if ($debug);
if (system("$sudo $zipper $args $device $filename")) {
print STDERR "*** Failed to create image!\n";
if ($ofilename ne "") {
print STDERR " Restoring old image\n";
rename($ofilename, $filename) or
warn(" Could not restore old image file!\n");
}
exit 1;
}
#
# Get rid of the backup image
#
if ($ofilename ne "") {
unlink($ofilename);
}
exit 0;
......@@ -44,6 +44,7 @@ sub usage()
"-n - Do not send idle email (internal option only)\n".
"-a <nnn> - Auto swapout nnn minutes after experiment is swapped in\n".
"-l <nnn> - Auto swapout nnn minutes after experiment goes idle\n".
"-s - Save disk state on swapout\n".
"-E <str> - A pithy sentence describing your experiment\n".
"-p <pid> - The project in which to create the experiment\n".
"-g <gid> - The group in which to create the experiment\n".
......@@ -55,7 +56,7 @@ sub usage()
sub ParseArgs();
sub fatal($);
my $optlist = "iE:g:e:p:S:L:a:l:fwqt:nz";
my $optlist = "iE:g:e:p:S:L:a:l:sfwqt:nz";
my $batchmode= 1;
my $frontend = 0;
my $waitmode = 0;
......@@ -63,6 +64,7 @@ my $quiet = 0;
my $linktest = 0; # non-zero means level to run at.
my $zeemode = 0; # Hey, out of options.
my $zeeopt = ""; # To pass along.
my $savestate= 0;
#
# Configure variables
......@@ -312,14 +314,14 @@ if (! DBQueryWarn("INSERT INTO experiments ".
" idleswap, idleswap_timeout, autoswap, autoswap_timeout,".
" idle_ignore, keyhash, expt_locked, eventkey,".
" noswap_reason, noidleswap_reason, batchmode, ".
" batchstate, linktest_level) ".
" batchstate, linktest_level, savedisk) ".
"VALUES ($exptidx, '$eid', '$pid', '$gid', now(), ".
"$description,'$dbuid', '$dbuid', '$exptstate', $priority, ".
"$swappable, $idleswap, '$swaptime', $autoswap, ".
"'$autoswaptime', $idleignore, '$webkey', ".
"now(), '$eventkey', $noswap_reason, ".
"$noidleswap_reason, $batchmode, '$batchstate', ".
"$linktest)")) {
"$linktest, $savestate)")) {
DBQueryWarn("unlock tables");
die("*** $0:\n".
" DB error inserting experiment record for $pid/$eid!\n");
......@@ -909,6 +911,10 @@ sub ParseArgs()
$idleignore = 1;
}
if (defined($options{"s"})) {
$savestate = 1;
}
#
# pid,eid,gid get passed along as shell commands args; must taint check.
#
......
......@@ -14,7 +14,7 @@ use Exporter;
use vars qw(@ISA @EXPORT);
@ISA = "Exporter";
@EXPORT = qw ( osload osload_wait );
@EXPORT = qw ( osload osload_wait osload_setupswapinfo );
# Must come after package declaration!
use lib '@prefix@/lib';
......@@ -53,6 +53,7 @@ sub osload ($$) {
my $noreboot = 0;
my $asyncmode = 0;
my $zerofree = 0;
my $swapinfo = 0;
# Locals
my %retries = ();
......@@ -86,6 +87,9 @@ sub osload ($$) {
if (defined($args->{'zerofree'})) {
$zerofree = $args->{'zerofree'};
}
if (defined($args->{'swapinfo'})) {
$swapinfo = $args->{'swapinfo'};
}
#
# Figure out who called us. Root and admin types can do whatever they
......@@ -200,6 +204,7 @@ sub osload ($$) {
my $imagepath = $rowref->{'path'};
my $defosid = $rowref->{'default_osid'};
my $maxwait = $rowref->{'maxloadwait'};
my $imagepid = $rowref->{'pid'};
print "osload ($node): Changing default OS to $defosid\n";
if (!$TESTMODE) {
......@@ -241,8 +246,9 @@ sub osload ($$) {
$dbresult =
DBQueryWarn("replace into partitions ".
"(partition, osid, node_id) ".
"values('$i', '$osid', '$node')");
"(node_id,partition,osid,imageid,imagepid) ".
"values ".
"('$node','$i','$osid','$imageid','$imagepid')");
}
else {
$dbresult =
......@@ -256,6 +262,15 @@ sub osload ($$) {
}
}
#
# Setup swapinfo now after partitions have initialized but before
# we setup the one-shot frisbee load.
#
if ($swapinfo) {
print "osload: Updating image signature.\n";
osload_setupswapinfo(undef, undef, $node);
}
#
# Determine which mode to use for reloading this node (note: this may
# become an entry in node_capabilities or something like that in the
......@@ -807,5 +822,169 @@ sub osload_wait($)
return $? >> 8;
}
#
# Save signature files and boot partition info for all nodes in an experiment
# (or just the listed nodes). We call this when swapping in an experiment or
# when reloading nodes in an experiment.
#
# Note that this is not strictly an os loading function, we do it on swapins
# of nodes which already have the correct OS as well. But we stick it here
# because it is about os loading in principle.
#
sub osload_setupswapinfo($$;@)
{
my ($pid, $eid, @nodelist) = @_;
my %nodeinfo = ();
my $allnodes;
my $clause = "";
# XXX comment this out to force save of all disks
$clause = "e.savedisk!=0 and ";
if (!@nodelist) {
@nodelist = ExpNodes($pid, $eid, 1);
$clause .= "r.pid='$pid' and r.eid='$eid'";
$allnodes = 1;
} else {
$clause .= "r.node_id in (" . join(",", map("'$_'", @nodelist)) . ")";
$allnodes = 0;
}
map { $nodeinfo{$_} = 0 } @nodelist;
# XXX only know how to do this for local PCs right now
$clause .= " and nt.imageable!=0 and nt.class='pc' and nt.isremotenode=0";
#
# Note that we are using the def_boot_osid from the nodes table to identify
# the image of interest. This is because the osid field is set by stated
# after a node has reached the BOOTING state the first time, and may be
# set to an MFS at other times.
#
my $query_result = DBQueryWarn(
"select r.node_id,r.vname,r.pid,r.eid,n.osid,nt.disktype,nt.bootdisk_unit,p.partition,p.imageid,p.imagepid,i.loadpart ".
"from reserved as r ".
"left join nodes as n on n.node_id=r.node_id ".
"left join node_types as nt on nt.type=n.type ".
"left join partitions as p on p.node_id=n.node_id and p.osid=n.def_boot_osid ".
"left join images as i on i.imageid=p.imageid ".
"left join experiments as e on e.pid=r.pid and e.eid=r.eid ".
"where $clause");
if (!$query_result) {
return 1;
}
while (my ($node, $vname, $rpid, $reid, $osid, $dtype, $dunit, $part, $imageid, $imagepid, $lpart) =
$query_result->fetchrow_array()) {
#
# XXX not a disk-based OSID. This can happen during frisbee loads
#
if (!defined($imageid)) {
print "*** swapinfo: $osid is not disk-based!?\n";
next
if (!$allnodes);
return 1;
}
# Sanity checks
if (!defined($nodeinfo{$node})) {
next
if (!$allnodes);
print "*** swapinfo: Got partition info for invalid node $node!?\n";
return 1;
}
if ($nodeinfo{$node} != 0) {
print "*** swapinfo: Got redundant partition info for $node!?\n";
return 1;
}
my $disk = "$dtype$dunit";
$nodeinfo{$node} =
[$vname, $rpid, $reid, $osid, $disk, $part, $imageid, $imagepid, $lpart];
}
#
# Copy over the signature file for the image used on every node under
# the name <vname>.sig. Likewise, we record the partition that the
# image resides in under <vname>.part.
#
# Note that we actually copy the signature over as <imageid>.sig and
# then symlink the <vname>.sig's to it. This not only saves space,
# but makes it easier to determine what is loaded on each node.
#
for my $node (keys(%nodeinfo)) {
my $infop = $nodeinfo{$node};
if ($infop == 0) {
print "*** swapinfo: WARNING: got no partition info for $node!\n";
next;
}
my ($vname, $rpid, $reid, $osid, $disk, $part, $imageid, $imagepid, $lpart) = @{$infop};
#
# If imageid is not "fully qualified" with the project name,
# generate a name that is.
#
my $rimageid = $imageid;
if ($rimageid !~ /^$imagepid-/) {
$rimageid = "$imagepid-$imageid";
}
# XXX backward compat
my $infodir = "/proj/$rpid/exp/$reid/swapinfo";
if (! -d "$infodir" && !mkdir($infodir, 0770)) {
print "*** swapinfo: no swap info directory $infodir!\n";
next
if (!$allnodes);
return 1;
}
#
# First make sure we get rid of any old signature for the node
# in case any of the following steps fail.
#
unlink("$infodir/$vname.sig", "$infodir/$vname.part");
my ($sigdir, $signame);
if ($imagepid eq TBOPSPID()) {
$sigdir = "$TB/images/sigs";
} else {
$sigdir = "/proj/$imagepid/images/sigs";
}
$signame = "$imageid.ndz.sig";
$signame =~ s/^$imagepid-//;
if (! -d $sigdir || ! -f "$sigdir/$signame") {
print "*** swapinfo: WARNING: no image signature for $rimageid, ".
"cannot save swapout state!\n";
next;
}
my $basesig = "$infodir/$rimageid";
if (! -r $basesig) {
if (system("/bin/cp -p $sigdir/$signame $basesig")) {
print "*** swapinfo: WARNING: ".
"could not create signature $basesig, ".
"cannot save swapout state!\n";
next;
}
}
if (system("/bin/ln -s $rimageid $infodir/$vname.sig")) {
print "*** swapinfo: WARNING: ".
"could not create signature $infodir/$vname.sig, ".
"cannot save swapout state!\n";
next;
}
if (!open(FD, "> $infodir/$vname.part")) {
print "*** swapinfo: WARNING: ".
"could not create partition file $infodir/$vname.part, ".
"cannot save swapout state!\n";
unlink("$infodir/$vname.sig");
next;
}
print FD "DISK=$disk ";
print FD "LOADPART=$lpart ";
print FD "BOOTPART=$part\n";
close(FD);
}
}
# _Always_ make sure that this 1 is at the end of the file...
1;
......@@ -40,7 +40,7 @@ my $projroot = PROJROOT();
my $grouproot= GROUPROOT();
my $tbdata = "tbdata";
my @dirlist = ($tbdata, "bin", "tmp", "logs", "tftpboot");
my @dirlist = ($tbdata, "bin", "tmp", "logs", "tftpboot", "swapinfo");
my $exitval;
#
......
......@@ -43,7 +43,7 @@ my $GRPROOT = "/groups";
my $TFTPROOT = "/tftpboot";
my $CVSREPOS = "$PROJROOT/cvsrepos";
my @DIRLIST = ("exp", "images", "logs", "deltas", "tarfiles", "rpms",
"groups", "tiplogs");
"groups", "tiplogs", "images/sigs");
my $projhead;
#
......
......@@ -251,6 +251,7 @@ $osloadargs{'nodelist'} = [ @nodes ];
# No imageid means to load the default image.
$osloadargs{'imageid'} = $imageid
if (defined($imageid));
$osloadargs{'swapinfo'} = 1;
exit(osload(\%osloadargs, \%failednodes));
......
......@@ -26,7 +26,7 @@ require 'ctime.pl';
sub usage()
{
print STDERR "Usage: os_setup <pid> <eid>\n";
print STDERR "Usage: os_setup [-d] <pid> <eid>\n";
exit(-1);
}
my $optlist = "d";
......@@ -358,7 +358,8 @@ while (my %row = $db_result->fetchhash()) {
#
my $p_result =
DBQueryFatal("select * from partitions ".
"where node_id='$node' and osid='$osid'");
"where node_id='$node' and osid='$osid'".
"order by partition");
#
# If not loaded, then see if the user was looking for the generic
......@@ -1056,6 +1057,20 @@ print "*** There were $failedvnodes failed virtual nodes\n"
print "*** There were $failedplab failed plab nodes\n"
if ($failedplab);
#
# If not failing for any reason, save off swap state.
#
# For all nodes in the experiment that are booting from the disk,
# figure out the image from which they are booting and stash away the
# appropriate info to enable disk state saving at swapout.
#
my $swapstate;
if (!($failedvnodes || $canceled || $noretry || $failed || $failedplab) &&
TBExptGetSwapState($pid, $eid, \$swapstate) && $swapstate) {
TBDebugTimeStamp("Stashing image signatures");
osload_setupswapinfo($pid, $eid);
TBDebugTimeStamp("Finished stashing image signatures");
}
TBDebugTimeStamp("os_setup finished");
# No retry if vnodes failed. Indicates a fatal problem.
......
......@@ -516,32 +516,8 @@ sub doSwapout($) {
if ($type == REAL && !$firewalled) {
TBExptGetSwapoutAction($pid, $eid, \%soaction);
}
if ($soaction{'command'}) {
my @nodes = ExpNodes($pid, $eid, 1);
if (@nodes > 0) {
print STDERR "Performing swapout admin MFS actions.\n";
TBDebugTimeStamp("Performing swapout actions");
my @failed = ();
my %myargs = ();
$myargs{'name'} = "tbswap";
$myargs{'command'} = $soaction{'command'};
if (defined($soaction{'timeout'})) {
$myargs{'timeout'} = $soaction{'timeout'};
}
$myargs{'timestamp'} = 1;
if (TBAdminMfsRunCmd(\%myargs, \@failed, @nodes)) {
if ($soaction{'isfatal'}) {
tberror
"Failed to run '" . $soaction{'command'} .
"' on @failed!";
return 1;
}
tbwarn
"Failed to run '" . $soaction{'command'} .
"' on @failed!";
}
}
if ($soaction{'command'} && doSwapoutAction($pid, $eid, %soaction)) {
return 1;
}
#
......@@ -1124,7 +1100,6 @@ sub doSwapin($) {
tberror "Error waiting for os_setup to finish.";
return 1;
}
TBDebugTimeStamp("os_setup finished");
#
# Okay, start the event system now that we know all the nodes have
......@@ -1629,3 +1604,135 @@ done:
return 0;
}
#
# Monitor the progress of swapout image creation.
# This is a lot like regular image creation: we make sure that progress
# is being made (the image is growing) and abort if not.
#
sub doSwapoutProgress($%)
{
my ($mystate, $status) = @_;
my $perminute = int(60 / $mystate->{'_interval'});