Commit 895a44f6 authored by Leigh Stoller

Largish rework of nfree. Started out that I just wanted to map the
default OSID from the node_types table, to a specific OSID from the
partition table on the actual node. This is to avoid setting the boot
OSID to RHL_STD when the node is released, which causes a boot
failure. Okay, so I added a library routine to do this (yanked out of
os_setup where I did the code originally). This would solve most of
the problems, except where there was no OS loaded that would satisfy
the mapping, in which case the user must have done an os_load, and now
that auto-schedules a reload. Anyway, seemed like this should work.
Ha! MySQL locking is downright dumb; all tables used within a lock
region must be locked. nfree was already locking 9 tables, and in
order to call out to library routines (which might use anything) I
would have to lock the world, which is not actually possible anyway.
Why all this locking in nfree in the first place? The idea is that
there is a race between releasing the node from reserved, and cleaning
up all those tables (interfaces, delays, nodes, etc). We don't want to
free a node, and have it get allocated to another experiment before
the cleanup is done, since that would mess up the state of the node.
The solution (albeit a crufty one) was to lock just the reserved table
(which guards against multiple people trying to nfree the same node at
the same time) and switch the reservation out of the pid,eid and into
a holding reservation. This effectively removes the node from the
user's control, but keeps it reserved. Then I unlock the reserved
table. With that done, I can clean up all those tables without any
locking, since the node is still reserved. After cleanup, I can either
delete the reservation, or move it to the next reserve or reload
reservation if those were pending. No locking is needed at this point
since single table changes are atomic (and nalloc locks reserved
anyway). Okay, so now we sit back and see if this was a good idea.
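
The holding-reservation sequence described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the nfree code itself: the hashes stand in for the database tables, and the names `free_node`, `%pending`, `holding-pid`, and `holding-eid` are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the DB tables (not the real schema).
my %reserved   = (pc1 => { pid => "testbed", eid => "exp1" });
my %interfaces = (pc1 => ["eth0", "eth1"]);
my %delays     = (pc1 => ["delay0"]);
my %pending    = ();    # node => { pid, eid } of a queued next/reload reservation

# Hypothetical name for the holding reservation.
my ($HOLD_PID, $HOLD_EID) = ("holding-pid", "holding-eid");

sub free_node {
    my ($node, $pid, $eid) = @_;

    # Step 1: with only the reserved table locked, move the node out of
    # the user's pid,eid and into the holding reservation, then unlock.
    my $r = $reserved{$node};
    return 0 unless ($r && $r->{pid} eq $pid && $r->{eid} eq $eid);
    @{$r}{qw(pid eid)} = ($HOLD_PID, $HOLD_EID);

    # Step 2: clean up the per-node tables without any locking; the node
    # is still reserved, so it cannot be allocated to anyone else.
    delete $interfaces{$node};
    delete $delays{$node};

    # Step 3: either hand the node to a pending reservation or release it.
    # Each is a single-table change.
    if (my $next = delete $pending{$node}) {
        $reserved{$node} = { %$next };
    }
    else {
        delete $reserved{$node};
    }
    return 1;
}
```

A second attempt to free the same node returns 0 in this sketch, because the node no longer belongs to the caller's pid,eid after step 1.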
parent 7cd86ce9
@@ -59,6 +59,10 @@ use Exporter;
TB_DEFAULT_RELOADTYPE TB_RELOADTYPE_FRISBEE TB_RELOADTYPE_NETDISK
+TB_EXPTPRIORITY_LOW TB_EXPTPRIORITY_HIGH
+TB_ASSIGN_TOOFEWNODES
TBAdmin TBProjAccessCheck TBNodeAccessCheck TBOSIDAccessCheck
TBImageIDAccessCheck TBExptAccessCheck ExpLeader MarkNodeDown
SetNodeBootStatus OSFeatureSupported IsShelved NodeidToExp
@@ -66,7 +70,7 @@ use Exporter;
DBQuoteSpecial UNIX2DBUID ExpState SetExpState ProjLeader
ExpNodes DBDateTime DefaultImageID GroupLeader TBGroupUnixInfo
TBValidNodeLogType TBValidNodeName TBSetNodeLogEntry
-TBSetSchedReload
+TBSetSchedReload MapNodeOSID
);
# Must come after package declaration!
@@ -200,6 +204,13 @@ sub TB_RELOADTYPE_NETDISK() { "netdisk"; }
sub TB_RELOADTYPE_FRISBEE() { "frisbee"; }
sub TB_DEFAULT_RELOADTYPE() { TB_RELOADTYPE_NETDISK; }
+# Experiment priorities.
+sub TB_EXPTPRIORITY_LOW() { 0; }
+sub TB_EXPTPRIORITY_HIGH() { 20; }
+# Assign exit status for too few nodes.
+sub TB_ASSIGN_TOOFEWNODES() { 2; }
#
# We should list all of the DB limits.
#
@@ -806,7 +817,20 @@ sub ExpNodes($$)
         return ();
     }
     while (@row = $query_result->fetchrow_array()) {
-        push(@nodes, $row[0]);
+        $node = $row[0];
+        #
+        # Taint check. I do not understand this silliness, but if I
+        # taint check these node names, I avoid warnings throughout.
+        #
+        if ($node =~ /^([\w]+)$/) {
+            $node = $1;
+            push(@nodes, $node);
+        }
+        else {
+            print "*** $0: WARNING: Bad node name: $node.\n";
+        }
     }
     return @nodes;
 }
@@ -1262,5 +1286,72 @@ sub MapNumericUID($)
return $name;
}
+#
+# Map a generic OSID to a specific OSID for the actual node in question.
+# The intent is that, for example, RHL-STD needs to be mapped to the
+# specific version of RHL that is loaded on the machine. This bit of code
+# does that mapping, returning 0 if no mapping could be made.
+#
+# usage: MapNodeOSID(char *node, char *osid)
+#        Return the new osid if mapping successful (or actual osid loaded).
+#        Return 0 for all errors and if mapping not possible.
+#
+sub MapNodeOSID($$)
+{
+    my ($node, $osid) = @_;
+    #
+    # See if this OSID is actually loaded on the machine.
+    #
+    my $p_result =
+        DBQueryWarn("select * from partitions ".
+                    "where node_id='$node' and osid='$osid'");
+    if (!$p_result) {
+        return 0;
+    }
+    if ($p_result->numrows) {
+        return $osid;
+    }
+    #
+    # Get OSID info.
+    #
+    my $osid_result =
+        DBQueryWarn("select * from os_info where osid='$osid'");
+    if (!$osid_result || $osid_result->numrows == 0) {
+        return 0;
+    }
+    my %osid_row = $osid_result->fetchhash();
+    #
+    # If it's a specific version, and it's not loaded on the machine,
+    # nothing to do.
+    #
+    if (defined($osid_row{'version'}) && $osid_row{'version'} ne "") {
+        return 0;
+    }
+    #
+    # Try to map from a generic name to the specific name of the OS
+    # that *is* loaded.
+    #
+    my $o_result =
+        DBQueryWarn("select o1.* from os_info as o1 ".
+                    "left join partitions as p on o1.osid=p.osid ".
+                    "left join os_info as o2 on o2.OS=o1.OS ".
+                    "where p.node_id='$node' and o2.osid='$osid'");
+    if (!$o_result || $o_result->numrows == 0) {
+        return 0;
+    }
+    my %o_row = $o_result->fetchhash();
+    my $n_osid = $o_row{'osid'};
+    return $n_osid;
+}
1;
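
To make the three-step lookup in MapNodeOSID easier to follow, here is a rough stand-alone sketch of the same decision logic run against in-memory data instead of the database. Everything in it is an assumption made for illustration: the function name `map_node_osid`, the sample OSIDs, and the hash layouts are hypothetical and are not part of the library.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the os_info and partitions tables.
my %os_info = (
    "RHL-STD" => { OS => "Linux",   version => ""    },   # generic name
    "RHL71"   => { OS => "Linux",   version => "7.1" },
    "FBSD45"  => { OS => "FreeBSD", version => "4.5" },
);
my %partitions = (
    pc1 => ["RHL71", "FBSD45"],
    pc2 => ["FBSD45"],
);

sub map_node_osid {
    my ($node, $osid) = @_;
    my @loaded = @{ $partitions{$node} || [] };

    # 1. If the requested OSID is already loaded on the node, use it as-is.
    return $osid if grep { $_ eq $osid } @loaded;

    # 2. Unknown OSID, or a specific version that is not loaded: no mapping.
    my $info = $os_info{$osid} or return 0;
    return 0 if defined($info->{version}) && $info->{version} ne "";

    # 3. Generic OSID: pick a loaded OSID that belongs to the same OS.
    for my $cand (@loaded) {
        my $c = $os_info{$cand} or next;
        return $cand if $c->{OS} eq $info->{OS};
    }
    return 0;
}
```

With this sample data, asking for the generic `RHL-STD` on `pc1` resolves to the loaded `RHL71`, while the same request on `pc2` (which has no Linux partition) yields 0, the case where the caller would have to schedule a reload.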