Commit d5bf0bdb authored by Mac Newbold

Several small but much-needed changes to idle detection. Soon I'll
give this stuff a major overhaul when we move to the model where we
have data on when each node was last "touched" or actively used. (Most
of these changes will still be relevant.)

1. Fix a bug in idlecheck that we didn't really think much about. It
turns out that the WanSpread people have had a 17 node expt idle for
over three weeks, and we didn't detect it because the nodes were
running ospf and generating lots of network traffic on the exptl. net.
We now ignore the exptl. network traffic as a source of activity if
they have automatic ospf routing happening. We also ignore nodes that
have any trafgen endpoints.
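
The activity rule described in item 1 can be sketched roughly as
follows. This is a minimal illustration in Python, not the Perl and SQL
that idlecheck actually uses. The names `ExptStats`, `is_idle`, and
their fields are hypothetical, and the sketch assumes that packet
counts from nodes with trafgen endpoints were already excluded before
`max_pkt_delta` was computed.

```python
from dataclasses import dataclass


@dataclass
class ExptStats:
    """Hypothetical per-experiment summary (not a real idlecheck structure)."""
    max_pkt_delta: int  # largest packet-count change on any exptl. interface
    tty_recent: bool    # a tty was used within the idle window
    uses_ospf: bool     # automatic ospf routing is enabled


def is_idle(stats: ExptStats, min_pkts: int) -> bool:
    """Return True if the experiment shows no qualifying activity.

    Experimental-network traffic only counts when the experiment is not
    running ospf, because routing chatter alone can exceed the threshold.
    Tty use always counts as activity.
    """
    net_active = stats.max_pkt_delta >= min_pkts and not stats.uses_ospf
    return not (net_active or stats.tty_recent)
```

Under this rule, an experiment like the WanSpread one (heavy ospf
traffic, no tty use) is reported as idle, while the same traffic
without ospf would still count as activity.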

2. After the changes in idlecheck, I updated the idle view to correctly
handle a new flag that idlecheck outputs to let us know that an expt
has ospf running and may be falsely reported as inactive because we
ignored net traffic. It will probably be very rare that an active expt
running ospf will have only network activity and no tty activity, but
it is a potential source of false positives. I also did some random
libifying. There were some hard coded references to emulab.net that I
fixed to properly use the variables from the defs file.
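
As an illustration of how a consumer might read the extended idlecheck
output, here is a small Python sketch. It assumes each output line
starts with `pid/eid`, followed by whitespace-separated tags, and that
any tag other than `inactive`, `stale`, or `unswappable` (such as the
new `ospf` flag) is collected separately. The function name
`parse_idlecheck_line` and the returned structure are hypothetical; the
real consumer is the PHP code in showexp_list.php3.

```python
KNOWN_TAGS = ("inactive", "stale", "unswappable")


def parse_idlecheck_line(line):
    """Split one idlecheck output line into (expt, flags).

    Empty columns collapse under whitespace splitting, which is fine
    here because every tag identifies itself by name.
    Returns None for a blank line.
    """
    fields = line.split()
    if not fields:
        return None
    expt, tags = fields[0], fields[1:]
    flags = {tag: (tag in tags) for tag in KNOWN_TAGS}
    # Anything unrecognized (for example the router type) is kept as-is.
    flags["other"] = [t for t in tags if t not in KNOWN_TAGS]
    return expt, flags
```

For a line such as `"testbed/myexp   inactive\t\tunswappable\tospf\n"`
(a made-up experiment name), the `other` list would carry `ospf` so the
view can show it next to the inactive marker.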

3. I also updated request_swapexp.php3. It now includes in the email
message a blurb about automatic swapping, depending on whether or not
the experiment is marked swappable. (If swappable, it says "This
experiment is marked as swappable, so it may be automatically swapped
out by Emulab.Net or its operational staff." and if unswappable, it
says "This experiment has not been marked swappable, so it will not be
automatically swapped out.") It also has a reference to the Node Usage
Policy and gives the URL. So we now give them fair warning about
potentially getting swapped out and what our policies are.
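
The blurb selection in item 3 amounts to a two-way choice on the
swappable flag. The following Python sketch shows that choice using the
two messages quoted above. The function name `swap_blurb` and the
`site` parameter are hypothetical stand-ins; the actual change is made
in PHP and takes the site name from the defs file.

```python
def swap_blurb(swappable, site="Emulab.Net"):
    """Return the automatic-swapping notice for the reminder email."""
    if swappable:
        return ("This experiment is marked as swappable, so it may be "
                "automatically swapped out by " + site +
                " or its operational staff.")
    return ("This experiment has not been marked swappable, so it will "
            "not be automatically swapped out.")
```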
parent 3a643160
......@@ -39,6 +39,7 @@ Currently the following qualify as activity:
* Packets sent/received on the experimental network
* Use of a tty
Activity does not currently include:
* Network traffic for nodes using automatic routing or trafgen daemons
* Computation (non-idle load averages)
* Control net traffic
* Control without the use of a tty\n"
......@@ -79,6 +80,7 @@ my $stalemin = $f || 120; # Max minutes of staleness for latest report
my $minpkts = $idlehours * $minpph;
my $idlesec = $idlehours * 3600;
my $stalesec = $stalemin * 60;
my $rtype="ospf";
my $node1="";
my $node2="";
my $node3="";
......@@ -89,6 +91,7 @@ if ($n) {
}
my %active=();
my %fresh=();
my %router=();
# This query finds how many packets the non-control net interfaces
# have sent in the last $idlesec seconds, and saves it in a temporary
......@@ -105,20 +108,28 @@ my %fresh=();
for my $cmd ("drop table if exists idletemp;",
"create temporary table idletemp
select r.pid,r.eid, $node1
select r.pid,r.eid, routertype as router, $node1
max(ipkts)-min(ipkts) as idiff , max(opkts)-min(opkts) as odiff
from iface_counters as a
left join reserved as r on a.node_id = r.node_id
left join nodes as n on a.node_id=n.node_id
left join virt_trafgens as v on r.vname=v.vnode
left join interfaces as i on a.mac=i.mac
left join experiments as e on e.pid=r.pid and e.eid=r.eid
where tstamp >= expt_swapped
and v.vnode is null
and (unix_timestamp(now())-unix_timestamp(tstamp) <= $idlesec)
and (unix_timestamp(now())-unix_timestamp(expt_swapped) >= $idlesec)
and IP not like \"155.101.%\" $node2
and (routertype=\"$rtype\" or i.IP not like \"155.101.%\") $node2
group by r.pid,r.eid,a.mac;",
"select pid,eid, max(idiff), max(odiff) from idletemp
group by pid,eid $node3
having (max(idiff) >= $minpkts) or (max(odiff) >= $minpkts);",
# Ordering matters here! Make sure that the router query comes
# _after_ the idletemp query and _before_ all the other select queries!
"select pid, eid, router from idletemp
where router=\"$rtype\"
group by pid,eid ,router;",
"select r.pid,r.eid,max(last_tty) as lastuse ,max(tstamp) as t
from node_idlestats as n
left join reserved as r on n.node_id=r.node_id
......@@ -128,15 +139,21 @@ group by pid,eid
having (unix_timestamp(now())-unix_timestamp(lastuse) <= $idlesec)
order by pid,eid,last_tty,tstamp;") {
print "Sending cmd:\n$cmd\n" if $d;
print "Sending cmd:\n$cmd\n" if $d>1;
my $result = DBQueryFatal($cmd);
if ($cmd =~ /^select /i && $result->numrows() > 0) {
# Add the pid/eid to our list of active expts
while(@r=$result->fetchrow()) {
$pid=$r[0];
$eid=$r[1];
print "Adding $pid/$eid to active list\n" if $d;
$active{"$pid/$eid"} = 1;
if (!($cmd =~ /where router/i)) {
print "Adding $pid/$eid to active list\n" if $d;
$active{"$pid/$eid"} = 1;
} else {
print "Removing $pid/$eid from active list - it has $rtype routers!\n" if $d;
$active{"$pid/$eid"} = 0;
$router{"$pid/$eid"} = 1;
}
}
}
print $result->as_string() if ($d>1);
......@@ -152,7 +169,7 @@ and r.node_id not like \"%ron%\"
group by pid,eid
having t is not null and (unix_timestamp(now())-unix_timestamp(t)<=$stalesec)
order by pid,eid";
print "Sending cmd:\n$cmd\n" if $d;
print "Sending cmd:\n$cmd\n" if $d>1;
my $result = DBQueryFatal($cmd);
while(@r=$result->fetchrow()) {
$pid=$r[0];
......@@ -168,23 +185,25 @@ $cmd = "select r.pid,r.eid,swappable,expt_swapped from reserved as r
left join experiments as e on e.pid=r.pid and e.eid=r.eid
where (unix_timestamp(now())-unix_timestamp(expt_swapped) >= $idlesec)
and idle_ignore=0 group by r.pid,r.eid order by r.pid,r.eid";
print "Sending cmd:\n$cmd\n" if $d;
print "Sending cmd:\n$cmd\n" if $d>1;
$result = DBQueryFatal($cmd);
while(@r=$result->fetchrow()) {
$pid=$r[0];
$eid=$r[1];
$swap=$r[2];
$idle=!defined($active{"$pid/$eid"});
$stale=!defined($fresh{"$pid/$eid"});
$idle=!(defined($active{"$pid/$eid"}) && $active{"$pid/$eid"});
$router=defined($active{"$pid/$eid"}) && $router{"$pid/$eid"};
$stale=!(defined($fresh{"$pid/$eid"}) && $fresh{"$pid/$eid"});
print "Checking for $pid/$eid in active list\n" if $d;
# Now output the results
# IMPORTANT: If you make changes to output format, be sure to update
# testbed/www/showexp_list.php3 as well, since it reads this output
my $str= "$pid/$eid";
$str = $str . " " x (36-length($str))." ";
$str = $str . " " x (28-length($str))." ";
$str .= ($idle? "inactive\t" : "\t\t" );
$str .= ($stale?"stale\t" : "\t" );
$str .= (!$swap? "unswappable\n" : "\n" );
$str .= (!$swap? "unswappable\t" : "\t\t" );
$str .= ($router? "$rtype\n" : "\n" );
if (($idle && !$stale && $swap) ||
($stale && $s) ||
(!$swap && $u)) { print $str; }
......@@ -195,7 +214,7 @@ print $result->as_string() if ($d>1);
if (@list > 0) {
$cmd = "update experiments set swap_requests=0 where not (".
join(" or ",@list).")";
print "Sending cmd:\n$cmd\n" if $d;
print "Sending cmd:\n$cmd\n" if $d>1;
DBQueryWarn($cmd);
}
......
......@@ -63,11 +63,12 @@ if ($canceled) {
# We only send to the proj leader after we've sent $tell_proj_head requests
$tell_proj_head = 2;
$q = DBQueryWarn("select swap_requests,
$q = DBQueryWarn("select swappable, swap_requests,
date_format(last_swap_req,\"%T\") as lastreq
from experiments
where eid='$eid' and pid='$pid'");
$r = mysql_fetch_array($q);
$swappable = $r["swappable"];
$swap_requests = $r["swap_requests"];
$last_swap_req = $r["lastreq"];
......@@ -128,25 +129,34 @@ TBUserInfo($projleader, $projleader_name, $projleader_email);
TBMAIL("$expleader_name <$expleader_email>",
"$pid/$eid: Please Swap or Terminate Experiment",
"Hi, this is an automated message from Emulab.Net.\n".
"Hi, this is an automated message from $THISHOMEBASE.\n".
( $swap_requests > 0
? ("You have been sent ".$swap_requests." other message".
($swap_requests!=1?"s":"")." since this ".
"experiment became idle.\n")
: "") .
: "").
($swappable ?
("This experiment is marked as swappable, so it may be ".
"automatically \nswapped out by $THISHOMEBASE or its ".
"operational staff.\n") :
("This experiment has not been marked swappable, so it will not be\n".
"automatically swapped out.\n")).
"\n".
"It appears that the $c node".($c!=1?"s":"").
" in your experiment '$eid' \n".
"in project '$pid' ".($c!=1?"are":"is")." inactive.\n".
"We would appreciate it if you could either terminate or swap this\n".
"experiment out so that the nodes will be available for use by\n".
"other experimenters. You can do this by logging into the Emulab Web\n".
"Interface, and using the swap or terminate links on this page:\n".
"\n".
"other experimenters. You can do this by logging into the $THISHOMEBASE".
" Web\nInterface, and using the swap or terminate links on this page:".
"\n\n".
" $TBBASE/showexp.php3?pid=$pid&eid=$eid\n".
"\n".
"More information on experiment swapping is available in the Emulab\n".
"FAQ at http://www.emulab.net/faq.php3#UTT-Swapping\n".
"FAQ at $TBDOCBASE/faq.php3#UTT-Swapping\n".
"\n".
"More information on our node usage policy is available at\n".
"$TBDOCBASE/docwrapper.php3?docname=swapping.html\n".
"\n".
"If you feel this message is in error then please contact\n".
"$TBMAILADDR_OPS.\n".
......
......@@ -124,6 +124,10 @@ if ($isadmin) {
if ($tag == "inactive") { $inactive[$expt]=1; }
elseif ($tag == "stale") { $stale[$expt]=1; }
elseif ($tag == "unswappable") { $unswap[$expt]=1; }
else {
if (defined($other[$expt])) { $other[$expt].=$tag; }
else {$other[$expt]=$tag; }
}
}
}
......@@ -272,6 +276,7 @@ if (mysql_num_rows($experiments_result)) {
if ($stale[$expt]==1) { $str .= "<b>no&nbsp;report</b> "; }
# For now, don't show this tag, it's redundant
#if ($unswap[$expt]==1) { $str .= "unswappable"; }
if ($other[$expt]) { $str .= " ($other[$expt]) "; }
if ($str=="") { $str="&nbsp;"; }
# sanity check
$slothderr=0;
......