1. 07 Nov, 2018 1 commit
  2. 10 Aug, 2016 1 commit
    • Rejiggered reload_daemon to enforce a max time. · b6d272a2
      Mike Hibler authored
      There are now some sitevars to control its behavior, the one of interest here
      is reload/failtime:
      The way the reload daemon is supposed to work now is that nodes will be
      started on their reloading adventure with an os_load. If they are still there
      after reload/retrytime minutes, then they will either be rebooted (if the
      os_load was successful) or os_load'ed again (if the first os_load failed
      outright). The logic for either of these is that there might have been some
      transient condition that caused the failure. If we do have to perform this
      "retry" then we will send email to testbed-ops if reload/warnonretry is set.
      If, after another reload/retrytime minutes, a node is still there, then the
      node will be sent to hwdown, possibly powering it off or booting it into the
      admin MFS depending on the setting of reload/hwdownaction.
      So really, reload/failtime should not be needed. All nodes should exit
      reloading in 2 * reload/retrytime minutes. But it is there as a backstop
      (and because I didn't understand the logic of the reload daemon at first!)
      Well, it also comes into play if the reload daemon is restarted after being
      down for a long period of time. In this case, all nodes in reloading will
      get moved to hwdown. May need to reconsider this...
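      The retry policy described above can be sketched roughly as follows. This
      is an illustrative Python model, not the daemon's actual code: the
      ReloadingNode fields, the Action names, and next_action are hypothetical,
      and only the retrytime/failtime semantics come from the message above.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    REBOOT = "reboot"
    RELOAD = "os_load again"
    HWDOWN = "move to hwdown"


@dataclass
class ReloadingNode:
    # All field names are hypothetical; times are in minutes.
    minutes_in_reloading: float       # time since the node entered reloading
    load_succeeded: bool              # did the initial os_load report success?
    retried: bool = False             # has the single retry been used already?
    minutes_since_retry: float = 0.0  # time since that retry was issued


def next_action(node, retrytime, failtime):
    """Pick the next step for a node that is still in reloading."""
    # Backstop: a node that has been in reloading for failtime goes to hwdown.
    if node.minutes_in_reloading >= failtime:
        return Action.HWDOWN
    if not node.retried:
        if node.minutes_in_reloading < retrytime:
            return Action.WAIT
        # One retry: reboot if the os_load worked, otherwise os_load again.
        return Action.REBOOT if node.load_succeeded else Action.RELOAD
    # The retry was already used; give it another retrytime minutes.
    if node.minutes_since_retry >= retrytime:
        return Action.HWDOWN
    return Action.WAIT
```

      With retrytime=20 and failtime=60, a node whose os_load succeeded is
      rebooted at 20 minutes and sent to hwdown 20 minutes after that, so the
      failtime check only matters for nodes that sat in reloading while the
      daemon was not running.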
  3. 30 Nov, 2015 1 commit
  4. 24 Jun, 2015 1 commit
    • Updates for new FreeBSD 10.1 based servers. · 480fdc70
      Mike Hibler authored
      Big changes a comin' to try to get us back on the supported path.
       * perl 5.14 -> 5.20
       * mysql 5.1 -> 5.5
       * php 5.4   -> 5.6
       * tcl 8.4   -> 8.6
       * number of vim patches up to 683.
      Not everything tested yet, but getting there.
      Specific changes:
       * New install/ports directory. New packages for FreeBSD 10.1 are version
         6.1. Cleaned up the ports' Makefiles getting rid of conditionals for
         all older versions. Also got rid of ports we don't use. Old ports tree
         is now install/oports.
       * Install script changes. Make sure /usr/bin/perl and /usr/local/bin/python
         links exist. Ports no longer make these but we use them in '#!'. Changes
         to mysql install and startup script--mysql has changed a LOT since we did
         the support in 4.x. Create syslog entry for named.log. Make sure php.conf
         loads the legacy "mysql" module rather than using "mysqli".
       * Elabinelab support. Reflect new packages, remove all old packages
         (except perl) before installing new versions, install "extras" package,
         make sure sendmail cert gets regenerated, make sure /usr/bin/perl link
         exists, make sure /usr/local/bin/python link exists.
       * Custom ports. otcl and xerces-c2 have both been removed from the ports
         tree as of Q2 2015. ipmitool-devel is a port for the latest version of
         ipmitool. The FreeBSD port is still a rev behind here. We need the
         newer version as it appears to make our SOL consoles more stable.
       * Random. Fixed prerender as neato output has changed again. Tweak to
         sslxmlrpc_server to reflect change in an underlying library. Tweak to
         db/libdb.py.in to turn on autocommit which matters now as mysql 5.5 will
         hang on a metadata lock otherwise. Remade eventsys perl/python stubs
         with SWIG 2.0. SWIG 1.3 did not produce working stubs for perl 5.20.
      Specific un-changes:
       * Apache is still at 2.2. I lack the guts and skilz to upgrade to 2.4.
       * Xerces library is still at (now unsupported) 2.8. Assign will need
         changes before we can move to 3.x.
       * Python is still 2.7.
      Thanks to Keith Sklower for all the work he did converting ports!
  5. 19 Nov, 2014 1 commit
  6. 11 Nov, 2014 3 commits
    • Ugh - fix my recent fix. · cc4d9597
      Kirk Webb authored
    • Fix previous commit. · 473aeb2e
      Kirk Webb authored
    • More TaintState management updates. · d24df9d2
      Kirk Webb authored
      * Do not "reset" taint states to match partitions after OS load.
      Encumber node with any additional taint states found across the
      OSes loaded on a node's partitions (union of states).  Change the
      name of the associated Node object method to better represent the
      * Clear all taint states when a node exits "reloading"
      When the reload_daemon is finished with a node and ready to release it,
      it will now clear any/all taint states set on the node.  This is the
      only automatic way to have a node's taint states cleared.  Users
      cannot clear node taint states by os_load'ing away all tainted
      partitions after this commit; nodes must travel through reloading
      to get cleared.
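      A minimal sketch of the two rules above, in Python for illustration only.
      The function names and the taint labels used in the usage note are
      placeholders and are not taken from the Node object's real interface.

```python
def taints_after_os_load(node_taints, partition_os_taints):
    """Union the node's current taints with those of every loaded OS.

    node_taints: iterable of taint labels already set on the node.
    partition_os_taints: one iterable of taint labels per partition's OS.
    Nothing is ever removed here, so loading a clean OS clears nothing.
    """
    result = set(node_taints)
    for os_taints in partition_os_taints:
        result |= set(os_taints)
    return result


def taints_after_reloading():
    """Leaving "reloading" is the only automatic path that clears taints."""
    return set()
```

      For example, a node carrying {"taint_a"} that gets an untainted OS loaded
      on every partition still carries {"taint_a"} afterwards; only the trip
      through reloading returns it to the empty set.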
  7. 17 Jun, 2014 1 commit
  8. 16 Jun, 2014 1 commit
    • Call libosload directly rather than invoking os_load script. · 361c7d7f
      Mike Hibler authored
      This is not so much for efficiency but because it gives us more precise
      knowledge about failures. Previously, if one node in the batch sent to
      os_load failed, we didn't know which one so we had to assume all failed
      and go back and reload them again. Granted, this situation was one that
      "should not happen", but it does happen quite a lot, at least now when
      we have flaky (IPMI) power control.
      Also, brought some uniformity to the messages printed out; i.e., print
      a freakin timestamp already!
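      The benefit of per-node results can be illustrated with a small Python
      sketch. It does not reproduce libosload's interface, which is not shown
      here; load_batch, nodes_to_retry, and the load_one callback are
      hypothetical stand-ins.

```python
def load_batch(nodes, load_one):
    """Load every node and report success or failure per node.

    load_one is a caller-supplied callable (hypothetical) that returns a
    true value on success; an exception is treated as a failure.
    """
    results = {}
    for node in nodes:
        try:
            results[node] = bool(load_one(node))
        except Exception:
            results[node] = False
    return results


def nodes_to_retry(results):
    """Only the nodes that actually failed need to be reloaded again."""
    return sorted(node for node, ok in results.items() if not ok)
```

      With a single exit status for the whole batch, one bad node would force
      all of them back through a reload; with a result per node, only the
      entries returned by nodes_to_retry need another attempt.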
  9. 24 Sep, 2012 1 commit
    • Replace license symbols with {{{ }}}-enclosed license blocks. · 6df609a9
      Eric Eide authored
      This commit is intended to make the license status of Emulab and
      ProtoGENI source files more clear.  It replaces license symbols like
      "EMULAB-COPYRIGHT" and "GENIPUBLIC-COPYRIGHT" with {{{ }}}-delimited
      blocks that contain actual license statements.
      This change was driven by the fact that today, most people acquire and
      track Emulab and ProtoGENI sources via git.
      Before the Emulab source code was kept in git, the Flux Research Group
      at the University of Utah would roll distributions by making tar
      files.  As part of that process, the Flux Group would replace the
      license symbols in the source files with actual license statements.
      When the Flux Group moved to git, people outside of the group started
      to see the source files with the "unexpanded" symbols.  This meant
      that people acquired source files without actual license statements in
      them.  All the relevant files had Utah *copyright* statements in them,
      but without the expanded *license* statements, the licensing status of
      the source files was unclear.
      This commit is intended to clear up that confusion.
      Most Utah-copyrighted files in the Emulab source tree are distributed
      under the terms of the Affero GNU General Public License, version 3
      Most Utah-copyrighted files related to ProtoGENI are distributed under
      the terms of the GENI Public License, which is a BSD-like open-source
      Some Utah-copyrighted files in the Emulab source tree are distributed
      under the terms of the GNU Lesser General Public License, version 2.1
  10. 23 Jul, 2012 1 commit
  11. 17 Aug, 2011 1 commit
  12. 22 Apr, 2011 1 commit
  13. 13 Jan, 2011 1 commit
  14. 14 Dec, 2010 1 commit
    • Make it possible to run multiple reload_daemons. · a48c6905
      David Johnson authored
      You can now run multiple reload_daemons by setting an optional
      tag on the command line.  The default reload daemon is tagless,
      and only looks for nodes in the reloadpending or reloading experiments
      that are untagged.
      You can tag node_types or nodes by adding a node_type_attribute or
      node_attribute with name reload_daemon_pool; the value should match
      whatever tag you gave your reload_daemon on the command line; the
      reload_daemon will only pick up and operate on matching nodes.
      The default reload_daemon will not pick up nodes or node_types that are
      node_attributes override node_type_attributes as always.
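      The matching rule can be sketched as below. This is an illustrative
      Python model with attributes held in plain dicts; only the attribute
      name reload_daemon_pool comes from the message above, and the function
      names are hypothetical.

```python
POOL_ATTR = "reload_daemon_pool"


def effective_pool(node_attrs, type_attrs):
    """Return the node's pool tag, or None if it is untagged.

    A node_attribute takes precedence over a node_type_attribute.
    """
    if POOL_ATTR in node_attrs:
        return node_attrs[POOL_ATTR]
    return type_attrs.get(POOL_ATTR)


def daemon_handles(daemon_tag, node_attrs, type_attrs):
    """True if a daemon started with daemon_tag should pick up this node.

    daemon_tag is None for the default (tagless) daemon, which therefore
    matches only untagged nodes.
    """
    return effective_pool(node_attrs, type_attrs) == daemon_tag
```

      For instance, a node whose type is tagged "pool1" but which carries its
      own node_attribute of "pool2" is handled only by the daemon started
      with the tag "pool2".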
  15. 13 Oct, 2010 1 commit
    • Remove taint mode from some daemons. · 605a4bd1
      Mike Hibler authored
      That change I made to EmulabConstants.pm.in only worked around one instance
      of the problem.  Apparently in perl 5.10 there is a known bug related to
      taint mode and self loading?  Anyway, the short-term fix is either to move
      to perl 5.12 (no thanks) or disable taint checking failures when we hit the
  16. 10 May, 2010 1 commit
  17. 15 Jul, 2009 1 commit
  18. 13 Mar, 2007 1 commit
  19. 18 Jul, 2006 1 commit
    • Changes necessary for moving most of the stuff in the node_types · 624a0364
      Leigh Stoller authored
      table, into a new table called node_type_attributes, which is intended
      to be a more extensible way of describing nodes.
      The only things left in the node_types table will be type,class and the
      various isXXX boolean flags, since we use those in numerous joins all over
      the system (ie: when discriminating amongst nodes).
      For the most part, all of that other stuff is rarely used, or used in
      contexts where the information is needed, but not for type discrimination.
      Still, it made for a lot of queries to change!
      Along the way I added a NodeType library module that represents the type
      info as a perl object. I also beefed up the existing Node module, and
      started using it in more places. I also added an Interfaces module, but I
      have not done much with that yet.
      I have not yet removed all the slots from the node_types table; I plan to
      run the new code for a few days and then remove the slots.
      Example using the new NodeType object:
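              use NodeType;
              my $control_iface;
              my $typeinfo = NodeType->Lookup($type);
              # The accessor appears to return non-zero on failure; also
              # treat an empty value as "not defined".
              if ($typeinfo->control_interface(\$control_iface) ||
                  !$control_iface) {
                  warn "No control interface for $type is defined in the DB!\n";
              }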
      or using the Node:
              use Node;
              my $control_iface;
              my $nodeobject = Node->Lookup($node_id);
              my $imageable  = $nodeobject->NodeTypeInfo()->imageable();
              my $rebootable = $nodeobject->isrebootable();
              $nodeobject->NodeTypeAttribute("control_interface", \$control_iface);
      Lots of ways to accomplish the same thing, but the main point is that the
      Node is able to override the NodeType (if it wants to), which I think is
      necessary for flexibly describing one/two of a kind things like switches, etc.
  20. 25 Apr, 2006 1 commit
  21. 07 Feb, 2006 1 commit
  22. 13 Jun, 2005 1 commit
    • · 5e43a771
      Timothy Stack authored
      Initial checkin of a "repositioning" daemon that moves robots back to
      their pens on swapout.
      	* configure, configure.in: Add tbsetup/repos_daemon.
      	* db/libdb.pm.in: Add constants for the
      	repositionpending/repositioning experiments.
      	* db/nfree.in: When freeing garcias, send them to
      	repositionpending instead of reloadpending.
      	* event/sched/event-sched.c: Deal with the rare case of no
      	SIMULATOR object being in the agent list for an experiment.
      	* robots/emc/emcd.c, robots/emc/locpiper.in: Fix some typos.
      	* robots/rmcd/masterController.h, robots/rmcd/masterController.c,
      	robots/rmcd/obstacles.h, robots/rmcd/obstacles.c: Ignore dynamic
      	obstacles that are far away and remove dynamic obstacles where the
      	robot is inside the natural obstacle area.
      	* sql/database-create.sql, sql/database-migrate.txt: Add a
      	reposition_status table that tracks the status of robots that are
      	being moved back to their pens.
      	* tbsetup/GNUmakefile.in: Install the repos_daemon script.
      	* tbsetup/reload_daemon.in: Move robots to the repositionpending
      	experiment, if they haven't already reached their pen.
      	* tbsetup/repos_daemon.in: Daemon that takes care of seeing robots
      	back to their pens after they are freed from an experiment.
  23. 31 May, 2005 1 commit
  24. 14 Apr, 2005 1 commit
    • Changes to respect the "imageable" field of the node_types table: · 65a39cd5
      Mike Hibler authored
       - os_load will not attempt to load a non-imageable node, it will
         be skipped without error
       - nfree will respect imageable even with an entry in scheduled_reloads
       - reload_daemon will free any non-imageable nodes that happen to make
         it into reloadpending/reloading
  25. 17 Mar, 2005 1 commit
    • Partial support for disk-zeroing on experiment termination. · 60e7adb8
      Mike Hibler authored
      I did the "back half" support.  If the 'mustwipe' field is non-zero
      in the reserved table entry for a node then its disk must be zeroed.
      How the zeroing is done, depends on the value of the mustwipe field.
      Right now, '1' means pass the '-z' option to frisbee to have it zero
      all non-allocated blocks.  The value '2' is reserved for enabling a
      "full wipe" pass of the disk before running frisbee, which Keith Sklower
      (DETER) wanted to be able to do.  Note that 1 and 2 are effectively the
      same, if we are loading a full-disk image; i.e. all non-allocated blocks
      from the new image are zeroed.  But if the disk were being loaded with
      a single-partition image, then "frisbee -z" would only wipe unused
      blocks in that partition.
      The reload_daemon has been modified to extract the mustwipe info and
      invoke os_load accordingly.   os_load now takes a "-z <type>" option
      to enable the zeroing by setting a value in the current_reloads table.
      tmcd will read and return that info to its caller in the "loadinfo" command.
      Finally, the rc.frisbee script that runs in the frisbee MFS extracts the
      loadinfo info and crafts the frisbee startup command.
      What still needs to be done is the "front end," how the user specifies
      the value and how it winds up in the DB reserved table.  This will probably
      involve addition of state to the experiments table as this will likely be
      a per-experiment setting.
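      The meaning of the mustwipe values can be summarized in a short Python
      sketch. The wipe_plan function and its result keys are hypothetical;
      the message above only establishes that 1 maps to "frisbee -z" and that
      2 is reserved for a full wipe pass before frisbee runs. Whether a full
      wipe would also pass "-z" is not stated, so the sketch assumes it does
      not.

```python
def wipe_plan(mustwipe):
    """Describe the zeroing steps implied by a reserved.mustwipe value."""
    if mustwipe == 0:
        # No zeroing requested.
        return {"full_wipe_first": False, "frisbee_args": []}
    if mustwipe == 1:
        # Have frisbee zero all non-allocated blocks.
        return {"full_wipe_first": False, "frisbee_args": ["-z"]}
    if mustwipe == 2:
        # Reserved: wipe the whole disk before frisbee runs (assumed no -z).
        return {"full_wipe_first": True, "frisbee_args": []}
    raise ValueError("unknown mustwipe value: %r" % (mustwipe,))
```

      As the message notes, the difference between the two modes only shows
      up for single-partition images, where "-z" leaves the rest of the disk
      untouched.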
  26. 14 Feb, 2005 1 commit
    • · f239ba2b
      Kirk Webb authored
      Garcia hack refined in reload_daemon to free the garcias upon reload.  This
      task is normally handled by stated for regular nodes.  In the case of the
      garcias, however, no RELOADDONE state transition happens, so we just free
      the node up directly.  The code to free them was stolen from stated.
  27. 11 Feb, 2005 1 commit
    • · c10ee252
      Kirk Webb authored
      Added garcia reload hack to reload_daemon.  Verified that it works on
      an inner elab.  The change to nfree is minor - push garcias into
      reloadpending even though they are not (yet) imageable.
  28. 12 Jan, 2005 1 commit
  29. 19 Nov, 2003 1 commit
  30. 06 Nov, 2003 1 commit
    • Prevent reload_daemon from exiting. · 33e45640
      Leigh Stoller authored
      * If rebooting a stuck node fails, move the node to hwdown, send email,
        and log an entry in the nodelog. Then continue on.
      * If os_load fails, record the nodes that failed, and try again if the
        nodes fail to reload at the retry interval. Do not exit. I was going
        to call os_load again immediately, but decided not to since these
        changes were quite easy.
        The above change not really tested ... waiting for os_load to fail!
  31. 15 Sep, 2003 1 commit
  32. 25 Mar, 2003 1 commit
  33. 22 Mar, 2003 1 commit
    • Grab a batch at a time instead of a single node per loop iteration. · 4a34327a
      Mac Newbold authored
      Scaling and speed now depend primarily on os_load (and indirectly,
      node_reboot). The time a batch spends in the reload_daemon code appears to
      be <1s per node now, instead of taking 30s per node to grab, setup, and
      Also, finally remove the "obsolete section" that's been sitting in there
      for a long time. This was the part that did netdisk reloads, and has
      already been neutered out of the code path for several months at least.
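      The batching idea amounts to the loop below, shown as a Python sketch
      rather than the daemon's real code. The names drain_in_batches and
      load_batch are hypothetical, and max_batch is an assumed parameter; the
      message does not say how large a batch is.

```python
def drain_in_batches(pending, load_batch, max_batch=10):
    """Hand pending nodes to load_batch a batch at a time.

    pending is a list that is consumed in place; load_batch is a
    caller-supplied callable taking a list of nodes. Returns the number
    of load_batch calls made.
    """
    calls = 0
    while pending:
        batch = pending[:max_batch]
        del pending[:max_batch]
        load_batch(batch)  # one call covers the whole batch
        calls += 1
    return calls
```

      Twenty-five pending nodes with max_batch=10 result in three calls
      instead of twenty-five, which is where the per-node overhead goes away.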
  34. 31 Jan, 2003 1 commit
  35. 30 Jan, 2003 1 commit
  36. 29 Jan, 2003 1 commit
  37. 18 Dec, 2002 1 commit
  38. 16 Dec, 2002 1 commit
    • Decrease the sleep between loops from 2 to 1, and fix a typo. This should · 6bdba92c
      Mac Newbold authored
      help nodes in reload_pending get sucked into reloading faster. If it
      doesn't do enough, we'll need to do more batching of stuff, so we get some
      parallelism in os_load instead of forcing it to serialize by calling
      os_load one node at a time.
      I was tempted to nuke all the stuff that was in there from the netdisk
      reload type, but decided not to. It won't be too long (relatively
      speaking) before we have freed, the new "free node manager" that will
      replace/supersede our current reload_daemon anyway.