1. 14 Dec, 2010 1 commit
    • Make it possible to run multiple reload_daemons. · a48c6905
      David Johnson authored
      You can now run multiple reload_daemons by setting an optional
      tag on the command line.  The default reload daemon is tagless,
      and only looks for nodes in the reloadpending or reloading experiments
      that are untagged.
      
      You can tag node_types or nodes by adding a node_type_attribute or
      node_attribute with name reload_daemon_pool; the value should match
      whatever tag you gave your reload_daemon on the command line; the
      reload_daemon will only pick up and operate on matching nodes.
      
      The default reload_daemon will not pick up nodes or node_types that are
      tagged.
      
      node_attributes override node_type_attributes as always.
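
      The matching rule described above can be sketched roughly as follows.
      This is only an illustration: the dictionaries stand in for the
      node_attributes and node_type_attributes tables, and the helper names
      pool_tag and nodes_for_daemon are hypothetical, not taken from
      reload_daemon itself.

```python
from typing import List, Optional

# Hypothetical in-memory stand-ins for the DB tables (assumed data).
NODE_ATTRS = {"pc1": {"reload_daemon_pool": "poolA"}}
TYPE_ATTRS = {"pc3000": {"reload_daemon_pool": "poolB"}}
NODE_TYPES = {"pc1": "pc3000", "pc2": "pc3000", "pc3": "pc850"}


def pool_tag(node_id: str) -> Optional[str]:
    """Return the node's pool tag; a node attribute overrides the type attribute."""
    node_val = NODE_ATTRS.get(node_id, {}).get("reload_daemon_pool")
    if node_val is not None:
        return node_val
    node_type = NODE_TYPES[node_id]
    return TYPE_ATTRS.get(node_type, {}).get("reload_daemon_pool")


def nodes_for_daemon(daemon_tag: Optional[str], pending: List[str]) -> List[str]:
    """Select pending nodes whose tag equals the daemon's tag (None means tagless)."""
    return [n for n in pending if pool_tag(n) == daemon_tag]
```

      With this data, a tagless daemon (daemon_tag of None) would pick up
      only pc3, while a daemon started with the tag poolB would pick up
      only pc2.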
  2. 13 Oct, 2010 1 commit
    • Remove taint mode from some daemons. · 605a4bd1
      Mike Hibler authored
      That change I made to EmulabConstants.pm.in only worked around one instance
      of the problem.  Apparently in perl 5.10 there is a known bug related to
      taint mode and self loading?  Anyway, the short-term fix is either to move
      to perl 5.12 (no thanks) or disable taint checking failures when we hit the
      problem.
  3. 10 May, 2010 1 commit
  4. 15 Jul, 2009 1 commit
  5. 13 Mar, 2007 1 commit
  6. 18 Jul, 2006 1 commit
    • Changes necessary for moving most of the stuff in the node_types · 624a0364
      Leigh B. Stoller authored
      table, into a new table called node_type_attributes, which is intended
      to be a more extensible way of describing nodes.
      
      The only things left in the node_types table will be type, class, and the
      various isXXX boolean flags, since we use those in numerous joins all over
      the system (i.e., when discriminating amongst nodes).
      
      For the most part, all of that other stuff is rarely used, or used in
      contexts where the information is needed, but not for type discrimination.
      Still, it made for a lot of queries to change!
      
      Along the way I added a NodeType library module that represents the type
      info as a perl object. I also beefed up the existing Node module, and
      started using it in more places. I also added an Interfaces module, but I
      have not done much with that yet.
      
      I have not yet removed all the slots from the node_types table; I plan to
      run the new code for a few days and then remove the slots.
      
      Example using the new NodeType object:
      
              use NodeType;

              my $typeinfo = NodeType->Lookup($type);
              my $control_iface;

              # Warn if the accessor returns true or leaves $control_iface empty.
              if ($typeinfo->control_interface(\$control_iface) ||
                  !$control_iface) {
                  warn "No control interface for $type is defined in the DB!\n";
              }
      
      or using the Node:
      
              use Node;
      
              my $nodeobject = Node->Lookup($node_id);
              my $imageable  = $nodeobject->NodeTypeInfo()->imageable();
      or
              my $rebootable = $nodeobject->isrebootable();
      or
              $nodeobject->NodeTypeAttribute("control_interface", \$control_iface);
      
      Lots of ways to accomplish the same thing, but the main point is that the
      Node is able to override the NodeType (if it wants to), which I think is
      necessary for flexibly describing one- or two-of-a-kind things like switches, etc.
  7. 25 Apr, 2006 1 commit
  8. 07 Feb, 2006 1 commit
  9. 13 Jun, 2005 1 commit
    • · 5e43a771
      Timothy Stack authored
      Initial checkin of a "repositioning" daemon that moves robots back to
      their pens on swapout.
      
      	* configure, configure.in: Add tbsetup/repos_daemon.
      
      	* db/libdb.pm.in: Add constants for the
      	repositionpending/repositioning experiments.
      
      	* db/nfree.in: When freeing garcias, send them to
      	repositionpending instead of reloadpending.
      
      	* event/sched/event-sched.c: Deal with the rare case of no
      	SIMULATOR object being in the agent list for an experiment.
      
      	* robots/emc/emcd.c, robots/emc/locpiper.in: Fix some typos.
      
      	* robots/rmcd/masterController.h, robots/rmcd/masterController.c,
      	robots/rmcd/obstacles.h, robots/rmcd/obstacles.c: Ignore dynamic
      	obstacles that are far away and remove dynamic obstacles where the
      	robot is inside the natural obstacle area.
      
      	* sql/database-create.sql, sql/database-migrate.txt: Add a
      	reposition_status table that tracks the status of robots that are
      	being moved back to their pens.
      
      	* tbsetup/GNUmakefile.in: Install the repos_daemon script.
      
      	* tbsetup/reload_daemon.in: Move robots to the repositionpending
      	experiment, if they haven't already reached their pen.
      
      	* tbsetup/repos_daemon.in: Daemon that takes care of seeing robots
      	back to their pens after they are freed from an experiment.
  10. 31 May, 2005 1 commit
  11. 14 Apr, 2005 1 commit
    • Changes to respect the "imageable" field of the node_types table: · 65a39cd5
      Mike Hibler authored
       - os_load will not attempt to load a non-imageable node; it will
         be skipped without error
       - nfree will respect imageable even with an entry in scheduled_reloads
       - reload_daemon will free any non-imageable nodes that happen to make
         it into reloadpending/reloading
  12. 17 Mar, 2005 1 commit
    • Partial support for disk-zeroing on experiment termination. · 60e7adb8
      Mike Hibler authored
      I did the "back half" support.  If the 'mustwipe' field is non-zero
      in the reserved table entry for a node then its disk must be zeroed.
      How the zeroing is done depends on the value of the mustwipe field.
      Right now, '1' means pass the '-z' option to frisbee to have it zero
      all non-allocated blocks.  The value '2' is reserved for enabling a
      "full wipe" pass of the disk before running frisbee, which Keith Sklower
      (DETER) wanted to be able to do.  Note that 1 and 2 are effectively the
      same, if we are loading a full-disk image; i.e. all non-allocated blocks
      from the new image are zeroed.  But if the disk were being loaded with
      a single-partition image, then "frisbee -z" would only wipe unused
      blocks in that partition.
      
      The reload_daemon has been modified to extract the mustwipe info and
      invoke os_load accordingly.   os_load now takes a "-z <type>" option
      to enable the zeroing by setting a value in the current_reloads table.
      tmcd will read and return that info to its caller in the "loadinfo" command.
      Finally, the rc.frisbee script that runs in the frisbee MFS extracts the
      loadinfo info and crafts the frisbee startup command.
      
      What still needs to be done is the "front end," how the user specifies
      the value and how it winds up in the DB reserved table.  This will probably
      involve addition of state to the experiments table as this will likely be
      a per-experiment setting.
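
      As a rough illustration of the mustwipe values described above, the
      mapping could look like the sketch below. The function name wipe_plan
      and its return shape are hypothetical. Only the '-z' frisbee option
      comes from the commit message; how value 2 combines with '-z' is not
      stated there, so the sketch assumes a full wipe followed by a plain
      frisbee run.

```python
from typing import List, Tuple


def wipe_plan(mustwipe: int) -> Tuple[bool, List[str]]:
    """Return (full_wipe_first, extra_frisbee_args) for a mustwipe value."""
    if mustwipe == 0:
        # No zeroing requested.
        return (False, [])
    if mustwipe == 1:
        # Have frisbee zero all non-allocated blocks.
        return (False, ["-z"])
    if mustwipe == 2:
        # Reserved value: full wipe pass before frisbee (assumed: no -z afterwards).
        return (True, [])
    raise ValueError(f"unknown mustwipe value: {mustwipe}")
```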
  13. 14 Feb, 2005 1 commit
    • · f239ba2b
      Kirk Webb authored
      Garcia hack refined in reload_daemon to free the garcias upon reload.  This
      task is normally handled by stated for regular nodes.  In the case of the
      garcias, however, no RELOADDONE state transition happens, so we just free
      the node up directly.  The code to free them was stolen from stated.
  14. 11 Feb, 2005 1 commit
    • · c10ee252
      Kirk Webb authored
      Added garcia reload hack to reload_daemon.  Verified that it works on
      an inner elab.  The change to nfree is minor - push garcias into
      reloadpending even though they are not (yet) imageable.
  15. 12 Jan, 2005 1 commit
  16. 19 Nov, 2003 1 commit
  17. 06 Nov, 2003 1 commit
    • Prevent reload_daemon from exiting. · 33e45640
      Leigh B. Stoller authored
      * If a reboot stuck node fails, move the node to hwdown, send email,
        and log an entry in the nodelog. Then continue on.
      
      * If os_load fails, record the nodes that failed, and try again if the
        nodes fail to reload at the retry interval. Do not exit. I was going
        to call os_load again immediately, but decided not to since these
        changes were quite easy.
      
        The above change is not really tested ... waiting for os_load to fail!
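
      The "record failures and retry later instead of exiting" behavior
      might be sketched as below. This is an assumption-laden illustration:
      reload_pass is a hypothetical name, and the os_load argument is a
      stand-in callable that returns the set of nodes that failed, which is
      not how the real os_load script reports errors.

```python
from typing import Callable, Iterable, List, Set


def reload_pass(nodes: Iterable[str],
                os_load: Callable[[List[str]], Set[str]],
                failed: Set[str]) -> Set[str]:
    """Run one pass over new nodes plus earlier failures; return this pass's failures."""
    batch = sorted(set(nodes) | failed)
    if not batch:
        return set()
    try:
        return set(os_load(batch))
    except Exception:
        # Never let a loader error end the daemon; retry the whole batch later.
        return set(batch)
```

      A daemon loop would call reload_pass once per retry interval, feeding
      the returned set back in as the failed argument on the next pass.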
  18. 15 Sep, 2003 1 commit
  19. 25 Mar, 2003 1 commit
  20. 22 Mar, 2003 1 commit
    • Grab a batch at a time instead of a single node per loop iteration. · 4a34327a
      Mac Newbold authored
      Scaling and speed now depend primarily on os_load (and indirectly,
      node_reboot). The time a batch spends in the reload_daemon code appears to
      be <1s per node now, instead of taking 30s per node to grab, setup, and
      reboot.
      
      Also, finally remove the "obsolete section" that's been sitting in there
      for a long time. This was the part that did netdisk reloads, and has
      already been neutered out of the code path for several months at least.
  21. 31 Jan, 2003 1 commit
  22. 30 Jan, 2003 1 commit
  23. 29 Jan, 2003 1 commit
  24. 18 Dec, 2002 1 commit
  25. 16 Dec, 2002 1 commit
    • Decrease the sleep between loops from 2 to 1, and fix a typo. This should · 6bdba92c
      Mac Newbold authored
      help nodes in reload_pending get sucked into reloading faster. If it
      doesn't do enough, we'll need to do more batching of stuff, so we get some
      parallelism in os_load instead of forcing it to serialize by calling
      os_load one node at a time.
      
      I was tempted to nuke all the stuff that was in there from the netdisk
      reload type, but decided not to. It won't be too long (relatively
      speaking) before we have freed, the new "free node manager" that will
      replace/supersede our current reload_daemon anyway.
  26. 11 Dec, 2002 1 commit
  27. 04 Nov, 2002 1 commit
  28. 01 Nov, 2002 1 commit
  29. 18 Oct, 2002 1 commit
    • Merge the newstated branch with the main tree. · 5c961517
      Mac Newbold authored
      Changes to watch out for:
      
      - db calls that change boot info in nodes table are now calls to os_select
      
      - whenever you want to change a node's pxe boot info, or def or next boot
      osids or paths, use os_select.
      
      - when you need to wait for a node to reach some point in the boot process
      (like ISUP), check the state in the database using the lib calls
      
      - Proxydhcp now sends a BOOTING state for each node that it talks to.
      
      - OSs that don't send ISUP will have one generated for them by stated
      either when they ping (if they support ping) or immediately after they get
      to BOOTING.
      
      - States now have timeouts. Actions aren't currently carried out, but they
      will be soon. If you notice problems here, let me know... we're still
      tuning it. (Before, all timeouts were set to "none" in the db.)
      
      One temporary change:
      
      - While I make our new free node manager daemon (freed), all nodes are
      forced into reloading when they're nfreed and the calls to reset the os
      are disabled (that will move into freed).
  30. 07 Jul, 2002 1 commit
  31. 13 May, 2002 1 commit
  32. 12 Feb, 2002 1 commit
  33. 08 Feb, 2002 1 commit
    • Big round of image/osid changes. This is the first cut (final cut?) at · a73e627e
      Leigh B. Stoller authored
      supporting autocreating and autoloading images. The imageid form now
      sports a field to specify a nodeid to create the image from; if set,
      the backend create_image script is invoked. That's the easy part.
      Slightly harder is autoloading images based on the osid specified in
      the NS file. To support this, I have added a new DB table called
      osidtoimageid, which holds the mapping from osid/pctype to imageid.
      When users create images, they must specify what node types that image
      is good for. Obviously, the mappings have to be unique or it would be
      impossible to figure it out! Anyway, once that image mapping is
      in place and the image created, the user can specify that ID in the NS
      file. I've changed os_setup to look for IDs that are not loaded,
      and to try and find one in the osidtoimageid. If found, it invokes
      os_load. To keep things running in parallel as much as possible,
      os_setup issues all the loads/reboots (could be more than a single set
      of loads if multiple IDs are in the NS file) at once, and waits for
      all the children to exit. I've hacked up os_load a bit to try and be
      more robust in the face of PXE failures, which still happen and are
      rather troublesome. Need an event system!
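
      The uniqueness requirement on the osid/pctype mapping can be
      illustrated with a small in-memory sketch. The class and method names
      are hypothetical, and the real osidtoimageid table lives in the DB,
      so this only models the rule that one (osid, pctype) pair maps to at
      most one imageid.

```python
from typing import Dict, Optional, Tuple


class OsidToImageid:
    """Toy model of the osidtoimageid mapping (assumed shape, not the real schema)."""

    def __init__(self) -> None:
        self._map: Dict[Tuple[str, str], str] = {}

    def add(self, osid: str, pctype: str, imageid: str) -> None:
        """Register a mapping; refuse a second, different image for the same key."""
        key = (osid, pctype)
        existing = self._map.get(key)
        if existing is not None and existing != imageid:
            raise ValueError(f"{osid}/{pctype} already maps to {existing}")
        self._map[key] = imageid

    def lookup(self, osid: str, pctype: str) -> Optional[str]:
        """Return the imageid to load for this osid on this node type, if any."""
        return self._map.get((osid, pctype))
```

      A lookup that returns None corresponds to the case where no image is
      known for that OS on that node type, so no autoload is attempted.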
      
      Contained in this revision are unrelated changes to make the OS and
      Image IDs per-project unique instead of globally unique, since that's a
      pain for the users. This turns out to be very messy, since underneath
      we do not want to pass around pid/ID in all the various places it's
      used. Rather, I create a globally unique name and extended the OS and
      Image tables to include pid/name/ID. The user selects pid/name, and I
      create the globally unique ID. For the most part this is invisible
      throughout the system, except where we interface with the user, say in
      the web pages; the user should see his chosen name where possible, and
      should invoke scripts (os_load, create_image, etc.) using his/her
      name not the internal ID. Also, in the front end the NS file should
      use the user name not the ID. All in all, this accounted for a number
      of annoying changes and some special cases that are unavoidable.
  34. 07 Feb, 2002 1 commit
  35. 14 Jan, 2002 1 commit
    • Make Frisbee.Redux live: · d08b5e41
      Leigh B. Stoller authored
      * Add appropriate goo to os/GNUMakefile so that Frisbee daemon is
        built and installed.
      
      * Rework the frisbee launcher slightly. Aside from little changes
        (send email to tbops when frisbeed dies, new cmdline syntax to
        frisbeed), allow for frisbeed to exit gracefully after a period of
        inactivity (no client requests for 30 minutes, at present). In order
        to prevent a race condition with a new client being added (and
        rebooted) and frisbeed terminating before the client gets started,
        add a load_busy indicator to the images table (next to load_address
        slot) and set that to one each time frisbeelauncher is invoked.
        When frisbeed exits, test and clear that bit atomically (lock
        tables) and go around another time (restart frisbeed for another 30
        minute period).
      
      * Rework waitmode in os_load. Wait for all of the nodes to finish at
        once, and track which nodes never finish. Retry those nodes again by
        rebooting. The number of retries is configurable in the script, and
        is currently set to one. This should take care of some PXE boot
        related problems, although obviously not all.
      
      * Got rid of -w option to os_load and made waitmode the default. The
        -s option can be used to start a reload, but not to wait for it to
        complete.
      
      * Minor changes to sched_reload and reload_daemon; pass in -s option
        to os_load.
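
      The load_busy handshake from the frisbee launcher item above can be
      sketched as an atomic test-and-clear. This is a simplified model: a
      threading.Lock stands in for locking the DB tables, and the ImageRow
      class and its method names are hypothetical.

```python
import threading


class ImageRow:
    """Toy stand-in for one row of the images table (assumed shape)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # plays the role of the table lock
        self.load_busy = 0

    def mark_busy(self) -> None:
        """Called on each launcher invocation: a client is about to start."""
        with self._lock:
            self.load_busy = 1

    def test_and_clear(self) -> bool:
        """Atomically read and reset load_busy; True means restart the daemon."""
        with self._lock:
            was_busy = self.load_busy == 1
            self.load_busy = 0
            return was_busy
```

      When the daemon exits after its idle period, a True result from
      test_and_clear means a client was added in the meantime, so the
      daemon should be restarted for another period rather than left down.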
  36. 04 Dec, 2001 1 commit
  37. 27 Nov, 2001 1 commit
  38. 07 Nov, 2001 1 commit
  39. 06 Nov, 2001 1 commit
  40. 05 Nov, 2001 1 commit