1. 14 Feb, 2008 1 commit
  2. 06 Feb, 2008 1 commit
    • Add support for including nodes from multiple PLCs in experiments. Right · c44c47c9
      David Johnson authored
      now, this is keyed off nodetype.  Lots of hardcoded constants and config
      stuff moved to attributes in the db.  You can now set per-PLC and
      per-slice attributes, so you can (for instance) use different auth info
      whenever you want.  Experiments can use preexisting slices if somebody
      sets up the db before swapin.  Also, we no longer have to rely on
      slices.xml to sync up nodes/sites with PLC... can use xmlrpc instead.
      
      Lots of code cleanup, improved some abstractions, etc.
  3. 21 Sep, 2007 1 commit
  4. 05 Sep, 2007 1 commit
  5. 03 May, 2007 1 commit
  6. 11 Apr, 2007 1 commit
  7. 09 Mar, 2007 1 commit
    • These are the rest of the changes that have been accumulating in my dev · 1b6ef602
      David Johnson authored
      tree for v4 planetlab node support.  Currently, we support both v3 and
      v4 NMs via a little wrapper, and we dist out different versions of the
      rootball depending on NM version.  Also updated various parts of libplab
      to log success and failure from interactions with planetlab nodes to the
      db, and there are beginnings of support for that in plabmonitord.in.
  8. 12 Sep, 2006 1 commit
    • 52dcfd48
      Kirk Webb authored
      Added secondary logging for node setup/teardown success/failure.  Also log
      node pool membership changes in this log.
  9. 28 Aug, 2006 1 commit
    • 37f4392e
      Kirk Webb authored
      Updates to the plab monitor.  Fixed a couple of bugs and created a
      separate libplabmon library module.
  10. 21 Aug, 2006 1 commit
    • af0d6629
      Kirk Webb authored
      Some bugfixes and updates to the monitor.
      
      * Added load average monitoring and initial test startup randomization
      
      The load the monitor was exerting, especially at startup, was pretty high.
      This change appears to have brought that under control.
      
      * Fixed window size bug(s)
      
      There were a few bugs related to tracking the outstanding child process
      window that are corrected by this checkin.
  11. 18 Aug, 2006 1 commit
    • 2c6ee69a
      Kirk Webb authored
      Duh - balance pool priorities so that one doesn't starve the rest.
  12. 17 Aug, 2006 1 commit
    • f1fa5a51
      Kirk Webb authored
      New plab vnode monitor framework, now with proactive node checking action!
      
      The old monitor has been completely replaced.  The new one uses modular pools
      to test and track plab nodes.  There are currently two pool modules:
      good and bad.  The good pool tests nodes that are not known to have
      issues, to proactively find problems and push nodes into the "bad" pool
      when necessary.  The bad pool acts similarly to the old plabmonitor; it
      does an end-to-end test on nodes, and if and when they finally come up,
      moves them to the good pool.  Both pools have a testing backoff mechanism
      that works as follows:
      
        * The node is tested right away upon entering either pool
        * Node fails to setup:
          * goodpool: node is sent to bad pool (hwdown)
          * badpool:  node is scheduled to be retested according to
                      an additive backoff function, maxing out at 1 hour.
        * Node setup succeeds:
          * goodpool: node is scheduled to be retested according to
                      an additive backoff function, maxing out at 1 hour.
          * badpool:  node is moved to good pool.
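
The backoff rules above can be sketched roughly as follows. This is an
illustration in Python, not the monitor's actual code (which appears to be
Perl). The 5-minute step, the function names, and the convention that an
interval of 0 means "test right away" are assumptions made for the sketch;
only the 1-hour cap and the pool transitions come from the description above.

```python
MAX_INTERVAL = 3600  # seconds; the 1-hour cap described above
STEP = 300           # hypothetical additive increment (not from the source)


def next_interval(current, step=STEP, cap=MAX_INTERVAL):
    """Additive backoff: grow the retest interval by a fixed step, up to a cap."""
    return min(current + step, cap)


def handle_result(pool, interval, success):
    """Return (new_pool, new_interval) after one setup test.

    An interval of 0 means the node is tested right away, which is what
    happens whenever a node enters either pool.
    """
    if pool == "good":
        if success:
            # Still healthy: stay in the good pool and test less often.
            return "good", next_interval(interval)
        # Setup failed: send the node to the bad pool (hwdown).
        return "bad", 0
    if success:
        # Node came back: move it to the good pool.
        return "good", 0
    # Still broken: stay in the bad pool and test less often.
    return "bad", next_interval(interval)
```

With these assumed values, a node that keeps succeeding in the good pool is
retested after 5, 10, 15, ... minutes until the interval reaches one hour.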
      
      The backoff thing may be bogus, we'll see.  It seems like a reasonable thing
      to do though - no need to hammer a node with tests if it consistently
      succeeds or fails.  Nodes that flop back and forth will get the most
      testing punishment.  A future enhancement will be to watch for flopping
      and force nodes that exhibit this behavior to pass several consecutive
      tests before being eligible for return back into the good pool.
      
      The monitor only allows a configurable window's worth of outstanding
      tests to go on at once.  When tests finish, more node tests are allowed
      to start up right away.
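
One way to picture the outstanding-test window is the bookkeeping sketch
below. It is written in Python for illustration only; the class and method
names are hypothetical, and it tracks which tests may start rather than
forking real child processes.

```python
from collections import deque


class TestWindow:
    """Allow at most `size` tests to be outstanding at once."""

    def __init__(self, size):
        self.size = size
        self.pending = deque()    # nodes waiting for a free slot
        self.outstanding = set()  # nodes with a test currently running

    def submit(self, node):
        """Queue a node for testing; return the nodes whose tests start now."""
        self.pending.append(node)
        return self._fill()

    def finish(self, node):
        """Record a finished test; return the nodes whose tests start now."""
        self.outstanding.discard(node)
        return self._fill()

    def _fill(self):
        # Start queued tests, oldest first, until the window is full.
        started = []
        while self.pending and len(self.outstanding) < self.size:
            node = self.pending.popleft()
            self.outstanding.add(node)
            started.append(node)
        return started
```

Calling `finish()` frees a slot and immediately starts the oldest queued test,
which matches the "start up right away" behavior described above.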
      
      Some refactoring needs to be done.  Currently the good and bad pools share
      quite a bit of duplicated code.  I don't know if I dare venture into
      inheritance with perl, but that would be a good way to approach this.
      
      Some other pool module ideas:
      
      * dynamic setup pools
      
      When experiments w/ plab vnodes are swapped in, use the plab monitor to
      manage setting up the vnodes by dynamically creating pools on a per-experiment
      basis.  This has the advantage that the monitor can keep a global cap on
      the number of outstanding setup operations.  These pools might also try to
      bring up vnodes that failed to setup during swapin later on, along with other
      vnode monitoring tasks.
      
      * "all nodes" pools
      
      Similar to the dynamic pools just mentioned, but with the mission to extend
      experiments to all plab nodes possible (as nodes come and go).  Useful for
      services.
  13. 02 Feb, 2006 1 commit
    • c808ef0a
      Kirk Webb authored
      Change some timing constants in the plab monitor daemon
  14. 15 Dec, 2005 1 commit
    • 41c54939
      Kirk Webb authored
      The revived Plab interface is here!
      
      Lots of updates to the plab backend, including improved plab <-> elab node
      id translation and update handling.  Includes support for the current PLC
      API, and the new pl_conf node manager interface API.  Several more db library
      routines were ported from the perl library to the python one to support the
      new code (mostly the node_id tracking stuff).  Fixes to the client side and
      also a rootball creation cleanup (binaries removed from the CVS repo).
      
      There are also enhancements to the experiment view page for experiments
      including plab nodes: site and widearea hostname are now displayed along
      with the other node information.
      
      Note that the way setup timeout for vnodes is calculated has been changed a
      bit.  Instead of using a hardwired base timeout, the base timeout is now
      based on the reload_waittime database field, which comes from the 'OS'
      (e.g., FBSD-JAIL, RHL-PLAB) the vnode runs.
      
      The default max duration for a plab slice created through the plab_ez interface
      is set to 1 year, and linktest is currently disabled and hidden through
      the ez interface.
      
      There is still work to do, but this checkin brings with it a functional
      plab portal!
  15. 31 May, 2005 1 commit
  16. 21 May, 2004 1 commit
  17. 25 Mar, 2004 1 commit
    • b6da3a51
      Kirk Webb authored
      * Node.__copy() now uses rsync instead of weird 'dd' pipe
        - can do since sudo now works from square one after sliver instantiation
      
      * Made fixsudo and addgroup operations in emulabify() non-fatal
        - setup sometimes works even if they don't (esp. on dirty sliver)
      
      * option parser fixes
      
      * Shutup stupid warning messages from remote commands (tcgetattr, sudo lecture)
  18. 23 Mar, 2004 2 commits
    • 97a59692
      Kirk Webb authored
      * add "slicename" arg to Slice() constructor
        - so it can be specified manually
      
      * pushed up vnode setup wait time in plabmonitord to 16 minutes.
    • Snapshot. · fd6d8cc9
      Kirk Webb authored
      * incompatible option handling and use removed from gen purpose libs
      * Global PLC mutex implemented, but currently disabled
      * plabmonitord parallelization cut in half (for now)
      
      I'm still very frustrated with option handling/passing.  Needs more thought,
      but the primary issue is that there really isn't a global variable space in
      python (global to file, yes, but not global to interpreter invocation).
      
      I've learned that __builtin__ might work for this, but it seems hacky..
  19. 17 Mar, 2004 2 commits
    • More updates: · 3ae7da68
      Kirk Webb authored
      * Added comments
      * Added Emulab copyright
      * made mod_PLC handle the "not assigned" error case in freeNode()
        - optimization and less log clutter.
      * bug fix in plabmonitord (ISUP detection)
    • Snapshot. · 856c2509
      Kirk Webb authored
      * Changed the way options are parsed in the python scripts so that modules
        can easily add and use their own options independent of top-level scripts.
      
      * Added --noIS and --pollNodes module options.
      
      * Added batch option to vnode_setup (degree of parallelization)
        - defaults to 10
      
      * Major updates to plabmonitord
        - batches testing, currently to 40
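
As an illustration of modules handling their own options independently of the
top-level script, here is a minimal sketch using the standard `argparse`
module. It is not the option code from this commit: the function name is
hypothetical, and treating `--noIS` and `--pollNodes` as boolean flags is an
assumption, since the commit message does not say what values they take.

```python
import argparse


def parse_module_options(argv):
    """Parse only this module's options and hand back everything else.

    Returns (options, remaining_args). Arguments the module does not
    recognize are left untouched for the top-level script to parse.
    """
    parser = argparse.ArgumentParser(add_help=False)
    # Assumed to be simple on/off flags for the purpose of this sketch.
    parser.add_argument("--noIS", action="store_true")
    parser.add_argument("--pollNodes", action="store_true")
    return parser.parse_known_args(argv)
```

`parse_known_args` does not fail on unrecognized arguments, so a top-level
script can pass its full argument list to each module in turn and parse
whatever remains itself.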
  20. 25 Feb, 2004 1 commit
    • Kirk takes the weed whacker to the plab code. This is the first pass result. · ae2eec76
      Kirk Webb authored
      I'll come along for a closer cut in the future.
      
      * Modularized the plab communications 'adaptor' interface and moved the
        dslice- and PLC-specific code into their own modules.
      
      * Wrote an API definition README
      
      * Separated out generic routines from libplab into their own library modules
        (libtestbed.py and libdb.py)
      
      Functionally, not much has changed - this was just a massive re-org with some
      other cleanup.  Should be much easier to code up new PLAB interfaces as the
      plab folks flail around in their attempt to standardize on something.
      
      XXX: may want to re-think where the generic library modules should go.  If
      more python code enters Elab, we'll probably want to move 'em to more standard
      locations.
      
      This isn't the end of the cleanup - I would eventually like to go back and
      rethink the class structures, beef up the comments, and extend the API.
  21. 30 Dec, 2003 1 commit
    • Commit to usher in the new PLC regime. Added a config variable to · 6d205dc5
      Kirk Webb authored
      vnode_setup for the timeout on waiting for child processes.  I've
      set it to 10 minutes since all ancillary setup programs have their own
      time bounds (I think - the plab ones do anyway).
      
      The function of plabmonitord has changed slightly.  Instead of setting
      up and tearing down vnodes, its job is to just setup the emulab management
      sliver on plab nodes in hwdown.  Once the vserver comes up and reports isalive,
      it moves the node out of hwdown.  Currently, it first tries to tear down the
      vserver before reinstantiating it.  In the future, we could get fancier and
      try interacting with the service sliver directly before simply tearing it down.
      
      All new plab nodes now start life in hwdown, and must be summoned forth
      into production by plabmonitord.
      
      This commit does NOT include support for the node-local httpd.  That will
      come soon.
  22. 24 Oct, 2003 1 commit
  23. 23 Oct, 2003 1 commit
    • 5b52831c
      Kirk Webb authored
      Well, here it is:  The checkin implementing robust recovery/retry and
      asynchronous safe termination in plab allocation/deallocation/setup.
      
      Here are some of the more prominent changes/additions:
      
      * Bounded plab agent communication
        Scripts should never hang waiting for plab xmlrpc commands to complete;
        they have their own internal timeouts.  Node.create() in libplab is an
        exception, but is always run under a timeout constraint in vnode_setup
        and can be changed easily if the need arises.
      
      * Wrote functions in libplab to do the retry/recovery/timeout of remote
        command execution.
      
      * Wrapped critical sections with a signal watcher.
      
      * Added code to handle various error conditions properly
      
      * Added a libtestbed function, TBForkCmd, which runs a given program in
        a child process, and can optionally catch incoming SIGTERMs and terminate
        the child (then exit itself).
      
      * Fixed up vnode_setup to batch the 'plabnode free' operation along with
        a few other cleanups.  This should alleviate Jay's concern about how
        long it used to take to teardown a plab expt.
      
      * Whacked plabmonitord into better shape; fixed a couple bugs, taught it how
        to daemonize, and implemented a priority list for testing broken plab nodes.
        This list causes new (as yet unseen) nodes to be tried first over ones that
        have been tested already.
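
The bounded retry/timeout idea for command execution can be sketched as
follows. This is not the libplab code from this checkin: it uses the modern
Python 3 `subprocess` module, and the function name, parameters, and defaults
are assumptions made for illustration.

```python
import subprocess
import sys
import time


def run_with_retry(argv, timeout, retries=3, delay=0.0):
    """Run a command with a per-attempt timeout, retrying on failure.

    Returns (succeeded, attempts_made). A non-zero exit status and a
    timeout both count as a failed attempt.
    """
    for attempt in range(1, retries + 1):
        try:
            result = subprocess.run(
                argv,
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            if result.returncode == 0:
                return True, attempt
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising, so nothing
            # is left hanging; fall through and try again.
            pass
        if attempt < retries:
            time.sleep(delay)
    return False, retries
```

For example, `run_with_retry([sys.executable, "-c", "pass"], timeout=30)`
succeeds on the first attempt, while a command that always exits non-zero is
tried `retries` times before the caller gets a failure back.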
  24. 30 Sep, 2003 1 commit
  25. 24 Sep, 2003 1 commit
    • Commit my daemon to monitor the status of plab physnodes in hwdown, · 59c5d5bb
      Leigh B. Stoller authored
      trying to bring them back from the dead periodically by trying to
      instantiate a vserver/vnode on them, and then tearing it down. If we
      can do that, then the node is usable, and it gets moved back into the
      normal holding experiment so that ptopgen will add it to ptop files.
      
      This daemon is not turned on yet; waiting for other little bits and
      pieces to be done.
      
      There is an equiv change in os_setup that moves physnodes into hwdown
      when a setup on a vnode fails.
      
      Lbs