1. 25 Feb, 2004 1 commit
      Kirk takes the weed whacker to the plab code. This is the first pass result. · ae2eec76
      Kirk Webb authored
      I'll come along for a closer cut in the future.
      
      * Modularized the plab communications 'adaptor' interface and moved the
        dslice- and PLC-specific code into their own modules (a rough sketch of
        the split appears below).
      
      * Wrote an API definition README
      
      * Separated out generic routines from libplab into their own library modules
        (libtestbed.py and libdb.py)
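
      As a purely illustrative sketch of the adaptor split (class and method
      names here are hypothetical, not the actual libplab API), the idea is a
      small backend-neutral base class that the dslice and PLC modules each
      implement:

        # Hypothetical shape of the adaptor interface; real names/modules differ.
        class PlabAgentAdaptor:
            """What the rest of libplab talks to, independent of backend."""
            def create_sliver(self, node, slicename):
                raise NotImplementedError
            def destroy_sliver(self, node, slicename):
                raise NotImplementedError

        class DsliceAdaptor(PlabAgentAdaptor):
            """dslice-specific communication lives in its own module."""
            def create_sliver(self, node, slicename):
                pass  # talk to the dslice agent here

        class PLCAdaptor(PlabAgentAdaptor):
            """PLC-specific communication lives in its own module."""
            def create_sliver(self, node, slicename):
                pass  # talk to PLC (xmlrpc) here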
      
      Functionally, not much has changed - this was just a massive re-org with some
      other cleanup.  Should be much easier to code up new PLAB interfaces as the
      plab folks flail around in their attempt to standardize on something.
      
      XXX: may want to re-think where the generic library modules should go.  If
      more python code enters Elab, we'll probably want to move 'em to more standard
      locations.
      
      This isn't the end of the cleanup - I would eventually like to go back and
      rethink the class structures, beef up the comments, and extend the API.
  2. 30 Dec, 2003 1 commit
      Commit to usher in the new PLC regime. · 6d205dc5
      Kirk Webb authored
      Added a config variable to vnode_setup for the timeout on waiting for
      child processes.  I've set it to 10 minutes since all ancillary setup
      programs have their own time bounds (I think - the plab ones do anyway).
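
      vnode_setup itself is separate code, but as a rough Python sketch of the
      bounded-wait idea (the helper name is made up; only the 10-minute figure
      comes from the text above):

        import os, signal, time

        CHILD_WAIT_TIMEOUT = 10 * 60    # the new vnode_setup timeout, in seconds

        def wait_bounded(pid, timeout=CHILD_WAIT_TIMEOUT):
            """Wait for child `pid`, but kill it if it exceeds `timeout` seconds."""
            deadline = time.time() + timeout
            while time.time() < deadline:
                wpid, status = os.waitpid(pid, os.WNOHANG)
                if wpid == pid:
                    return status               # child finished within its bound
                time.sleep(1)
            os.kill(pid, signal.SIGTERM)        # overdue: terminate the child
            return os.waitpid(pid, 0)[1]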
      
      The function of plabmonitord has changed slightly.  Instead of setting
      up and tearing down vnodes, its job is to just set up the emulab management
      sliver on plab nodes in hwdown.  Once the vserver comes up and reports isalive,
      it moves the node out of hwdown.  Currently, it first tries to tear down the
      vserver before reinstantiating it.  In the future, we could get fancier and
      try interacting with the service sliver directly before simply tearing it down.
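
      Roughly, one monitoring pass over the hwdown plab nodes looks like the
      following sketch (the callables are hypothetical stand-ins, not the real
      plabmonitord code):

        def check_hwdown_plab_nodes(nodes, teardown, setup, isalive, release):
            """One pass: try to stand up the management sliver on each hwdown node."""
            for node in nodes:
                teardown(node)        # clear any stale vserver first
                setup(node)           # reinstantiate the emulab management sliver
                if isalive(node):     # vserver came up and reported isalive
                    release(node)     # move the node out of hwdown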
      
      All new plab nodes now start life in hwdown, and must be summoned forth
      into production by plabmonitord.
      
      This commit does NOT include support for the node-local httpd.  That will
      come soon.
  3. 24 Oct, 2003 1 commit
  4. 23 Oct, 2003 1 commit
      · 5b52831c
      Kirk Webb authored
      Well, here it is:  The checkin implementing robust recovery/retry and
      asynchronous safe termination in plab allocation/deallocation/setup.
      
      Here are some of the more prominent changes/additions:
      
      * Bounded plab agent communication
        Scripts should never hang waiting for plab xmlrpc commands to complete;
        they have their own internal timeouts.  Node.create() in libplab is an
        exception, but is always run under a timeout constraint in vnode_setup
        and can be changed easily if the need arises.
      
      * Wrote functions in libplab to do the retry/recovery/timeout of remote
        command execution (see the timeout/retry sketch after this list).
      
      * Wrapped critical sections with a signal watcher (a minimal sketch of the
        idea follows this list).
      
      * Added code to handle various error conditions properly
      
      * Added a libtestbed function, TBForkCmd, which runs a given program in
        a child process, and can optionally catch incoming SIGTERMs and terminate
        the child (then exit itself).  A Python analogue is sketched after this list.
      
      * Fixed up vnode_setup to batch the 'plabnode free' operation along with
        a few other cleanups.  This should alleviate Jay's concern about how
        long it used to take to tear down a plab expt.
      
      * Whacked plabmonitord into better shape; fixed a couple bugs, taught it how
        to daemonize, and implemented a priority list for testing broken plab nodes.
        This list causes new (as yet unseen) nodes to be tried first over ones that
        have been tested already.
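
      A rough Python sketch of the bounded/retried remote-command idea from the
      first two items above (the function name, retry counts, and timeouts are
      illustrative, not the actual libplab code):

        import subprocess, time

        def run_remote_bounded(cmd, timeout=60, tries=3, delay=10):
            """Run `cmd`, bounding each attempt and retrying a few times on failure."""
            for attempt in range(tries):
                try:
                    res = subprocess.run(cmd, capture_output=True, timeout=timeout)
                    if res.returncode == 0:
                        return res.stdout       # success within the time bound
                except subprocess.TimeoutExpired:
                    pass                        # this attempt hung; retry below
                if attempt + 1 < tries:
                    time.sleep(delay)           # short pause before the next try
            raise RuntimeError("remote command failed after %d tries: %r" % (tries, cmd))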
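
      The signal-watcher item can be pictured roughly like this: a context manager
      that defers SIGTERM/SIGINT until the wrapped critical section completes
      (illustrative only; the real code may do this differently):

        import os, signal
        from contextlib import contextmanager

        @contextmanager
        def signal_watcher(signums=(signal.SIGTERM, signal.SIGINT)):
            """Defer the given signals while a critical section runs."""
            caught = []
            old = {s: signal.signal(s, lambda n, f: caught.append(n)) for s in signums}
            try:
                yield
            finally:
                for s, handler in old.items():
                    signal.signal(s, handler)       # restore the original handlers
                if caught:
                    os.kill(os.getpid(), caught[0]) # re-deliver the deferred signal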
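
      And a small Python analogue of what the TBForkCmd item describes (TBForkCmd
      itself lives in libtestbed; this is just the shape of the idea, Unix-only):

        import os, signal, sys

        def fork_cmd(args, catch_sigterm=True):
            """Run `args` in a child process; optionally relay SIGTERM to it."""
            pid = os.fork()
            if pid == 0:
                os.execvp(args[0], args)            # child: become the command
            if catch_sigterm:
                def on_term(signum, frame):
                    os.kill(pid, signal.SIGTERM)    # pass the TERM on to the child
                    os.waitpid(pid, 0)
                    sys.exit(1)                     # then exit ourselves
                signal.signal(signal.SIGTERM, on_term)
            return os.waitpid(pid, 0)[1]            # parent: wait for the child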
  5. 30 Sep, 2003 1 commit
  6. 24 Sep, 2003 1 commit
      Commit my daemon to monitor the status of plab physnodes in hwdown. · 59c5d5bb
      Leigh Stoller authored
      The daemon periodically tries to bring hwdown physnodes back from the dead
      by instantiating a vserver/vnode on them, and then tearing it down.  If we
      can do that, then the node is usable, and it gets moved back into the
      normal holding experiment so that ptopgen will add it to ptop files.
      
      This daemon is not turned on yet; waiting for other little bits and
      pieces to be done.
      
      There is an equivalent change in os_setup that moves physnodes into hwdown
      when a setup on a vnode fails.
      
      Lbs