1. 17 Mar, 2004 1 commit
    • Snapshot. · 856c2509
      Kirk Webb authored
      * Changed the way options are parsed in the python scripts so that modules
        can easily add and use their own options independent of top-level scripts.
      
      * Added --noIS and --pollNodes module options.
      
      * Added batch option to vnode_setup (degree of parallelization)
        - defaults to 10
      
      * Major updates to plabmonitord
        - batches testing, currently to 40
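A minimal sketch of how modules might register their own options independently of the top-level script. It uses Python's standard `argparse`, which is an assumption (the commit does not say which parser the scripts use). The `--debug` option and the helper names `build_parser`, `add_module_options`, and `parse` are hypothetical. Treating `--noIS` and `--pollNodes` as boolean flags is also an assumption.

```python
import argparse


def build_parser():
    """Top-level parser: knows only the options shared by every script."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--debug", action="store_true")  # hypothetical shared option
    return parser


def add_module_options(parser):
    """Hypothetical hook a module calls to register its own options."""
    group = parser.add_argument_group("plab module options")
    group.add_argument("--noIS", action="store_true")       # assumed to be a flag
    group.add_argument("--pollNodes", action="store_true")  # assumed to be a flag


def parse(argv):
    """Parse argv, returning (known options, arguments nobody claimed)."""
    parser = build_parser()
    add_module_options(parser)
    # parse_known_args leaves options owned by other modules untouched,
    # so each module can parse its own subset without the others failing.
    return parser.parse_known_args(argv)
```

With this layout, a top-level script only needs to call each module's hook before parsing; it never has to know the module's option names.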
  2. 11 Mar, 2004 1 commit
  3. 26 Feb, 2004 1 commit
  4. 03 Jan, 2004 1 commit
    • Some plab PLC updates: · 8a669aa2
      Kirk Webb authored
      * use IP addr rather than finicky hostname when communicating with PLC.
      * make Node._create() aware of "already assigned" condition.
      * Bump vnode_setup timeout back to two minutes (for now).
  5. 30 Dec, 2003 1 commit
    • Commit to usher in the new PLC regime. Added a config variable to · 6d205dc5
      Kirk Webb authored
      vnode_setup for the timeout on waiting for child processes.  I've
      set it to 10 minutes since all ancillary setup programs have their own
      time bounds (I think - the plab ones do anyway).
      
      The function of plabmonitord has changed slightly.  Instead of setting
      up and tearing down vnodes, its job is to just set up the emulab management
      sliver on plab nodes in hwdown.  Once the vserver comes up and reports isalive,
      it moves the node out of hwdown.  Currently, it first tries to tear down the
      vserver before reinstantiating it.  In the future, we could get fancier and
      try interacting with the service sliver directly before simply tearing it down.
      
      All new plab nodes now start life in hwdown, and must be summoned forth
      into production by plabmonitord.
      
      This commit does NOT include support for the node-local httpd.  That will
      come soon.
  6. 12 Dec, 2003 1 commit
  7. 17 Nov, 2003 1 commit
    • Merge the two state machines (batchstate and state) into a single · 2025e0bd
      Leigh Stoller authored
      state machine (state). All of the stuff that was previously handled by
      using batchstate is now embedded into the one state machine. Of
      course, these mostly overlapped, so it's not that much of a change,
      except that we also redid the machine, adding more states (for
      example, modify phases are now explicit). To get a picture of the
      actual state machine, on boss:
      
      		stategraph -o newstates EXPTSTATE
      		gv newstates.ps
      
      Things to note:
      
      * The "batchstate" slot of the experiments table is now used solely to
        provide a lock for the batch daemon. A secondary change will be to
        change the slot name to something more appropriate, but it can
        happen anytime after this new stuff is installed.
      
      * I have left expt_locked for now, but another later change will be to remove
        expt_locked, and change it to active_busy or some such new state name in
        the state machine. I have removed most uses of expt_locked, except those
        that were necessary until there is a new state to replace it.
      
      * These new changes are an implementation of the new state machine,
        but I have not done anything fancy. Most of the code is the same as
        it was before.
      
      * I suspect that there are races with the batch daemon now, but they
        are going to be rare, and the end result is probably that a
        cancelation is delayed a little bit.
  8. 23 Oct, 2003 1 commit
    • · 5b52831c
      Kirk Webb authored
      Well, here it is:  The checkin implementing robust recovery/retry and
      asynchronous safe termination in plab allocation/deallocation/setup.
      
      Here are some of the more prominent changes/additions:
      
      * Bounded plab agent communication
        Scripts should never hang waiting for plab xmlrpc commands to complete;
        they have their own internal timeouts.  Node.create() in libplab is an
        exception, but is always run under a timeout constraint in vnode_setup
        and can be changed easily if the need arises.
      
      * Wrote functions in libplab to do the retry/recovery/timeout of remote
        command execution.
      
      * Wrapped critical sections with a signal watcher.
      
      * Added code to handle various error conditions properly
      
      * Added a libtestbed function, TBForkCmd, which runs a given program in
        a child process, and can optionally catch incoming SIGTERMs and terminate
        the child (then exit itself).
      
      * Fixed up vnode_setup to batch the 'plabnode free' operation along with
        a few other cleanups.  This should alleviate Jay's concern about how
        long it used to take to tear down a plab expt.
      
      * Whacked plabmonitord into better shape; fixed a couple bugs, taught it how
        to daemonize, and implemented a priority list for testing broken plab nodes.
        This list causes new (as yet unseen) nodes to be tried first over ones that
        have been tested already.
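The retry/recovery/timeout idea from the list above can be sketched roughly as follows. This is not the libplab code: the function name `retry_with_deadline` and its parameters are hypothetical, and the sketch only bounds the overall retry loop. It assumes each individual call enforces its own timeout, as the commit says the plab xmlrpc commands do.

```python
import time


def retry_with_deadline(func, attempts=3, deadline=30.0, delay=0.0):
    """Call func() until it succeeds, attempts run out, or the deadline passes.

    Assumption: func() itself returns or raises in bounded time; this
    wrapper does not interrupt a call that is already running.
    """
    start = time.monotonic()
    last_error = None
    for _ in range(attempts):
        # Stop retrying once the overall time budget is used up.
        if time.monotonic() - start > deadline:
            break
        try:
            return func()
        except Exception as err:
            # Remember the failure and pause before the next attempt.
            last_error = err
            time.sleep(delay)
    raise RuntimeError("remote command failed") from last_error
```

A caller would wrap each remote operation, for example `retry_with_deadline(lambda: do_remote_call(), attempts=5)`, where `do_remote_call` stands in for whatever command is being issued.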
  9. 30 Sep, 2003 1 commit
  10. 24 Sep, 2003 1 commit
  11. 23 Sep, 2003 3 commits
    • Reverting vnode_setup to revision 1.28. All the forking going on here is · 2c96581a
      Kirk Webb authored
      causing problems.  Will investigate tomorrow.
    • Quick change to not terminate vnode_setup when the plabnode() function · 34216cbe
      Kirk Webb authored
      finds that the pid returned from wait() doesn't match the one returned
      from fork() earlier - this shouldn't happen, but it is.  I am checking for
      errors - perhaps I'm missing something though.  This affects plabnode free
      in vnode_setup since vnode_setup doesn't fork when it runs this.
    • · c0d7f4ea
      Kirk Webb authored
      Updated vnode_setup to fork+exec plabnode (alloc|free) rather than invoking
      it with system().  Now when the parent receives a SIGTERM from its parent
      (the top-level vnode_setup), it will kill off its plabnode child process before
      exiting itself.  Invocation of plabnode is now done via the plabnode()
      function.  Needs some commenting.
      
      Tested thoroughly.
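For illustration, the fork+exec pattern described in this commit might look roughly like the sketch below. It is written in Python rather than taken from vnode_setup, and the name `run_child` is hypothetical. One known gap: a SIGTERM arriving in the short window between `fork()` and installing the handler is not forwarded to the child.

```python
import os
import signal
import sys


def run_child(argv):
    """Fork and exec argv; if we receive SIGTERM, kill the child, then exit."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(127)

    def on_sigterm(signum, frame):
        # Parent was told to terminate: take the child down first.
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
        sys.exit(1)

    old_handler = signal.signal(signal.SIGTERM, on_sigterm)
    try:
        _, status = os.waitpid(pid, 0)
    finally:
        # Restore whatever handler was in place before the call.
        signal.signal(signal.SIGTERM, old_handler)

    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Child was killed by a signal: report it as a negative number.
    return -os.WTERMSIG(status)
```

Unlike `system()`, this keeps the child's pid in hand, which is what makes it possible to kill the child when the parent is terminated.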
  12. 22 Sep, 2003 1 commit
    • Minor allocstate changes to try and cope with plab nodes that either · 6d4187a5
      Leigh Stoller authored
      fail in "plabnode alloc" or in the remote vnodesetup call. In the
      former case, we do not want to "plabnode free" it later. In the latter,
      we want to plabnode free it right away, and make sure we do not try to
      remote vnode teardown or plabfree it later. In either case, os_setup
      needs to check so that it does not bother waiting for the node since
      it is wasted time. I use an alternate dead state for this, but the
      real solution is to move much of the vnode specific code from os_setup
      to vnode_setup.
      
      Note that this stuff is mostly untested since I need nodes to fail!
      The normal path works fine though.
  13. 18 Sep, 2003 1 commit
  14. 17 Sep, 2003 2 commits
  15. 16 Sep, 2003 1 commit
  16. 15 Sep, 2003 1 commit
  17. 12 Sep, 2003 1 commit
  18. 27 Aug, 2003 1 commit
  19. 25 Aug, 2003 1 commit
  20. 22 Aug, 2003 1 commit
  21. 25 Jul, 2003 1 commit
  22. 22 Jul, 2003 1 commit
  23. 18 Jul, 2003 2 commits
  24. 17 Jul, 2003 2 commits
  25. 15 Jul, 2003 1 commit
    • A set of changes to make swapmod work on jailed nodes (note, swapmod · 92ff875a
      Leigh Stoller authored
      does not yet work with remote virtual nodes; that will take even more
      work).
      
      Added a new allocstate called RES_TEARDOWN. assign_wrapper no longer
      deallocates unused nodes, but rather moves them into the new state for
      the wrapper (tbswap) to deal with. That's because deleted vnodes need to
      be torn down, since it's possible that the node on which they were
      living will not be deallocated (say, if there are other vnodes on
      it). We do not want to be doing that from assign_wrapper, so tbswap
      looks for those nodes.
      
      Made vnode_setup allocstate aware in the same way that os_setup is;
      do not reboot vnodes or try to set up vnodes when they are already in
      the RES_READY state, as they will be when doing a swapmod. In
      addition, if os_setup is going to reboot the underlying physnode, move
      the vnodes on that node into RES_READY too, since they will be set up
      automatically there. Might need an interim state here, for correctness.
  26. 16 Apr, 2003 1 commit
  27. 13 Jan, 2003 1 commit
  28. 07 Jan, 2003 1 commit
  29. 31 Dec, 2002 1 commit
    • Clean up permission check. · c832fa47
      Leigh Stoller authored
      Remove the sanity check of the experiment state.
      Add check for a local node and do not setup/teardown since the reboot
      will take care of that (jailed nodes setup at boot time, and obviously
      they are going to get torn down when the node goes down!).
  30. 18 Dec, 2002 1 commit
  31. 10 Sep, 2002 1 commit
  32. 07 Jul, 2002 1 commit
  33. 05 Jun, 2002 1 commit
    • Changes to sshtb. Remove sshremote, and convert sshtb into a perl · 231fc2b1
      Leigh Stoller authored
      script that checks the database to see if local or remote. The problem
      with this is that the ssh syntax makes it hard to determine the host
      name by inspection. Would need to parse all the ssh args (bad idea),
      or work backwards and try to figure out the difference between the
      command (which is not a string but a sequence of args) and the host
      and the preceding ssh args. Hell with that! Changed sshtb to require
      a specific -host argument. Read the args and look for it. Error out if
      not found, to catch improper usage.
      
      The moral of this update: "sshtb [ssh args] -host <host> [more args ...]"
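As a rough illustration of why the explicit -host argument helps, the argument scan reduces to something like the sketch below. The real sshtb is a perl script. This Python version, including the function name `split_sshtb_args` and its error messages, is hypothetical.

```python
def split_sshtb_args(argv):
    """Split 'sshtb [ssh args] -host <host> [more args ...]' into its parts.

    Returns (ssh_args, host, remaining_args). Exits with an error when
    -host is missing, to catch improper usage.
    """
    try:
        i = argv.index("-host")
    except ValueError:
        raise SystemExit("usage: sshtb [ssh args] -host <host> [more args ...]")
    if i + 1 >= len(argv):
        raise SystemExit("sshtb: -host requires a host name")
    # Everything before -host belongs to ssh; everything after the
    # host name is the command and its arguments.
    return argv[:i], argv[i + 1], argv[i + 2:]
```

Because the host is marked explicitly, there is no need to parse the ssh options themselves or to guess where the command begins.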
  34. 29 May, 2002 1 commit
  35. 24 May, 2002 1 commit