1. 08 Sep, 2006 1 commit
    • 3a3c95fb
      Kirk Webb authored
      Parallelize the setup of plab vnodes alongside the loading of local
      physical nodes.  We fork vnode_setup to operate on the plab vnodes just
      before firing off local reload/reboot/reconfig operations.  The status
      of the plab vnode setup is checked just before firing off vnode_setup
      for any local vnodes.  The ISUP wait for plab vnodes continues to fall
      within the same stage as waiting for local vnodes.  New arguments have been
      added to vnode_setup to tell it to only operate on specific vnode types:
      '-j' for local jail nodes, and '-p' for plab nodes.  If neither is
      specified, the default is to operate on all types.
  2. 14 Jun, 2006 1 commit
    • 726cf6bd
      Kirk Webb authored
      Fix the last fix...
  3. 13 Jun, 2006 1 commit
    • ef07c4cc
      Kirk Webb authored
      Fix up how vnode_setup wait()s for its children to deal with a change in how
      perl deals with interrupted system calls.  What the script was doing
      previously was iffy anyway.
  4. 22 Jan, 2006 1 commit
    • c4d0c78a
      Kirk Webb authored
      libplab.py.in:
      
      Exit right away when signalled while trying to perform a remote command.
      
      vnode_setup.in:
      
      More info when a timeout occurs, and reduce the execution spacing a little.
  5. 22 Dec, 2005 1 commit
    • 21f627fa
      Kirk Webb authored
      Add a bit of additional output info, and fix a little bug in rc.inplab
  6. 20 May, 2004 1 commit
  7. 19 Mar, 2004 1 commit
  8. 18 Mar, 2004 1 commit
    • bbccd21a
      Kirk Webb authored
      More plab updates:
      * created "-w" vnode_setup option that specifies how long to wait (per-vnode)
        for setup to complete before giving up.
      * added sitevars for plab batch parallelism size and vnode setup timeout
      * modified os_setup to use above sitevars when invoking vnode_setup for an
        experiment containing plab vnodes.
  9. 17 Mar, 2004 1 commit
    • 856c2509
      Kirk Webb authored
      Snapshot.
      * Changed the way options are parsed in the python scripts so that modules
        can easily add and use their own options independent of top-level scripts.
      
      * Added --noIS and --pollNodes module options.
      
      * Added batch option to vnode_setup (degree of parallelization)
        - defaults to 10
      
      * Major updates to plabmonitord
        - batches testing, currently to 40
  10. 11 Mar, 2004 1 commit
  11. 26 Feb, 2004 1 commit
  12. 04 Jan, 2004 1 commit
    • 8a669aa2
      Kirk Webb authored
      Some plab PLC updates:
      * use IP addr rather than finicky hostname when communicating with PLC.
      * make Node._create() aware of "already assigned" condition.
      * Bump vnode_setup timeout back to two minutes (for now).
  13. 31 Dec, 2003 1 commit
    • 6d205dc5
      Kirk Webb authored
      Commit to usher in the new PLC regime. Added a config variable to
      vnode_setup for the timeout on waiting for child processes.  I've
      set it to 10 minutes since all ancillary setup programs have their own
      time bounds (I think - the plab ones do anyway).
      
      The function of plabmonitord has changed slightly.  Instead of setting
      up and tearing down vnodes, its job is to just set up the emulab management
      sliver on plab nodes in hwdown.  Once the vserver comes up and reports isalive,
      it moves the node out of hwdown.  Currently, it first tries to tear down the
      vserver before reinstantiating it.  In the future, we could get fancier and
      try interacting with the service sliver directly before simply tearing it down.
      
      All new plab nodes now start life in hwdown, and must be summoned forth
      into production by plabmonitord.
      
      This commit does NOT include support for the node-local httpd.  That will
      come soon.
  14. 12 Dec, 2003 1 commit
  15. 18 Nov, 2003 1 commit
    • 2025e0bd
      Leigh B. Stoller authored
      Merge the two state machines (batchstate and state) into a single
      state machine (state). All of the stuff that was previously handled by
      using batchstate is now embedded into the one state machine. Of
      course, these mostly overlapped, so it's not that much of a change,
      except that we also redid the machine, adding more states (for
      example, modify phases are now explicit). To get a picture of the
      actual state machine, on boss:
      
      		stategraph -o newstates EXPTSTATE
      		gv newstates.ps
      
      Things to note:
      
      * The "batchstate" slot of the experiments table is now used solely to
        provide a lock for the batch daemon. A secondary change will be to
        change the slot name to something more appropriate, but it can
        happen anytime after this new stuff is installed.
      
      * I have left expt_locked for now, but another later change will be to remove
        expt_locked, and change it to active_busy or some such new state name in
        the state machine. I have removed most uses of expt_locked, except those
        that were necessary until there is a new state to replace it.
      
      * These new changes are an implementation of the new state machine,
        but I have not done anything fancy. Most of the code is the same as
        it was before.
      
      * I suspect that there are races with the batch daemon now, but they
        are going to be rare, and the end result is probably that a
        cancelation is delayed a little bit.
  16. 23 Oct, 2003 1 commit
    • 5b52831c
      Kirk Webb authored
      Well, here it is:  The checkin implementing robust recovery/retry and
      asynchronous safe termination in plab allocation/deallocation/setup.
      
      Here are some of the more prominent changes/additions:
      
      * Bounded plab agent communication
        Scripts should never hang waiting for plab xmlrpc commands to complete;
        they have their own internal timeouts.  Node.create() in libplab is an
        exception, but is always run under a timeout constraint in vnode_setup
        and can be changed easily if the need arises.
      
      * Wrote functions in libplab to do the retry/recovery/timeout of remote
        command execution.
      
      * Wrapped critical sections with a signal watcher.
      
      * Added code to handle various error conditions properly
      
      * Added a libtestbed function, TBForkCmd, which runs a given program in
        a child process, and can optionally catch incoming SIGTERMs and terminate
        the child (then exit itself).
      
      * Fixed up vnode_setup to batch the 'plabnode free' operation along with
        a few other cleanups.  This should alleviate Jay's concern about how
        long it used to take to teardown a plab expt.
      
      * Whacked plabmonitord into better shape; fixed a couple bugs, taught it how
        to daemonize, and implemented a priority list for testing broken plab nodes.
        This list causes new (as yet unseen) nodes to be tried first over ones that
        have been tested already.
  17. 30 Sep, 2003 1 commit
  18. 24 Sep, 2003 4 commits
    • 88a8f388
      Robert Ricci authored
      Fix a bug in finding the oldest child for timeout purposes, we were
      actually finding the youngest.
      
      Luckily, it was not causing timeouts that were too short, only
      timeouts that were too long.
    • 2c96581a
      Kirk Webb authored
      Reverting vnode_setup to revision 1.28. All the forking going on here is
      causing problems.  Will investigate tomorrow.
    • 34216cbe
      Kirk Webb authored
      Quick change to not terminate vnode_setup when the plabnode() function
      finds that the pid returned from wait() doesn't match the one returned
      from fork() earlier - this shouldn't happen, but it is.  I am checking for
      errors - perhaps I'm missing something though.  This affects plabnode free
      in vnode_setup since vnode_setup doesn't fork when it runs this.
    • c0d7f4ea
      Kirk Webb authored
      Updated vnode_setup to fork+exec plabnode (alloc|free) rather than invoking
      it with system().  Now when the parent receives a SIGTERM from its parent
      (the top-level vnode_setup), it will kill off its plabnode child process before
      exiting itself.  Invocation of plabnode is now done via the plabnode()
      function.  Needs some commenting.
      
      Tested thoroughly.
  19. 22 Sep, 2003 1 commit
    • 6d4187a5
      Leigh B. Stoller authored
      Minor allocstate changes to try and cope with plab nodes that either
      fail in "plabnode alloc" or in the remote vnodesetup call. In the
      former case, we do not want to "plabnode free" it later. In the latter,
      we want to plabnode free it right away, and make sure we do not try to
      remote vnode teardown or plabfree it later. In either case, os_setup
      needs to check so that it does not bother waiting for the node since
      it is wasted time. I use an alternate dead state for this, but the
      real solution is to move much of the vnode specific code from os_setup
      to vnode_setup.
      
      Note that this stuff is mostly untested since I need nodes to fail!
      The normal path works fine though.
  20. 18 Sep, 2003 1 commit
  21. 17 Sep, 2003 2 commits
  22. 16 Sep, 2003 1 commit
  23. 15 Sep, 2003 1 commit
  24. 13 Sep, 2003 1 commit
  25. 27 Aug, 2003 1 commit
  26. 25 Aug, 2003 1 commit
  27. 22 Aug, 2003 1 commit
  28. 25 Jul, 2003 1 commit
  29. 22 Jul, 2003 1 commit
  30. 18 Jul, 2003 3 commits
  31. 17 Jul, 2003 1 commit
  32. 15 Jul, 2003 1 commit
    • 92ff875a
      Leigh B. Stoller authored
      A set of changes to make swapmod work on jailed nodes (note, swapmod
      does not yet work with remote virtual nodes; that will take even more
      work).
      
      Added a new allocstate called RES_TEARDOWN. assign_wrapper no longer
      deallocates unused nodes, but rather moves them into the new state for
      the wrapper (tbswap) to deal with. That's because deleted vnodes need to
      be torn down, since it's possible that the node on which they were
      living will not be deallocated (say, if there are other vnodes on
      it). We do not want to be doing that from assign_wrapper, so tbswap
      looks for those nodes.
      
      Made vnode_setup allocstate aware in the same way that os_setup is;
      do not reboot vnodes or try to set up vnodes when they are already in
      the RES_READY state, as they will be when doing a swapmod. In
      addition, if os_setup is going to reboot the underlying physnode, move
      the vnodes on that node into RES_READY too, since they will then
      set up automatically. Might need an interim state here, for correctness.
  33. 16 Apr, 2003 1 commit
  34. 13 Jan, 2003 1 commit