1. 29 Aug, 2007 2 commits
  2. 16 Aug, 2007 1 commit
  3. 18 Dec, 2006 1 commit
    • David Johnson authored · e53e402f
      Bugfix for plab nodes. This problem was tripped by Kevin. What was
      happening was that when Kevin swapmod'd to get rid of failed nodes,
      he just took the bad ones out.  This forced a change in the
      vname<->vnode mapping, and the failed node got put in a state
      (RES_INIT_CLEAN) that vnode_setup couldn't handle for plab nodes.
      Basically, the problem is that vnode_setup was assuming that the
      RES_INIT_CLEAN meant that the plab vnode needed to be allocated --
      but it was already allocated in the previous swap.
  4. 11 Oct, 2006 1 commit
    • Kirk Webb authored · 471d1d26
      Change the way vnode_setup handles plab nodes a bit to avoid a couple of
      buggy situations.
        * Don't try vnodesetup -h on plab nodes
          This can hang, or even fail.  Since nothing useful is conveyed by this
          step, just skip it, set the node's state to SHUTDOWN, and ask pl_conf
          on the node to remove the vserver.
        * Set plab node's alloc state to TBDB_ALLOCSTATE_RES_INIT_DIRTY after
          This avoids a bug where Emulab cluster nodes fail to come up, and so
          os_setup never waits on the plab vnodes (now that they are started in
          parallel with physical node setup).  Previously their alloc state made
          them look clean, and so the vservers would not be reaped during
  5. 08 Sep, 2006 1 commit
    • Kirk Webb authored · 3a3c95fb
      Parallelize the setup of plab vnodes alongside the loading of local
      physical nodes.  We fork vnode_setup to operate on the plab vnodes just
      before firing off local reload/reboot/reconfig operations.  The status
      of the plab vnode setup is checked just before firing off vnode_setup
      for any local vnodes.  The ISUP wait for plab vnodes continues to fall
      within the same stage as waiting for local vnodes.  New arguments have been
      added to vnode_setup to tell it to only operate on specific vnode types:
      '-j' for local jail nodes, and '-p' for plab nodes.  If neither is
      specified, the default is to operate on all types.
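The type-selection behavior described above can be sketched roughly as follows. This is a minimal illustration in Python, not the script's actual option parsing; the type names `"jail"` and `"plab"`, the function name `select_vnode_types`, and the assumption that these two are the only types are all hypothetical.

```python
import argparse

# Hypothetical type names; the real script may know about more vnode types.
ALL_TYPES = ("jail", "plab")


def select_vnode_types(argv):
    """Return the vnode types to operate on, given command-line arguments."""
    parser = argparse.ArgumentParser(prog="vnode_setup_sketch")
    parser.add_argument("-j", dest="jail", action="store_true",
                        help="operate on local jail nodes")
    parser.add_argument("-p", dest="plab", action="store_true",
                        help="operate on plab nodes")
    opts = parser.parse_args(argv)

    chosen = [name for name, wanted in (("jail", opts.jail), ("plab", opts.plab))
              if wanted]
    # With neither flag given, fall back to operating on every type.
    return chosen or list(ALL_TYPES)
```

Under these assumptions, `select_vnode_types(["-p"])` selects only plab nodes, while an empty argument list selects all types.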
  6. 13 Jun, 2006 2 commits
    • Kirk Webb authored · 726cf6bd
      Fix the last fix...
    • Kirk Webb authored · ef07c4cc
      Fix up how vnode_setup wait()s for its children to deal with a change in how
      perl deals with interrupted system calls.  What the script was doing
      previously was iffy anyway.
  7. 22 Jan, 2006 1 commit
    • Kirk Webb authored · c4d0c78a
      Exit right away when signalled while trying to perform a remote command.
      More info when a timeout occurs, and reduce the execution spacing a little.
  8. 21 Dec, 2005 1 commit
    • Kirk Webb authored · 21f627fa
      Add a bit of additional output info, and fix a little bug in rc.inplab
  9. 20 May, 2004 1 commit
  10. 19 Mar, 2004 1 commit
  11. 17 Mar, 2004 2 commits
    • Kirk Webb authored · bbccd21a
      More plab updates:
      * created "-w" vnode_setup option that specifies how long to wait (per-vnode)
        for setup to complete before giving up.
      * added sitevars for plab batch parallelism size and vnode setup timeout
      * modified os_setup to use above sitevars when invoking vnode_setup for an
        experiment containing plab vnodes.
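A per-vnode wait bound like the "-w" option above could look something like the sketch below. It is written in Python for illustration only and says nothing about how vnode_setup implements the option; `run_with_timeout` and its parameters are hypothetical names.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_secs):
    """Run cmd and return its exit code, or None if it exceeded the timeout.

    subprocess.run() kills the child and reaps it before raising
    TimeoutExpired, so no process is left behind when we give up.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_secs).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # A trivial stand-in for a per-vnode setup command.
    print(run_with_timeout([sys.executable, "-c", "pass"], 10))
```

A caller would treat a `None` result as "setup did not complete in time" and mark that vnode as failed rather than waiting indefinitely.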
    • Kirk Webb authored · 856c2509
      Snapshot.
      * Changed the way options are parsed in the python scripts so that modules
        can easily add and use their own options independent of top-level scripts.
      * Added --noIS and --pollNodes module options.
      * Added batch option to vnode_setup (degree of parallelization)
        - defaults to 10
      * Major updates to plabmonitord
        - batches testing, currently to 40
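One way to read the batch option is "work on at most N vnodes at a time". The sketch below illustrates that reading in Python, using threads in place of the forked children the commits describe; only the default of 10 comes from the commit message, and `batches`, `setup_all`, and `setup_one` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

DEFAULT_BATCH = 10  # default batch size quoted in the commit message


def batches(items, size=DEFAULT_BATCH):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def setup_all(vnodes, setup_one, size=DEFAULT_BATCH):
    """Run setup_one on every vnode, one batch at a time.

    All vnodes in a batch run concurrently; the next batch starts only
    after the current one has finished.
    """
    results = {}
    for group in batches(vnodes, size):
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            for vnode, outcome in zip(group, pool.map(setup_one, group)):
                results[vnode] = outcome
    return results
```

Waiting for a whole batch before starting the next is the simplest policy; a sliding window that refills as each vnode finishes would keep parallelism higher, and the commit message does not say which approach the script takes.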
  12. 11 Mar, 2004 1 commit
  13. 26 Feb, 2004 1 commit
  14. 03 Jan, 2004 1 commit
    • Kirk Webb authored · 8a669aa2
      Some plab PLC updates:
      * use IP addr rather than finicky hostname when communicating with PLC.
      * make Node._create() aware of "already assigned" condition.
      * Bump vnode_setup timeout back to two minutes (for now).
  15. 30 Dec, 2003 1 commit
    • Kirk Webb authored · 6d205dc5
      Commit to usher in the new PLC regime. Added a config variable to
      vnode_setup for the timeout on waiting for child processes.  I've
      set it to 10 minutes since all ancillary setup programs have their own
      time bounds (I think - the plab ones do anyway).
      The function of plabmonitord has changed slightly.  Instead of setting
      up and tearing down vnodes, its job is to just set up the emulab management
      sliver on plab nodes in hwdown.  Once the vserver comes up and reports isalive,
      it moves the node out of hwdown.  Currently, it first tries to tear down the
      vserver before reinstantiating it.  In the future, we could get fancier and
      try interacting with the service sliver directly before simply tearing it down.
      All new plab nodes now start life in hwdown, and must be summoned forth
      into production by plabmonitord.
      This commit does NOT include support for the node-local httpd.  That will
      come soon.
  16. 12 Dec, 2003 1 commit
  17. 17 Nov, 2003 1 commit
    • Leigh B. Stoller authored · 2025e0bd
      Merge the two state machines (batchstate and state) into a single
      state machine (state). All of the stuff that was previously handled by
      using batchstate is now embedded into the one state machine. Of
      course, these mostly overlapped, so it's not that much of a change,
      except that we also redid the machine, adding more states (for
      example, modify phases are now explicit). To get a picture of the
      actual state machine, on boss:
      		stategraph -o newstates EXPTSTATE
      		gv newstates.ps
      Things to note:
      * The "batchstate" slot of the experiments table is now used solely to
        provide a lock for the batch daemon. A secondary change will be to
        change the slot name to something more appropriate, but it can
        happen anytime after this new stuff is installed.
      * I have left expt_locked for now, but another later change will be to remove
        expt_locked, and change it to active_busy or some such new state name in
        the state machine. I have removed most uses of expt_locked, except those
        that were necessary until there is a new state to replace it.
      * These new changes are an implementation of the new state machine,
        but I have not done anything fancy. Most of the code is the same as
        it was before.
      * I suspect that there are races with the batch daemon now, but they
        are going to be rare, and the end result is probably that a
        cancelation is delayed a little bit.
  18. 23 Oct, 2003 1 commit
    • Kirk Webb authored · 5b52831c
      Well, here it is:  The checkin implementing robust recovery/retry and
      asynchronous safe termination in plab allocation/deallocation/setup.
      Here are some of the more prominent changes/additions:
      * Bounded plab agent communication
        Scripts should never hang waiting for plab xmlrpc commands to complete;
        they have their own internal timeouts.  Node.create() in libplab is an
        exception, but is always run under a timeout constraint in vnode_setup
        and can be changed easily if the need arises.
      * Wrote functions in libplab to do the retry/recovery/timeout of remote
        command execution.
      * Wrapped critical sections with a signal watcher.
      * Added code to handle various error conditions properly
      * Added a libtestbed function, TBForkCmd, which runs a given program in
        a child process, and can optionally catch incoming SIGTERMs and terminate
        the child (then exit itself).
      * Fixed up vnode_setup to batch the 'plabnode free' operation along with
        a few other cleanups.  This should alleviate Jay's concern about how
        long it used to take to tear down a plab expt.
      * Whacked plabmonitord into better shape; fixed a couple bugs, taught it how
        to daemonize, and implemented a priority list for testing broken plab nodes.
        This list causes new (as yet unseen) nodes to be tried first over ones that
        have been tested already.
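The TBForkCmd behavior described in the list above (run a program in a child process and, optionally, kill the child and exit when a SIGTERM arrives) can be approximated as follows. This is a Python sketch of the idea, not the libtestbed code; `fork_cmd`, `catch_sigterm`, and the exit status of 1 are assumptions made for illustration.

```python
import signal
import subprocess
import sys


def fork_cmd(cmd, catch_sigterm=False):
    """Run cmd in a child process and return its exit status.

    With catch_sigterm set, a SIGTERM delivered to this process while the
    child runs terminates the child first, then exits this process.
    """
    child = subprocess.Popen(cmd)

    def on_term(signum, frame):
        child.terminate()   # forward the termination to the child
        child.wait()        # reap it so no zombie is left behind
        sys.exit(1)         # then exit ourselves (status is an assumption)

    previous = None
    if catch_sigterm:
        # Must be called from the main thread of the process.
        previous = signal.signal(signal.SIGTERM, on_term)
    try:
        return child.wait()
    finally:
        # Restore whatever handler was installed before we ran.
        if catch_sigterm and previous is not None:
            signal.signal(signal.SIGTERM, previous)
```

Restoring the previous handler keeps the helper from changing signal behavior for the rest of the caller's run once the child has finished.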
  19. 30 Sep, 2003 1 commit
  20. 24 Sep, 2003 1 commit
  21. 23 Sep, 2003 3 commits
    • Kirk Webb authored · 2c96581a
      Reverting vnode_setup to revision 1.28. All the forking going on here is
      causing problems.  Will investigate tomorrow.
    • Kirk Webb authored · 34216cbe
      Quick change to not terminate vnode_setup when the plabnode() function
      finds that the pid returned from wait() doesn't match the one returned
      from fork() earlier - this shouldn't happen, but it is.  I am checking for
      errors - perhaps I'm missing something though.  This affects plabnode free
      in vnode_setup since vnode_setup doesn't fork when it runs this.
    • Kirk Webb authored · c0d7f4ea
      Updated vnode_setup to fork+exec plabnode (alloc|free) rather than invoking
      it with system().  Now when the parent receives a SIGTERM from its parent
      (the top-level vnode_setup), it will kill off its plabnode child process before
      exiting itself.  Invocation of plabnode is now done via the plabnode()
      function.  Needs some commenting.
      Tested thoroughly.
  22. 22 Sep, 2003 1 commit
    • Leigh B. Stoller authored · 6d4187a5
      Minor allocstate changes to try and cope with plab nodes that either
      fail in "plabnode alloc" or in the remote vnodesetup call. In the
      former case, we do not want to "plabnode free" it later. In the latter,
      we want to plabnode free it right away, and make sure we do not try to
      remote vnode teardown or plabfree it later. In either case, os_setup
      needs to check so that it does not bother waiting for the node since
      it is wasted time. I use an alternate dead state for this, but the
      real solution is to move much of the vnode specific code from os_setup
      to vnode_setup.
      Note that this stuff is mostly untested since I need nodes to fail!
      The normal path works fine though.
  23. 18 Sep, 2003 1 commit
  24. 17 Sep, 2003 2 commits
  25. 16 Sep, 2003 1 commit
  26. 15 Sep, 2003 1 commit
  27. 12 Sep, 2003 1 commit
  28. 27 Aug, 2003 1 commit
  29. 25 Aug, 2003 1 commit
  30. 22 Aug, 2003 1 commit
  31. 25 Jul, 2003 1 commit
  32. 22 Jul, 2003 1 commit
  33. 18 Jul, 2003 2 commits