1. 16 Oct, 2003 4 commits
    • Brave new world of tmcc client side caching. · 2b72f2c9
      Leigh B. Stoller authored
      The goal is to reduce the number of connections to tmcd, and the
      resulting number of DB queries. Currently that's about 24 per node
      when it boots. Each vnode adds another 24 or so. The new approach is
      to use the "fullconfig" command, which dumps the entire config in one
      shot, saving about 20 of those connections. We still need to do the
      status/state commands for real, of course. When a node boots, it
      requests the fullconfig; the client side takes this fullconfig and
      dumps the individual sections to /var/emulab/boot/tmcc/section_name.
      Subsequent requests first look for the section locally in the
      above-named files, falling back to real tmcc if none exists. The
      update command also refreshes the cache.

      Tested for jails and plab node vservers as well.
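The lookup order described in this commit (local section file first, real tmcc request as the fallback) can be sketched roughly as follows. This is only an illustration in Python, not the actual tmcc client: the class name `TmccCache`, the `fetch` callback standing in for a real request to tmcd, the `uncached` set, and the assumption that the fullconfig has already been split into a section-to-text mapping are all hypothetical.

```python
import os
from typing import Callable, Dict, FrozenSet


class TmccCache:
    """Hypothetical client-side cache: one file per config section."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], str],
                 uncached: FrozenSet[str] = frozenset()):
        self.cache_dir = cache_dir   # e.g. /var/emulab/boot/tmcc
        self.fetch = fetch           # stand-in for a real tmcc request
        self.uncached = uncached     # commands that must always go to the server

    def _path(self, section: str) -> str:
        return os.path.join(self.cache_dir, section)

    def refresh(self, fullconfig: Dict[str, str]) -> None:
        """Write each section of an (already parsed) fullconfig to its own file."""
        os.makedirs(self.cache_dir, exist_ok=True)
        for section, data in fullconfig.items():
            with open(self._path(section), "w") as f:
                f.write(data)

    def get(self, section: str) -> str:
        """Return the cached section if present, else ask the server."""
        if section not in self.uncached:
            try:
                with open(self._path(section)) as f:
                    return f.read()
            except FileNotFoundError:
                pass  # no cache file: fall through to a real request
        return self.fetch(section)
```

In this sketch, calling `refresh()` at boot and again on an update mirrors the "update command also refreshes the cache" behavior, while anything listed in `uncached` (for example state or status reporting) always reaches the server.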
    • Back out Rob's last change to mkgroup and move to mkproj, since the required
      ordering of events is not obvious to anyone except me (on a good day). · 82d87bdb
      Leigh B. Stoller authored
    • Fix bug with respect to modified experiments that abort and get
      swapped out (non-recoverable) by tbswap. · 589e97d2
      Leigh B. Stoller authored
      swapexp was leaving the experiment in the running state instead of
      paused. We need to check this after tbswap since we do not get
      reasonable error codes back. Also some cleanup with respect to how
      aborted modifies are handled. I think I understand what Chad did ...

      A general comment: we need to be better about returning meaningful
      error codes!
    • Address some Mike nits. · a2daf08f
      Leigh B. Stoller authored
  2. 15 Oct, 2003 13 commits
  3. 14 Oct, 2003 6 commits
  4. 13 Oct, 2003 10 commits
  5. 10 Oct, 2003 7 commits
    • Fix a nit for Mike. · b71f5f90
      Mac Newbold authored
    • 8890a9cb
      Mac Newbold authored
    • 99c2a1ab
      Robert Ricci authored
    • Leigh B. Stoller authored
    • Make sure it finds the renamed tftp-hpa · 4daef377
      Mike Hibler authored
    • Add statewait changes · e8b47f26
      Mac Newbold authored
    • New StateWait changes · 2b2a306d
      Mac Newbold authored
      The main point of all this is to move to our new model of waiting for
      state changes. Before, we were watching the database (which meant we
      could only watch for terminal/stable/long-lived states, and had to poll
      the DB). Now things that are waiting for states to change become event
      listeners, and watch the stream of events flow by, and don't have to do
      any polling. They can now watch for any state, and even sequences of
      states (i.e. a Shutdown followed by an Isup).

      To do this, there is now a cool StateWait.pm library that encapsulates the
      functionality needed. To use it, you call initStateWait before you start
      the chain of events (i.e. before you call node_reboot). Then do your stuff,
      and call waitForState() when you're ready to wait. It can be told to
      return periodically with the results so far, and you can cancel waiting
      for things. An example program called waitForState is in
      testbed/event/stated/, and can also be used nicely as a command line tool
      that wraps up the library functionality.

      This also required the introduction of a TBFAILED event that can be sent
      when a node isn't going to make it to the state that someone may be
      waiting for. I.e. if it gets wedged coming up, and stated retries but
      eventually gives up on it, stated sends this to let things know that the
      node is hozed and won't ever come up.

      Another thing that is part of this is that node_reboot moves (back) to the
      fully-event-driven model, where users call node_reboot, and it does some
      checks and sends some events. Then stated calls node_reboot in "real mode"
      to actually do the work, and handles doing the appropriate retries until
      the node either comes up or is deemed "failed" and stated gives up on it.
      This means stated is also the gatekeeper of when you can and cannot reboot
      a node. (See mail archives for extensive discussions of the details.)

      A big part of the motivation for this was to get uninformed timeouts and
      retries out of os_load/os_setup and put them in stated, where we can make a
      wiser choice. So os_load and os_setup now use this new stuff and don't
      have to worry about timing out on nodes and rebooting. Stated makes sure
      that nodes either come up, get retried, or fail to boot. tbrestart also
      underwent a similar change.
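The event-listener model in this commit (register interest before triggering the reboot, then consume events until each node has passed through the expected sequence of states or has been declared failed) can be sketched as below. This is a hedged illustration in Python, not the StateWait.pm API: the `StateWaiter` class, the `wait_for_state` helper, the `(node, state)` event tuples, and the state strings `"SHUTDOWN"` and `"ISUP"` are assumptions made for the example. Only the TBFAILED event name comes from the commit message.

```python
from typing import Iterable, List, Tuple

FAILED = "TBFAILED"  # event name from the commit message; its payload shape is assumed


class StateWaiter:
    """Hypothetical listener tracking each node's progress through a state sequence."""

    def __init__(self, sequence: List[str], nodes: Iterable[str]):
        # Create this BEFORE starting the chain of events (e.g. a reboot),
        # so that no early event is missed.
        self.sequence = list(sequence)
        self.progress = {node: 0 for node in nodes}
        self.failed = set()

    def handle_event(self, node: str, state: str) -> None:
        """Advance a node if the event matches the next state it is waiting for."""
        if node not in self.progress or node in self.failed:
            return
        idx = self.progress[node]
        if idx >= len(self.sequence):
            return  # node already finished its sequence
        if state == FAILED:
            self.failed.add(node)
        elif state == self.sequence[idx]:
            self.progress[node] = idx + 1

    def done(self) -> List[str]:
        return sorted(n for n, i in self.progress.items()
                      if i == len(self.sequence))

    def pending(self) -> List[str]:
        return sorted(n for n, i in self.progress.items()
                      if i < len(self.sequence) and n not in self.failed)


def wait_for_state(waiter: StateWaiter,
                   events: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Consume events until no node is pending; return (done, failed) node lists."""
    for node, state in events:
        waiter.handle_event(node, state)
        if not waiter.pending():
            break
    return waiter.done(), sorted(waiter.failed)
```

A caller would build the waiter for something like `["SHUTDOWN", "ISUP"]`, trigger the reboot, and then feed the event stream to `wait_for_state`. No database polling is involved, and a node reported as failed stops being waited on instead of timing out.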