1. 17 May, 2010 3 commits
  2. 14 Apr, 2010 1 commit
    • Changes for speeding up elabinelab server setup. · 6feda7d3
      Mike Hibler authored
      Boss/ops/fs: reboot them together after setup rather than serially.
      
      Nodes: leave them in PXEWAIT throughout the setup, until after boss has
      been rebooted.  At that point we send them the new bootinfo RESTART command
      telling pxeboot to re-DHCP and use the new info obtained (next-server) to
      contact a potentially new boss node.  This is a quick way to switch a node
      in PXEWAIT from talking to the outer boss to talking to the inner one.
      
      A significant number of rinky-dink changes were needed to do this, primarily
      adding a new state, PXELIMBO, where nodes can be sent to sit until they are
      restarted.  It turns out that just putting them in an existing state such as
      PXEWAKEUP or SHUTDOWN wouldn't work, as they tend to time out or otherwise
      reboot.
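
The parking and restart flow described in this commit can be sketched roughly as follows. This is an illustrative model only, not the Emulab code: the `Node` class, the `park` and `restart` helpers, and the boss names are hypothetical, and the assumption that a restarted node ends up back in PXEWAIT against the new boss is inferred from the message rather than stated in it.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    state: str  # e.g. "PXEWAIT" or "PXELIMBO"
    boss: str   # which boss the node currently talks to (hypothetical field)


def park(node: Node) -> None:
    """Send a waiting node to PXELIMBO, where it sits until restarted."""
    node.state = "PXELIMBO"


def restart(node: Node, next_server: str) -> None:
    """Model the bootinfo RESTART command: re-DHCP and use next-server."""
    if node.state != "PXELIMBO":
        raise ValueError(f"{node.name} is not parked in PXELIMBO")
    # The re-DHCP yields a (potentially new) next-server to contact.
    node.boss = next_server
    # Assumption: the node goes back to waiting, now against the new boss.
    node.state = "PXEWAIT"


node = Node("pc1", "PXEWAIT", "outer-boss")
park(node)
restart(node, "inner-boss")
```

The point of the separate state in this sketch is that nothing acts on a parked node until the explicit restart arrives.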
  3. 23 Feb, 2010 1 commit
  4. 11 Jan, 2010 1 commit
  5. 07 Jan, 2010 1 commit
  6. 06 Jan, 2010 1 commit
  7. 28 Dec, 2009 1 commit
  8. 22 Dec, 2009 3 commits
  9. 21 Dec, 2009 1 commit
    • New approach to dealing with nodes that fail to boot in os_setup, and · 5cf6aad2
      Leigh Stoller authored
      land in hwdown.
      
      Currently, if a node fails to boot in os_setup and the node is running
      a system image, it is moved into hwdown. 99% of the time this is
      wasted work; the node did not fail for hardware reasons, but for some
      other reason that is transient.
      
      The new approach is to move the node into another holding experiment,
      emulab-ops/hwcheckup. The daemon watches that experiment, and nodes
      that land in it are freshly reloaded with the default image and
      rebooted. If the node reboots okay after reload, it is released back
      into the free pool. If it fails any part of the reload/reboot, it is
      officially moved into hwdown.
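
The decision the daemon makes for each node in hwcheckup can be sketched as below. This is a minimal illustration under stated assumptions, not the daemon's actual code: the `check_up` function, the callback signatures, and the destination labels are hypothetical names introduced for the example.

```python
from typing import Callable

# Hypothetical labels for where a node ends up after the checkup.
FREE_POOL = "free"
HWDOWN = "hwdown"


def check_up(
    node: str,
    reload_default_image: Callable[[str], bool],
    reboot: Callable[[str], bool],
) -> str:
    """Decide the destination of a node that landed in hwcheckup.

    Both callbacks are assumed to return True on success.
    """
    # Step 1: freshly reload the node with the default image.
    if not reload_default_image(node):
        return HWDOWN
    # Step 2: reboot the node on the freshly loaded image.
    if not reboot(node):
        return HWDOWN
    # Reload and reboot both succeeded: release the node.
    return FREE_POOL


# A node whose reload and reboot both succeed returns to the free pool.
result = check_up("pc1", lambda n: True, lambda n: True)
```

A failure at either step sends the node to hwdown, which matches the "fails any part of the reload/reboot" rule in the message.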
      
      Another possible use: if you have a suspect node, you go wiggle some
      hardware, and instead of releasing it into the free pool, you move it
      into hwcheckup, to see if it reloads/reboots. If not, it lands in
      hwdown again. Then you break out the hammer.
      
      Most of the changes in Node.pm, libdb.pm, and os_setup are
      organizational changes to make the code cleaner.