1. 03 Oct, 2012 11 commits
  2. 01 Oct, 2012 2 commits
  3. 28 Sep, 2012 13 commits
  4. 27 Sep, 2012 3 commits
  5. 26 Sep, 2012 9 commits
  6. 25 Sep, 2012 2 commits
      Logic for making osload failures non-fatal when nonfatal failure mode is set. · 783d3caf
      Mike Hibler authored
      Previously tb-set-node-failure-mode of "nonfatal" only applied to failures
      when rebooting a node. If there was an error during the disk reload phase,
      the experiment would still fail.
      
      This makes sense, as it is pretty dicey to let a node boot with an unloaded
      or partially-loaded disk. But there are situations, such as 500+ node
      experiments on PRObE, where it makes sense to not fail the experiment.
      
      What we do if a node fails reload is clear the OSIDs and partition info
      for the node and then force it to reboot (by setting the state to TBFAILED,
      for which there is a REBOOT trigger in stated). The node then comes up and
      parks in pxeboot in the PXEWAIT state, where it should remain across
      reboots. The user can then manually os_load the machine, or do a swap
      modify, which will force the node to try to reload the original OS.
      
      Since this may not be for everyone, allowing non-fatal osload failures
      requires that the "OsloadFailNonfatal" feature be enabled. This lets the
      new behavior be enabled globally, per-group, per-experiment, or per-user.
      It is disabled by default.
      More fixes to "wedged node" handling. · e5d8d3cf
      Mike Hibler authored
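The reload-failure policy in commit 783d3caf can be summarized as a small state transition. The following is a minimal Python sketch of that logic, not the actual Emulab implementation. The `Node` class, the `feature_enabled` helper, the scope strings, and the `stated_reboot_trigger` stand-in are all hypothetical, and the real feature lookup and stated triggers work differently in detail.

```python
from dataclasses import dataclass, field

# State names taken from the commit message; everything else is hypothetical.
TBFAILED = "TBFAILED"  # stated has a REBOOT trigger for this state
PXEWAIT = "PXEWAIT"    # node parks here in pxeboot after the forced reboot


@dataclass
class Node:
    node_id: str
    osids: dict = field(default_factory=dict)       # hypothetical: partition -> OSID
    partitions: dict = field(default_factory=dict)  # hypothetical partition info
    state: str = "RELOADING"


def feature_enabled(feature, scopes, enabled):
    """Hypothetical check: the feature is on if it is enabled at any scope
    that applies (e.g. "global", "group:...", "exp:...", "user:...")."""
    return any((feature, scope) in enabled for scope in scopes)


def handle_reload_failure(node, nonfatal_mode, scopes, enabled):
    """Return True if the experiment should fail because of this node."""
    if not (nonfatal_mode and feature_enabled("OsloadFailNonfatal", scopes, enabled)):
        return True  # previous behavior: a reload failure is fatal
    # Non-fatal path: forget what was (partially) loaded on the disk...
    node.osids.clear()
    node.partitions.clear()
    # ...and mark the node failed, which makes stated reboot it.
    node.state = TBFAILED
    return False


def stated_reboot_trigger(node):
    """Hypothetical stand-in for stated's REBOOT trigger: a node with no OSIDs
    boots into pxeboot and waits there instead of booting a bad disk."""
    if node.state == TBFAILED and not node.osids:
        node.state = PXEWAIT
```

The feature check is deliberately an "any scope" test, mirroring the commit's statement that the behavior can be enabled globally or per-group, per-experiment, or per-user. Clearing the OSIDs before the reboot is what keeps the node from booting a partially loaded disk.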