1. 04 Apr, 2012 1 commit
  2. 02 Apr, 2012 1 commit
  3. 14 Mar, 2012 1 commit
    • Make the secure boot path work with PXEWAIT. · ceeede28
      Mike Hibler authored
      When a node with the secure boot dongle is freed, it goes into PXEWAIT in
      the context of the secure MFS. Previously we remained in "secure mode"
      (i.e., did not terminate with a TPMSIGNOFF) while a node was in this state.
      If the next use of the node simply booted from the OS that was already on
      the disk, then we never signed off properly.
      
      Now we sign off before entering PXEWAIT. I thought that this would be the
      easiest alternative to fixing the problem..HaHaHa..not! Because now we have
      to restart the secure boot path (i.e., reboot) if the result of coming out
      of PXEWAIT is a request to reload the disk (i.e., if we are continuing the
      secure disk load path).
      
      Ideally this would have required only modifications to the state machines
      for SECUREBOOT/LOAD, but as you can see by the presence of stated.in in the
      modified files, this was not the case. The change required some additional
      "finesse" to get it working. See the comments in stated.in and bootinfo_mysql.c
      if you really care.
  4. 19 Jan, 2012 1 commit
  5. 10 Nov, 2011 1 commit
  6. 30 Aug, 2011 1 commit
  7. 23 Aug, 2011 1 commit
  8. 17 Aug, 2011 1 commit
  9. 11 Aug, 2011 1 commit
    • Initial support for loading Windows7 .wim images via WinPE/ImageX. · ac711ea5
      Mike Hibler authored
      1. Support for "one-shot" PXE booting à la the one-shot osid. Switches to
         pxelinux to boot WinPE and then switches back when done. Painful now
         because we have to HUP dhcpd every time we change the PXE path, but we
         may be able to fix this in the future by going all-pxelinux-all-the-time.
      
      2. Added pxe_select, analogous to os_select, for changing the pxe_boot_path
         including the one time path.
      
      3. Added the WIMRELOAD state machine to shepherd a node through the process.
         Still has some rough edges and may need refining.
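The one-shot behavior described in items 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the actual pxe_select code: the `NodeBoot` class, the `one_shot_pxe_path` field name, and the `next_pxe_path` helper are all hypothetical; only `pxe_boot_path` is named in the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeBoot:
    # Regular PXE boot path (name taken from the commit message).
    pxe_boot_path: str
    # Hypothetical field: a path that applies to the next boot only.
    one_shot_pxe_path: Optional[str] = None


def next_pxe_path(node: NodeBoot) -> str:
    """Return the path for the next PXE boot, consuming any one-shot path."""
    if node.one_shot_pxe_path is not None:
        path = node.one_shot_pxe_path
        node.one_shot_pxe_path = None  # used once, then fall back to the default
        return path
    return node.pxe_boot_path
```

The point of the design is that the one-time path clears itself, so the node returns to its normal boot path without a second explicit change.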
  10. 28 Jul, 2011 1 commit
  11. 13 Jul, 2011 1 commit
  12. 27 Jun, 2011 1 commit
  13. 22 Jun, 2011 1 commit
    • When forcing a transition to a new opMode, look for a valid next state. · 2b3fd82a
      Mike Hibler authored
      Previously, a forced opModeTransition would just remain in the same state
      after moving to the new op_mode rather than looking for a valid
      oldmode/oldstate => newmode/newstate transition in the mode_transitions
      table. This should only affect the transition from SECUREBOOT/TPMSIGNOFF,
      since all other uses should not find a valid newstate and should remain in
      the old state as before.
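The lookup described in this commit might look something like the sketch below. It assumes the mode_transitions table can be read as rows of (old mode, old state, new mode, new state); the function name is hypothetical, and the NORMAL/BOOTING target in the sample row is invented for illustration (only the SECUREBOOT/TPMSIGNOFF source pair comes from the commit message).

```python
# Hypothetical rows: (old_mode, old_state, new_mode, new_state).
# The target pair below is illustrative, not taken from the real table.
MODE_TRANSITIONS = [
    ("SECUREBOOT", "TPMSIGNOFF", "NORMAL", "BOOTING"),
]


def force_mode_transition(old_mode, old_state, new_mode, table=MODE_TRANSITIONS):
    """Return the (mode, state) pair after a forced move to new_mode."""
    for t_old_mode, t_old_state, t_new_mode, t_new_state in table:
        if (t_old_mode, t_old_state, t_new_mode) == (old_mode, old_state, new_mode):
            # A valid transition is listed: adopt its new state.
            return new_mode, t_new_state
    # No valid transition listed: keep the old state, as before the change.
    return new_mode, old_state
```

With this shape, a forced move that has no matching row behaves exactly as it did before the change, which matches the commit's claim that only the SECUREBOOT/TPMSIGNOFF case should be affected.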
  14. 20 Jun, 2011 1 commit
    • If stated were a space probe it would have crashed into Mars... · c88f89f3
      Mike Hibler authored
      Minor units conversion problem here. IO::Poll() takes seconds as its argument,
      not milliseconds as we were passing (by multiplying the arg by 1000 before
      calling).
      
      Unfortunately, this is not the Big One (memory corruption) that we have been
      chasing for so long. Sigh...
      (cherry picked from commit 1ee85494)
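The size of the error is easy to see with a little arithmetic. The sketch below is illustrative only: the helper names and the 5-second value are hypothetical and do not come from stated itself.

```python
MS_PER_SECOND = 1000


def poll_timeout_seconds(timeout_s):
    """Timeout for an API that expects seconds: pass the value through unchanged."""
    return timeout_s


def buggy_poll_timeout(timeout_s):
    """The bug described above: scaling to milliseconds for a seconds-based API."""
    return timeout_s * MS_PER_SECOND


intended = 5  # hypothetical timeout, in seconds
actual_wait_s = buggy_poll_timeout(intended)  # interpreted as seconds by the callee
actual_wait_minutes = actual_wait_s / 60
```

Under these assumptions an intended 5-second wait becomes 5000 seconds, or a little over 83 minutes.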
  15. 13 Jun, 2011 1 commit
    • If stated were a space probe it would have crashed into Mars... · 1ee85494
      Mike Hibler authored
      Minor units conversion problem here. IO::Poll() takes seconds as its argument,
      not milliseconds as we were passing (by multiplying the arg by 1000 before
      calling).
      
      Unfortunately, this is not the Big One (memory corruption) that we have been
      chasing for so long. Sigh...
  16. 02 Jun, 2011 1 commit
  17. 11 May, 2011 1 commit
  18. 10 May, 2011 1 commit
    • Gack, must call "select STDOUT" after the reopen operation, since we · 84a6e9fe
      Leigh Stoller authored
      used "select STDERR" to change the line buffering. The result was that
      after the log roll, the child was printing to STDERR instead of
      STDOUT, and so the parent never saw any new events.
      
      Note that USR1 (re-exec binary) does not work since exec bypasses the
      END block, and things get messed up. Not fixed yet.
  19. 13 Mar, 2011 1 commit
  20. 25 Feb, 2011 1 commit
    • Fix some nagging bugs. · 85d8986c
      Mike Hibler authored
      We were not processing the timeout queue because we got stuck forever in
      the loop that processed events. Now before looping back to sysread, make
      sure there is something to read so we don't block.
      
      When we start up or re-read the DB state, ignore really old state timeout
      values; e.g., for nodes that have been dead for ages but happen to be in
      a state such as SHUTDOWN that has a timeout.
      
      In the main loop, handle any re-read of the DB state before testing the
      queue length to see if we can do a blocking poll. Re-reading the state may
      add timeouts to the queue.
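The first fix, checking that there is something to read before looping back to the read call, follows a common pattern. Here is a rough Python analogue (stated itself is Perl and uses sysread); the `drain_events` name and chunk size are hypothetical, and `select.select` on a pipe is assumed to run on a Unix-like system.

```python
import os
import select


def drain_events(fd, chunk_size=4096):
    """Read everything currently available on fd without ever blocking."""
    chunks = []
    while True:
        # Zero timeout: only ask whether fd is readable right now.
        readable, _, _ = select.select([fd], [], [], 0)
        if not readable:
            break  # nothing pending, so return to the caller's main loop
        data = os.read(fd, chunk_size)
        if not data:
            break  # EOF: the writer closed its end
        chunks.append(data)
    return b"".join(chunks)
```

Because the loop exits as soon as nothing is pending, control returns to the main loop, where the timeout queue can be processed.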
  21. 24 Feb, 2011 1 commit
  22. 04 Feb, 2011 1 commit
  23. 02 Feb, 2011 2 commits
  24. 01 Feb, 2011 1 commit
  25. 25 Jan, 2011 1 commit
  26. 24 Jan, 2011 1 commit
  27. 17 Nov, 2010 2 commits
  28. 12 Nov, 2010 1 commit
  29. 10 Nov, 2010 1 commit
  30. 09 Nov, 2010 1 commit
  31. 29 Sep, 2010 2 commits
    • Mike Hibler's avatar
      5ae92284
    • Handle a common failure on the node reload path. · 4dc57d48
      Mike Hibler authored
      Under load, nodes that have just entered reloading and have just rebooted
      might fail to get bootinfo.  The default behavior in this case is for the
      node to boot from disk (dubious, but that is the topic for another day).
      This causes the node to fall off the RELOAD path, winding up in either
      TBFAILED or ISUP.  Worse, if the node makes it to ISUP, its reload state
      is cleared and even if the reload_daemon reboots the node, it will still
      not go through the reloading process.
      
      The result is a bunch of nodes left in reloading.  Now if a node makes an
      invalid transition to TBFAILED or ISUP while in the RELOAD state machine,
      it fires the new REBOOT trigger which does...well, you figure it out.
      Note that in the ISUP case, this trigger overrides the default that would
      otherwise clear the reload state--so reboot is sufficient to get the machine
      back on the RELOAD track.
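The recovery rule in this commit can be sketched as a small decision function. This is an assumption-laden illustration rather than the stated implementation: the function name and return shape are hypothetical, and the fallback branch (clearing the reload state on ISUP) is only a reading of "the default" mentioned in the message.

```python
# States that count as falling off the RELOAD path, per the commit message.
RELOAD_RECOVERY_STATES = {"TBFAILED", "ISUP"}


def handle_invalid_transition(op_mode, new_state):
    """Return (triggers, clear_reload_state) for an invalid transition."""
    if op_mode == "RELOAD" and new_state in RELOAD_RECOVERY_STATES:
        # Fire REBOOT and keep the reload state so the node resumes reloading.
        return ["REBOOT"], False
    # Assumed default: reaching ISUP clears the reload state; no triggers fire.
    return [], new_state == "ISUP"
```

Keeping the reload state in the ISUP case is what makes a plain reboot sufficient to put the node back on the RELOAD track.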
  32. 28 May, 2010 1 commit
  33. 26 May, 2010 1 commit
  34. 25 May, 2010 1 commit
  35. 20 May, 2010 2 commits
    • Add a new timeout action; STATE · 159552a3
      Robert P Ricci authored
      Allows the state_timeouts table to contain a new type of action
      to take on timeout: STATE:newstate. This will force stated to
      transition the node to newstate, and take any trigger actions
      associated with that state.
      
      We will use this to make timeouts in the secure reload path
      force the node into the SECVIOLATION state.
      
      Not yet tested.
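Parsing the new action format could be as simple as the sketch below. The `parse_timeout_action` name is hypothetical, and the assumption that any other action is passed through unchanged (such as the `REBOOT` used in the usage note) is illustrative rather than taken from the state_timeouts table.

```python
def parse_timeout_action(action):
    """Split 'STATE:newstate' into ('STATE', 'newstate').

    Any other action string is returned whole with no argument.
    """
    kind, _, arg = action.partition(":")
    if kind == "STATE":
        if not arg:
            raise ValueError("STATE action needs a target state")
        return kind, arg
    return action, None
```

For example, `parse_timeout_action("STATE:SECVIOLATION")` yields the action kind `STATE` with target state `SECVIOLATION`, while an action such as `REBOOT` comes back with no argument.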
    • Add two new triggers for secure boot · 9f5a312f
      Robert P Ricci authored
      Add 'POWEROFF' and 'EMAILNOTIFY' state triggers - the idea is that
      these will be used as triggers when a node enters the 'SECVIOLATION'
      state in the secure reload path, to turn off the node and send
      testbed-ops mail about it.
      
      Not yet tested.
  36. 05 Jan, 2010 1 commit