1. 14 Nov, 2016 1 commit
    • Ignore state transitions from NORMALv2/ISUP -> BOOTING. · 990fcc0b
      Mike Hibler authored
      In the !BOOTINFO_EVENTS world, someone making a random DHCP request would
      cause a state transition to BOOTING, which would start a timeout ticking that
      would most likely expire in a couple of minutes and reboot the node.
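The rule in this commit's title can be pictured as a small filter that runs before a transition is recorded. This is a minimal Python sketch, not the code in stated itself; the function name and the tuple representation are assumptions for illustration, and only the mode and state names come from the commit message. Whether the real check is conditional on BOOTINFO_EVENTS is not shown here.

```python
# Hypothetical sketch: drop transitions that a stray DHCP request can cause.
# The set and function names are illustrative, not taken from stated.
IGNORED_TRANSITIONS = {
    # (op_mode, old_state, new_state)
    ("NORMALv2", "ISUP", "BOOTING"),
}


def should_ignore(op_mode, old_state, new_state):
    """Return True if the transition should be dropped without starting a timeout."""
    return (op_mode, old_state, new_state) in IGNORED_TRANSITIONS
```

A caller would consult `should_ignore` first and skip both the state update and the timeout when it returns True.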
  2. 20 Oct, 2016 1 commit
  3. 18 Oct, 2016 2 commits
    • Actually reboot nodes that hit the REBOOT trigger. · 6a090aef
      Mike Hibler authored
      This partially undoes the "temporary" change that Mac put in 13 years ago.
      The lack of an actual reboot does cause us grief. In particular, the case
      of NORMALv2/BOOTING when a PXEWAKEUP at swap in is unsuccessful. We have
      seen this when IPMI SOL issues have caused the console and OS to hang up in
      the post-wakeup boot process or if the PXEWAKEUP is lost. Since there is
      only the overarching swapin timeout at this point, and that is typically
      quite large, we'll risk a bad timeout interaction (which was the reason for
      the "temporary" change).
  4. 06 Oct, 2016 1 commit
  5. 03 Feb, 2016 1 commit
    • Add support for multiple pre-reservations per project: · 103e0385
      Leigh B Stoller authored
      When creating a pre-reserve, new -n option to specify a name for the
      reservation, defaults to "default". All other operations require an
      -n option to avoid messing with the wrong reservation. You are not allowed
      to reuse a reservation name in a project, of course. Priorities are
      probably more important now; we might want to change the default from 0 to
      something higher, and change all the current priorities.
      For bookkeeping, the nodes table now has a reservation_name slot that is
      set with the reserved_pid. This allows us to revoke the nodes associated
      with a specific reservation. Bonus feature is that when setting the
      reserved_pid via the web interface, we leave the reservation_name null, so
      those won't ever be revoked by the prereserve command line tool.
      New feature: when revoking a pre-reserve, we now look to see if nodes being
      revoked are free and can be assigned to other pre-reserves. We used to not
      do anything, and so had to wait until that node was allocated and released
      later, to see if it could move into a pre-reserve.
      Also a change required by node specific reservations; when we free a node,
      need to make sure we actually use that node, so have to cycle through all
      reservations in priority order until it can be used. We did not need to do
      this before.
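The last paragraph above describes offering a freed node to each reservation in priority order until one can use it. A minimal Python sketch of that loop follows. The function name, the dictionary fields, and the assumption that a larger number means a higher priority are all illustrative; none of them are taken from the prereserve tool.

```python
def place_freed_node(node_id, reservations):
    """Offer a freed node to each reservation, highest priority first.

    Hypothetical data model: each reservation is a dict with 'name',
    'priority', and 'wants', a callable that reports whether the
    reservation can use this particular node.
    Returns the name of the first reservation that takes the node, or None.
    """
    for res in sorted(reservations, key=lambda r: r["priority"], reverse=True):
        if res["wants"](node_id):
            return res["name"]
    return None
```

With node-specific reservations, the highest-priority reservation may not want the freed node, which is why the loop keeps going rather than stopping at the first entry.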
  6. 01 Dec, 2014 1 commit
  7. 25 Nov, 2014 1 commit
  8. 19 Nov, 2014 1 commit
  9. 11 Nov, 2014 1 commit
    • More TaintState management updates. · d24df9d2
      Kirk Webb authored
      * Do not "reset" taint states to match partitions after OS load.
      Encumber node with any additional taint states found across the
      OSes loaded on a node's partitions (union of states).  Change the
      name of the associated Node object method to better represent the
      * Clear all taint states when a node exits "reloading"
      When the reload_daemon is finished with a node and ready to release it,
      it will now clear any/all taint states set on the node.  This is the
      only automatic way to have a node's taint states cleared.  Users
      cannot clear node taint states by os_load'ing away all tainted
      partitions after this commit; nodes must travel through reloading
      to get cleared.
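The two rules in this commit (accumulate taint states as a union on OS load, clear them only on leaving "reloading") can be sketched with plain sets. This is an illustrative Python sketch; the function names and the set representation are assumptions, not the Node object methods the commit refers to.

```python
def taint_after_os_load(node_taints, partition_taints):
    """Union the node's existing taint states with those of every loaded OS.

    node_taints: iterable of taint state names already on the node.
    partition_taints: iterable of iterables, one per partition.
    Existing states are never dropped here.
    """
    result = set(node_taints)
    for taints in partition_taints:
        result |= set(taints)
    return result


def taint_after_reloading():
    """Leaving "reloading" is the only path that clears all taint states."""
    return set()
```

Because `taint_after_os_load` only ever adds states, loading untainted OSes onto every partition leaves the node's taint set unchanged.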
  10. 01 Jul, 2014 1 commit
  11. 17 Mar, 2014 1 commit
  12. 26 Feb, 2013 1 commit
  13. 22 Feb, 2013 1 commit
    • More minor speed ups for stated. · 461a1fce
      Mike Hibler authored
      Log would-be mail messages in stated-mail.log rather than actually emailing them.
      Fewer regular log messages.
      Avoid scanning a list unnecessarily if not in debug mode.
      Use mysql to pick out certain osfeatures.
      Bug fix: typo would let stated block when sent a SIGUSR1.
  14. 18 Nov, 2012 1 commit
  15. 27 Sep, 2012 1 commit
  16. 24 Sep, 2012 1 commit
    • Replace license symbols with {{{ }}}-enclosed license blocks. · 6df609a9
      Eric Eide authored
      This commit is intended to make the license status of Emulab and
      ProtoGENI source files more clear.  It replaces license symbols like
      "EMULAB-COPYRIGHT" and "GENIPUBLIC-COPYRIGHT" with {{{ }}}-delimited
      blocks that contain actual license statements.
      This change was driven by the fact that today, most people acquire and
      track Emulab and ProtoGENI sources via git.
      Before the Emulab source code was kept in git, the Flux Research Group
      at the University of Utah would roll distributions by making tar
      files.  As part of that process, the Flux Group would replace the
      license symbols in the source files with actual license statements.
      When the Flux Group moved to git, people outside of the group started
      to see the source files with the "unexpanded" symbols.  This meant
      that people acquired source files without actual license statements in
      them.  All the relevant files had Utah *copyright* statements in them,
      but without the expanded *license* statements, the licensing status of
      the source files was unclear.
      This commit is intended to clear up that confusion.
      Most Utah-copyrighted files in the Emulab source tree are distributed
      under the terms of the Affero GNU General Public License, version 3
      Most Utah-copyrighted files related to ProtoGENI are distributed under
      the terms of the GENI Public License, which is a BSD-like open-source
      Some Utah-copyrighted files in the Emulab source tree are distributed
      under the terms of the GNU Lesser General Public License, version 2.1
  17. 23 Sep, 2012 1 commit
  18. 01 Aug, 2012 1 commit
    • Support 64-bit FreeBSD on the server side. · 9036d314
      Mike Hibler authored
      NOTE: currently only for FreeBSD 7.3 installs because that is the only
      set of boss/ops/fs packages I have built so far!
      This mostly involved minor changes to event agents. Too often we were
      passing a pointer to a "long" to *get_int32, which on a 64-bit x86 OS would
      fill the wrong half of a 64-bit variable. There was also one instance of
      TCL code that had to be tweaked to account for 32- vs 64-bit.
      These changes also required regeneration of SWIG stubs and an ugly change
      to the SWIG generated code to use va_copy rather than direct assignment in
      a couple of places.
      Also related to SWIG is ensuring that the components that go into the
      perl/python stub .so files are built with PIC. The amd64 linker requires
      The meta-ports had to be changed to reflect that linuxthreads and
      ulsshxmlrpcpp don't work on amd64. The former had little effect as we
      had mostly eliminated uses of linuxthreads already. The one thing that
      did change was that we do not build nfstrace on amd64 (and we don't
      currently use this anyway). Removing ulsshxmlrpcpp required switching
      to the new event scheduler (event/new_sched) that Ryan did a while back.
      Note that it is only "new" in the sense that it uses a standard XMLRPC
      package; there should be no functional differences. However, to be safe
      we only use new_sched as the standard scheduler on 64-bit server installs.
      Finally, added support to elabinelab setup to do a 64-bit server install.
      Just specify FBSD73-64-STD as the boss/ops/fs osid and rc.mkelab should
      do the rest.
      That is pretty much it other than some random nits here and there.
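The pointer problem described in this commit (handing the address of a "long" to a routine that writes a 32-bit integer) can be illustrated without C by looking at the bytes directly. This is a Python sketch of the effect only, using the standard struct module; the buffer stands in for the 8-byte long on a 64-bit system and is not code from the event agents.

```python
import struct

# An 8-byte "long" that still holds stale bits from earlier use.
buf = bytearray(b"\xff" * 8)

# A routine that believes it was handed an int32 pointer writes only 4 bytes.
struct.pack_into("=i", buf, 0, 5)

# The caller then reads the whole 8-byte long back: half of it is stale.
(as_long,) = struct.unpack("=q", bytes(buf))

# Fix pattern: give the routine a real 4-byte variable, so every byte
# the caller later reads was actually written.
small = bytearray(4)
struct.pack_into("=i", small, 0, 5)
(as_int32,) = struct.unpack("=i", bytes(small))
```

Here `as_int32` is 5, while `as_long` is not, because four of its eight bytes were never written. On a 32-bit system, where a long is 4 bytes, the mismatch goes unnoticed.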
  19. 04 Apr, 2012 1 commit
  20. 02 Apr, 2012 1 commit
  21. 14 Mar, 2012 1 commit
    • Make the secure boot path work with PXEWAIT. · ceeede28
      Mike Hibler authored
      When a node with the secure boot dongle is freed, it goes into PXEWAIT in
      the context of the secure MFS. Previously we remained in "secure mode"
      (i.e., did not terminate with a TPMSIGNOFF) while a node was in this state.
      If the next use of the node just booted from the OS that was already on
      the disk, then we never signed off properly.
      Now we sign off before entering PXEWAIT. I thought that this would be the
      easiest alternative to fixing the problem..HaHaHa..not! Because now we have
      to restart the secure boot path (i.e., reboot) if the result of coming out
      of PXEWAIT is a request to reload the disk (i.e., if we are continuing the
      secure disk load path).
      Ideally this would have required only modifications to the state machines
      for SECUREBOOT/LOAD, but as you can see by the presence of stated.in in the
      modified files, this was not the case. The change required some additional
      "finesse" to get it working. See the comments in stated.in and bootinfo_mysql.c
      if you really care.
  22. 19 Jan, 2012 1 commit
  23. 10 Nov, 2011 1 commit
  24. 30 Aug, 2011 1 commit
  25. 23 Aug, 2011 1 commit
  26. 17 Aug, 2011 1 commit
  27. 11 Aug, 2011 1 commit
    • Initial support for loading Windows7 .wim images via WinPE/ImageX. · ac711ea5
      Mike Hibler authored
      1. Support for "one-shot" PXE booting a la the one-shot osid. Switches to
         pxelinux to boot WinPE and then switches back when done. Painful now
         because we have to HUP dhcpd every time we change the PXE path, but we
         may be able to fix this in the future by going all-pxelinux-all-the-time.
      2. Added pxe_select, analogous to os_select, for changing the pxe_boot_path
         including the one time path.
      3. Added the WIMRELOAD state machine to shepherd a node through the process.
         Still has some rough edges and may need refining.
  28. 28 Jul, 2011 1 commit
  29. 13 Jul, 2011 1 commit
  30. 27 Jun, 2011 1 commit
  31. 22 Jun, 2011 1 commit
    • When forcing a transition to a new opMode, look for a valid next state. · 2b3fd82a
      Mike Hibler authored
      Previously, a forced opModeTransition would just remain in the same state
      after moving to the new op_mode rather than looking for a valid
      oldmode/oldstate => newmode/newstate transition in the mode_transitions
      table. This should only affect the transition from SECUREBOOT/TPMSIGNOFF,
      since all other uses should not find a valid newstate and should remain in
      the old state as before.
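The lookup described above can be sketched as a table search with a fallback. This is a minimal Python sketch under assumed names: the function, and the shape of `mode_transitions` as a dictionary, are illustrative stand-ins for the mode_transitions database table, not the query stated actually runs.

```python
def forced_next_state(old_mode, old_state, new_mode, mode_transitions):
    """Pick the state to land in after a forced op_mode change.

    mode_transitions (hypothetical shape) maps (old_mode, old_state) to a
    list of (new_mode, new_state) pairs. If an entry for new_mode exists,
    use its state; otherwise remain in old_state, which matches the
    behavior before this change.
    """
    for mode, state in mode_transitions.get((old_mode, old_state), []):
        if mode == new_mode:
            return state
    return old_state
```

The fallback is what keeps every other forced transition behaving as before: when the table has no matching row, the node simply keeps its old state under the new mode.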
  32. 20 Jun, 2011 1 commit
    • If stated were a space probe it would have crashed into Mars... · c88f89f3
      Mike Hibler authored
      Minor units conversion problem here. IO::Poll() takes seconds as its argument,
      not milliseconds as we were doing (by multiplying the arg by 1000 before
      Unfortunately, this is not the Big One (memory corruption) that we have been
      chasing for so long. Sigh...
      (cherry picked from commit 1ee85494)
  33. 13 Jun, 2011 1 commit
    • If stated were a space probe it would have crashed into Mars... · 1ee85494
      Mike Hibler authored
      Minor units conversion problem here. IO::Poll() takes seconds as its argument,
      not milliseconds as we were doing (by multiplying the arg by 1000 before
      Unfortunately, this is not the Big One (memory corruption) that we have been
      chasing for so long. Sigh...
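The size of this units mistake is easy to see with a worked example. The Python sketch below only models the arithmetic; the function names are hypothetical and the real code is Perl calling IO::Poll.

```python
def poll_timeout_buggy(seconds_until_next_timeout):
    # Bug pattern: converts to milliseconds for a call that expects seconds.
    return seconds_until_next_timeout * 1000


def poll_timeout_fixed(seconds_until_next_timeout):
    # The poll call takes seconds, so the value is passed through unchanged.
    return seconds_until_next_timeout
```

With the buggy conversion, an intended wait of 5 seconds becomes 5000 seconds, so a queued timeout could sit for well over an hour if no other event woke the poll first.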
  34. 02 Jun, 2011 1 commit
  35. 11 May, 2011 1 commit
  36. 10 May, 2011 1 commit
    • Gack, must call "select STDOUT" after the reopen operation, since we · 84a6e9fe
      Leigh B Stoller authored
      used "select STDERR" to change the line buffering. The result was that
      after the log roll, the child was printing to STDERR instead of
      STDOUT, and so the parent never saw any new events.
      Note that USR1 (re-exec binary) does not work since exec bypasses the
      END block, and things get messed up. Not fixed yet.
  37. 13 Mar, 2011 1 commit
  38. 25 Feb, 2011 1 commit
    • Fix some nagging bugs. · 85d8986c
      Mike Hibler authored
      We were not processing the timeout queue because we got stuck forever in
      the loop that processed events. Now before looping back to sysread, make
      sure there is something to read so we don't block.
      When we start up or re-read the DB state, ignore really old state timeout
      values; e.g., for nodes that have been dead for ages but happen to be in
      a state such as SHUTDOWN that has a timeout.
      In the main loop, handle any re-read of the DB state before testing the
      queue length to see if we can do a blocking poll. Re-reading the state may
      add timeouts to the queue.
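The first fix above (check that there is something to read before looping back to the read call) follows a common pattern. The Python sketch below shows that pattern with the standard select and os modules on a Unix file descriptor; the function name and the handler callback are assumptions, and stated itself does this in Perl with sysread.

```python
import os
import select


def drain_events(fd, handle):
    """Read from fd only while data is already waiting.

    Returns the number of chunks handled. Because readiness is checked
    with a zero timeout before each read, the caller always gets control
    back and can go on to service its timeout queue.
    """
    count = 0
    while True:
        ready, _, _ = select.select([fd], [], [], 0)  # readiness check, never blocks
        if not ready:
            return count
        data = os.read(fd, 4096)
        if not data:  # end of file: the writer closed its end
            return count
        handle(data)
        count += 1
```

Without the readiness check, the final read in the loop would block until the next event arrived, which is how the timeout queue could be starved.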
  39. 24 Feb, 2011 1 commit