1. 14 Oct, 2016 1 commit
    • Attempt to address the problem described in issue #166 · 5d7164b3
      Leigh Stoller authored
      Nodes fail to go from PXEBOOTING (pxewakeup) to actually booting, but
      we do not find out for a really long time because we send a BOOTING
      event from bootinfo right after PXEBOOTING, since that was the only
      place to hook it in. Well, Mike discovered the "on commit" support in
      dhcpd, so that is what we are going to use now. Note that uboot nodes
      have been using on commit; now all nodes will when BOOTINFO_EVENTS=0.
      
      Mike's reportboot program is now a daemon, renamed to report_daemon.
      The reportboot name now belongs to a small script that writes the
      arguments from dhcpd to a unix socket to be picked up by the daemon,
      which does the original work of mapping the IP/MAC to a node id and
      sending an event. The code has also been modified to run on a subboss
      using the same node mapping given to dhcpd, reconstituted as a DBM
      file by subboss_dhcpd_makeconf.
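The reportboot/report_daemon split described above can be sketched in Python as follows. This is a minimal illustration, not the actual Emulab code: the one-line `IP MAC` message format, the datagram socket type, the MAC-then-IP lookup order, and the names `encode_report`, `decode_report`, `map_to_node`, and `handle_one` are all assumptions. A plain dict with bytes keys stands in for the DBM node mapping.

```python
from __future__ import annotations

import socket
from typing import Callable, Mapping, Optional, Tuple


def encode_report(ip: str, mac: str) -> bytes:
    """What the small reportboot script might write: one 'IP MAC' line."""
    return f"{ip} {mac}\n".encode("ascii")


def decode_report(data: bytes) -> Tuple[str, str]:
    """Split a report back into (ip, mac); MACs are normalised to lowercase."""
    ip, mac = data.decode("ascii").split()
    return ip, mac.lower()


def map_to_node(mapping: Mapping[bytes, bytes], ip: str, mac: str) -> Optional[str]:
    """Look up the node id by MAC first, then by IP (hypothetical key scheme).

    `mapping` stands in for the DBM file written by subboss_dhcpd_makeconf.
    """
    for key in (mac, ip):
        value = mapping.get(key.encode("ascii"))
        if value is not None:
            return value.decode("ascii")
    return None


def handle_one(sock: socket.socket,
               mapping: Mapping[bytes, bytes],
               send_event: Callable[[str], None]) -> Optional[str]:
    """Daemon side: read one report, map it to a node, and emit an event.

    Unknown IP/MAC pairs are ignored rather than reported.
    """
    ip, mac = decode_report(sock.recv(1024))
    node = map_to_node(mapping, ip, mac)
    if node is not None:
        send_event(node)  # the real daemon would publish to the event system
    return node


if __name__ == "__main__":
    # A connected socket pair stands in for the named unix socket.
    daemon_end, script_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    node_map = {b"00:11:22:33:44:55": b"pc1"}
    script_end.send(encode_report("10.0.0.5", "00:11:22:33:44:55"))
    events = []
    handle_one(daemon_end, node_map, events.append)
    print(events)  # ['pc1']
```

A datagram socket keeps each report as one self-delimiting message; the commit does not say whether the real script uses datagram or stream sockets, so treat that choice as part of the sketch.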
      
      The reason for using a daemon this way is so that we do not hang
      dhcpd if we cannot reach the event system. The unix domain socket
      gives us some amount of buffering, but I suspect that any event
      problem will eat that space up quickly, and I will be back to revisit
      this (we probably want reportboot not to block on its write to the
      socket).
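The non-blocking write suggested in the parenthetical could look like the following Python sketch. `report_nonblocking` is a hypothetical name, and dropping the report when the buffer is full is one possible policy, not something the commit implements. With the socket in non-blocking mode, a full buffer makes `send` fail immediately (normally with `EAGAIN`) instead of stalling dhcpd; the sketch also treats `ENOBUFS` as "full", since some platforms may report that errno for a full datagram queue.

```python
import errno
import socket

# errno values treated as "the daemon's socket buffer is full".
_BUFFER_FULL = {errno.EAGAIN, errno.EWOULDBLOCK, errno.ENOBUFS}


def report_nonblocking(sock: socket.socket, msg: bytes) -> bool:
    """Hand one report to the daemon without ever blocking the caller.

    Returns True if the message was queued, or False if the buffer is full
    (e.g. the daemon is stuck talking to the event system), in which case
    this report is dropped so that dhcpd is never held up.
    """
    sock.setblocking(False)
    try:
        sock.send(msg)
        return True
    except OSError as exc:
        if exc.errno in _BUFFER_FULL:
            return False
        raise  # any other error is a real problem, not back-pressure


if __name__ == "__main__":
    # Nobody reads from daemon_end, simulating a wedged daemon.
    daemon_end, script_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    sent = 0
    for _ in range(100_000):
        if not report_nonblocking(script_end, b"10.0.0.5 00:11:22:33:44:55\n"):
            break
        sent += 1
    print(f"queued {sent} reports, then started dropping")
```

Dropping a report means stated may never see that boot event, so a real implementation would probably want to log the drop; the trade-off is a missed event versus a dhcpd that stops answering requests.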
      
      pxeboot changed to not send PXEBOOTING or BOOTING when
      BOOTINFO_EVENTS=0.
  2. 10 May, 2016 1 commit
  3. 07 Dec, 2014 1 commit
  4. 03 Dec, 2014 1 commit
    • Report both PXEBOOTING/BOOTING events on PXE-originated DHCP request. · 25f775fe
      Mike Hibler authored
      A concession to performance. Previously, PXEBOOTING was reported on
      the PXE DHCP request and BOOTING on the following OS-originated request.
      This is conceptually ideal, as that is what those states were intended
      to mean, but it causes two synchronous "reportboot" command executions
      from dhcpd for every node boot. Worse, the time gap between the second,
      OS-originated DHCP call and the first explicit state reported by the
      node itself (e.g., TBSETUP or RELOADSETUP) is generally small enough
      that the node-reported state often arrived at stated before the BOOTING
      state from dhcpd. This can cause excess node reboots or other undesirable
      behaviors from stated.
      
      So now we only invoke "reportboot" on the first PXE-originated call and
      tell reportboot to send both PXEBOOTING and BOOTING events at that time.
      This does not eliminate the race condition above, but makes it unlikely,
      since the whole kernel boot process (tens of seconds) now lies between
      the dhcpd state reports and the first node state report.
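The difference between the old and new reporting schemes, and why the old one raced with the node's own reports, can be modelled with a toy Python sketch. Assumptions: the PXE-originated request is recognised by a DHCP vendor class identifier (option 60) starting with `PXEClient`, which is what PXE firmware sends, though the commit does not say how dhcpd tells the two requests apart; and `ALLOWED_NEXT` is a drastically simplified, hypothetical stand-in for stated's transition rules. All function names are illustrative.

```python
from typing import Dict, List, Optional, Set


def is_pxe_request(vendor_class: Optional[str]) -> bool:
    """PXE firmware sends a vendor class identifier (DHCP option 60)
    beginning with "PXEClient"; an OS-originated request normally does not."""
    return vendor_class is not None and vendor_class.startswith("PXEClient")


def events_old(vendor_class: Optional[str]) -> List[str]:
    """Old scheme: one reportboot run per request, one state each."""
    return ["PXEBOOTING"] if is_pxe_request(vendor_class) else ["BOOTING"]


def events_new(vendor_class: Optional[str]) -> List[str]:
    """New scheme: report both states on the PXE request, nothing later."""
    return ["PXEBOOTING", "BOOTING"] if is_pxe_request(vendor_class) else []


# Hypothetical, heavily simplified view of the order stated expects.
ALLOWED_NEXT: Dict[Optional[str], Set[str]] = {
    None: {"PXEBOOTING"},
    "PXEBOOTING": {"BOOTING"},
    "BOOTING": {"TBSETUP", "RELOADSETUP"},
}


def first_out_of_order(arrivals: List[str]) -> Optional[str]:
    """Return the first event that arrives in an unexpected order, if any."""
    state: Optional[str] = None
    for event in arrivals:
        if event not in ALLOWED_NEXT.get(state, set()):
            return event
        state = event
    return None


if __name__ == "__main__":
    pxe = "PXEClient:Arch:00000:UNDI:002001"
    # Old scheme, race lost: the node's TBSETUP beats dhcpd's BOOTING.
    old = events_old(pxe) + ["TBSETUP"] + events_old(None)
    print(first_out_of_order(old))  # TBSETUP
    # New scheme: both dhcpd reports go out at PXE time, long before TBSETUP.
    new = events_new(pxe) + events_new(None) + ["TBSETUP"]
    print(first_out_of_order(new))  # None
```

The model also shows the cost saving: the new scheme runs reportboot once per boot instead of twice, because the OS-originated request produces no events at all.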
  5. 25 Nov, 2014 2 commits
  6. 23 Nov, 2014 1 commit