Mike Hibler authored
Under load, nodes that have just entered reloading and have just rebooted might fail to get bootinfo. The default behavior in this case is for the node to boot from disk (dubious, but that is a topic for another day). This causes the node to fall off the RELOAD path and wind up in either TBFAILED or ISUP. Worse, if the node makes it to ISUP, its reload state is cleared, so even if the reload_daemon reboots the node, it will still not go through the reloading process. The result is a bunch of nodes stuck in reloading.

Now, if a node makes an invalid transition to TBFAILED or ISUP while in the RELOAD state machine, it fires the new REBOOT trigger, which reboots the node. Note that in the ISUP case, this trigger overrides the default trigger that would otherwise clear the reload state, so a reboot is sufficient to get the machine back on the RELOAD track.
4dc57d48
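
The trigger behavior described in the message can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this commit: the transition table, the `Node` class, the `CLEAR_RELOAD_STATE` trigger name, and the `transition` and `fire` helpers are all hypothetical, and the states other than TBFAILED and ISUP are placeholders. The point it shows is that a mode-specific trigger on an invalid transition takes precedence over the default trigger for that state.

```python
from dataclasses import dataclass

# Hypothetical table of valid transitions per state machine (mode).
# Only TBFAILED and ISUP are named in the commit message; the other
# state names here are placeholders chosen for illustration.
VALID = {
    "RELOAD": {
        ("SHUTDOWN", "BOOTING"),
        ("BOOTING", "RELOADING"),
        ("RELOADING", "RELOADDONE"),
    },
}

# Triggers fired on an invalid transition into (mode, state).
# These take precedence over the per-state defaults below.
MODE_TRIGGERS = {
    ("RELOAD", "TBFAILED"): "REBOOT",
    ("RELOAD", "ISUP"): "REBOOT",
}

# Default triggers keyed by state alone (name is an assumption).
DEFAULT_TRIGGERS = {"ISUP": "CLEAR_RELOAD_STATE"}


@dataclass
class Node:
    name: str
    mode: str = "RELOAD"
    state: str = "SHUTDOWN"
    reload_pending: bool = True
    reboots: int = 0


def fire(node: Node, trigger: str) -> None:
    """Apply the side effect of a trigger to the node."""
    if trigger == "REBOOT":
        # Rebooting puts the node back at the start of its state machine.
        node.reboots += 1
        node.state = "SHUTDOWN"
    elif trigger == "CLEAR_RELOAD_STATE":
        node.reload_pending = False


def transition(node: Node, new_state: str):
    """Move node to new_state and fire at most one trigger.

    Returns the name of the trigger fired, or None.
    """
    table = VALID.get(node.mode)
    # Modes without a table are treated as having no invalid transitions.
    valid = table is None or (node.state, new_state) in table
    node.state = new_state

    trigger = None
    if not valid:
        trigger = MODE_TRIGGERS.get((node.mode, new_state))
    if trigger is None:
        trigger = DEFAULT_TRIGGERS.get(new_state)
    if trigger is not None:
        fire(node, trigger)
    return trigger
```

In this sketch, a node in the RELOAD machine that lands in ISUP gets rebooted with `reload_pending` left intact, whereas a node in any other mode reaching ISUP still has its reload state cleared by the default trigger.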