Controlling `stated`-initiated reboots
We need a way to temporarily put nodes into an "always up" mode so that `stated` will not try to reboot them at inopportune times, e.g., while the BIOS is being updated.
TL;DR:
From the beginning, `stated` was designed to have a reboot "trigger" action that would let it reboot a node that timed out in a particular state. However, for the first 15 or so years that action was commented out: `stated` would say that it was rebooting a node without actually doing it.
Enter @mike, who a year or so ago decided that `stated` should walk the walk instead of being all talk and no action, and uncommented the reboot. For the most part this is a good thing. However, if we have to boot a node into the BIOS for an extended period, or boot it from a dongle or CD with something that is not one of our images, `stated` can decide that the node is booting and start a timer for it to make progress. It will then reboot the node up to three times, at intervals of roughly 10 minutes depending on the state.
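The timeout-and-reboot behavior described above can be modeled with a short sketch. This is a minimal illustration, not `stated`'s actual code: the state names, the per-state timeout table, the `Node` record, and the `check_timeout` and `reboot` callables are hypothetical. Only the roughly 10-minute timeouts and the limit of three reboots come from the text, and the sketch assumes that nodes in the `ALWAYSUP` op_mode are simply skipped.

```python
from dataclasses import dataclass

# Hypothetical per-state timeouts in seconds ("10 minute or so ... depending
# on the state"); the state names and values are illustrative only.
STATE_TIMEOUTS = {"SHUTDOWN": 600, "BOOTING": 600, "RELOADING": 1200}
MAX_REBOOTS = 3  # "up to three times"

@dataclass
class Node:
    node_id: str
    state: str
    op_mode: str
    entered_at: float  # time the node entered its current state
    reboots: int = 0
    given_up: bool = False

def check_timeout(node, now, reboot):
    """Reboot `node` via `reboot(node_id)` if it has overstayed its state's timeout."""
    if node.op_mode == "ALWAYSUP" or node.given_up:
        return  # exempt (assumed), or already abandoned
    timeout = STATE_TIMEOUTS.get(node.state)
    if timeout is None or now - node.entered_at < timeout:
        return  # no timeout for this state, or still within it
    if node.reboots >= MAX_REBOOTS:
        node.given_up = True  # out of retries: stop reboot attempts
        return
    node.reboots += 1
    reboot(node.node_id)
    node.entered_at = now  # restart the clock for the next attempt

# Usage: a node left sitting in the BIOS, checked once a minute for an hour.
rebooted = []
n = Node("pc42", "BOOTING", "NORMAL", entered_at=0.0)
for t in range(0, 3601, 60):
    check_timeout(n, float(t), rebooted.append)
print(rebooted, n.given_up)  # prints ['pc42', 'pc42', 'pc42'] True
```

In this model the node is rebooted at 10, 20, and 30 minutes and abandoned at 40, while the same node in `ALWAYSUP` mode would never be touched.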
You can HUP `stated` (send it a SIGHUP) to make it re-read the DB, so it is possible to, say, manually change the DB to put a node into the ALWAYSUP op_mode. But HUPing `stated` tends to bring a lot of skeletons out of the closet: it resets timers and starts monitoring old nodes (e.g., wireless nodes) that it had long ago given up on. This is mostly harmless and generally only results in email messages that bring back a feeling of nostalgia in the reader.
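Why a HUP revives long-abandoned nodes can be illustrated with another minimal sketch. It assumes, without confirmation from the text, that per-node timers and "given up" flags live only in the daemon's memory and are rebuilt from the DB on reload. `Tracked`, `Monitor`, `read_db`, and `install_hup_handler` are hypothetical names, the state names are illustrative, and the dict stands in for the real DB.

```python
import signal
from dataclasses import dataclass

@dataclass
class Tracked:
    state: str
    op_mode: str
    since: float            # when the current timer started
    given_up: bool = False  # in-memory only: lost on reload

class Monitor:
    def __init__(self, read_db, clock):
        self.read_db = read_db  # hypothetical: returns {node_id: (state, op_mode)}
        self.clock = clock
        self.nodes = {}
        self.reload()

    def reload(self):
        """Rebuild all per-node tracking from the DB, resetting every timer."""
        now = self.clock()
        self.nodes = {
            node_id: Tracked(state, op_mode, since=now)
            for node_id, (state, op_mode) in self.read_db().items()
        }

    def watched(self):
        """Nodes whose timeouts are currently being enforced."""
        return sorted(nid for nid, t in self.nodes.items()
                      if t.op_mode != "ALWAYSUP" and not t.given_up)

def install_hup_handler(monitor):
    # POSIX only: re-read the DB whenever the process receives SIGHUP.
    signal.signal(signal.SIGHUP, lambda signum, frame: monitor.reload())

# Usage: a manual DB edit is invisible until the reload.
db = {"pc1": ("ISUP", "NORMAL"), "pc2": ("BOOTING", "NORMAL"),
      "wifi7": ("BOOTING", "NORMAL")}
now = [0.0]
m = Monitor(lambda: dict(db), lambda: now[0])
m.nodes["wifi7"].given_up = True       # abandoned long ago
db["pc2"] = ("BOOTING", "ALWAYSUP")    # manual DB change
before = m.watched()                   # ['pc1', 'pc2']
now[0] = 5000.0
m.reload()                             # what the SIGHUP handler would do
after = m.watched()                    # ['pc1', 'wifi7']
```

The same reload that picks up the new `ALWAYSUP` setting for `pc2` also discards the in-memory "given up" flag for `wifi7`, so the old node is monitored again with a fresh timer.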