Leigh B. Stoller authored
a node has failed.

* In the main wait loop, I check the eventstate for the node, looking for TBFAILED or PXEFAILED. Neither of these should happen after the reboot, so it makes sense to stop waiting if they do.
* I added an event handler to libosload, specifically to watch for nodes entering RELOADSETUP or RELOADING after the reboot. Because of the race with the reboot, this was best done with a handler instead of polling the DB state as in case #1 above. The idea is that a node should reach one of these two states within a fairly short time (currently set to 5 minutes). If it does not, something is wrong and the loop bails on that node. What happens after that is subject to the normal waiting times.

I believe these two tests will catch many of the cases where osload is waiting on something that will never finish.
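A minimal sketch of the two checks described above, written in Python because the commit itself shows no code (the actual libosload handler is not reproduced here). The function name `check_nodes`, the `(timestamp, node, state)` event tuples, and the result strings are hypothetical; only the state names and the 5-minute window come from the message. A failure state seen after the reboot is treated as fatal, and a node that has not entered RELOADSETUP or RELOADING within the window is given up on.

```python
# States named in the commit message; everything else here is hypothetical.
FAILURE_STATES = {"TBFAILED", "PXEFAILED"}
RELOAD_STATES = {"RELOADSETUP", "RELOADING"}
RELOAD_START_TIMEOUT = 5 * 60  # seconds ("currently set to 5 minutes")


def check_nodes(nodes, reboot_time, events, now):
    """Classify each node as 'failed', 'reloading', or 'pending'.

    events: iterable of (timestamp, node, state) tuples (assumed format).
    Events from before reboot_time are ignored, since only states entered
    after the reboot matter.
    """
    failed = set()
    started_reload = set()
    for ts, node, state in events:
        if node not in nodes or ts < reboot_time:
            continue
        if state in FAILURE_STATES:
            # Check 1: TBFAILED/PXEFAILED should never follow the reboot.
            failed.add(node)
        elif state in RELOAD_STATES and ts - reboot_time <= RELOAD_START_TIMEOUT:
            # Check 2: the node reached a reload state within the window.
            started_reload.add(node)

    status = {}
    for node in nodes:
        if node in failed:
            status[node] = "failed"
        elif node in started_reload:
            # From here on the normal waiting times apply.
            status[node] = "reloading"
        elif now - reboot_time > RELOAD_START_TIMEOUT:
            # Window expired without RELOADSETUP/RELOADING: bail on the node.
            status[node] = "failed"
        else:
            status[node] = "pending"
    return status


if __name__ == "__main__":
    events = [(10, "pc1", "RELOADSETUP"), (20, "pc2", "PXEFAILED")]
    print(check_nodes(["pc1", "pc2", "pc3"], 0, events, now=100))
```

Keeping the classification a pure function of the observed events makes it easy to call from either a polling loop or an event callback; in the commit, the second check is done with an event handler precisely to avoid the race that polling the DB state would have.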
28ac96a5