    A couple of changes that attempt to cut short the waiting when a node has failed. · 28ac96a5
    Leigh B. Stoller authored
    
    * In the main wait loop, I check the node's eventstate for TBFAILED
      or PXEFAILED. Neither of these should happen after the reboot, so
      it makes sense to quit waiting if they do.
    
    * I added an event handler to libosload, specifically to watch for
      nodes entering RELOADSETUP or RELOADING, after the reboot. Because
      of the race with reboot, this was best done with a handler instead
      of polling the DB state like case #1 above. The idea is that a node
      should hit one of these two states within a fairly short time (I
      currently have it set to 5 minutes). If not, something is wrong and
      the loop bails on that node. What happens after is subject to the
      normal waiting times.
    
    I believe that these two tests will catch a lot of cases where osload
    is waiting on something that will never finish.
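
The two checks above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual libosload code: the names `NodeWait`, `on_state_event`, and `should_give_up` are hypothetical, and the only details taken from the message are the four state names and the 5 minute limit.

```python
from dataclasses import dataclass
from typing import Optional

# State names and the 5 minute limit come from the commit message;
# everything else in this sketch is a hypothetical illustration.
FAILED_STATES = {"TBFAILED", "PXEFAILED"}
RELOAD_STATES = {"RELOADSETUP", "RELOADING"}
RELOAD_DEADLINE_SECS = 5 * 60


@dataclass
class NodeWait:
    """Per-node bookkeeping kept by the (hypothetical) wait loop."""
    node_id: str
    reboot_time: float              # when the reboot was issued, in seconds
    saw_reload_state: bool = False  # set by the event handler, not by polling


def on_state_event(node: NodeWait, new_state: str) -> None:
    """Event handler: record that the node reached a reload state.

    A handler is used because the node may pass through these states
    before a poll of the DB would notice (the race with reboot).
    """
    if new_state in RELOAD_STATES:
        node.saw_reload_state = True


def should_give_up(node: NodeWait, db_state: str, now: float) -> Optional[str]:
    """Return a reason to stop waiting on this node, or None to keep waiting."""
    # Check 1: polled eventstate shows a failure that should never
    # follow the reboot.
    if db_state in FAILED_STATES:
        return f"{node.node_id} entered {db_state}"
    # Check 2: no reload state seen within the deadline after the reboot.
    if not node.saw_reload_state and now - node.reboot_time > RELOAD_DEADLINE_SECS:
        return f"{node.node_id} never reached a reload state"
    return None
```

Once `on_state_event` has seen a reload state, `should_give_up` stops applying the 5 minute limit, which matches the note that what happens afterwards is subject to the normal waiting times.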