    Try to be smarter about waiting for nodes to die · 63eabd05
    Leigh B. Stoller authored
    Loop on short pings, waiting until there are no more replies. This is still
    not great, and it makes the loop that reboots all the machines fairly long.
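    A loop like the one described above might look roughly like the following minimal Python sketch. It is not the actual tbrun code. `wait_for_node_death`, `linux_ping`, and the `misses_needed`, `interval`, and `max_wait` parameters are hypothetical. The default pinger also assumes Linux iputils `ping` flags. Requiring a few consecutive misses is an assumed refinement so that one dropped packet is not mistaken for a dead node.

```python
import subprocess
import time
from typing import Callable


def linux_ping(host: str) -> bool:
    """Send one ICMP echo with a 1-second timeout; True if the host replied.

    Assumes Linux iputils `ping` (-c count, -W timeout in seconds); BSD ping
    spells the timeout flag differently.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_node_death(
    host: str,
    ping: Callable[[str], bool] = linux_ping,
    misses_needed: int = 3,
    interval: float = 0.5,
    max_wait: float = 120.0,
) -> bool:
    """Loop on short pings until `host` stops replying.

    Returns True once `misses_needed` consecutive pings get no reply, or
    False if `max_wait` seconds pass while the host is still answering.
    """
    deadline = time.monotonic() + max_wait
    misses = 0
    while time.monotonic() < deadline:
        if ping(host):
            misses = 0          # still alive; reset the streak of misses
        else:
            misses += 1
            if misses >= misses_needed:
                return True     # no more replies: treat the node as down
        time.sleep(interval)
    return False
```

    Injecting `ping` as a callable keeps the loop independent of any one platform's `ping` flags and lets it be exercised with a fake pinger.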
    More importantly, we have to wait until all the nodes reboot and come back
    so that the next part of tbrun does not fail, which adds a lot of time.
    The reboot and wait should be parallelized, but that's too hard to deal
    with right now.
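    One way the parallel reboot-and-wait could be sketched is a thread pool with one worker per node. This is a minimal sketch under stated assumptions and not Emulab code. `reboot_and_wait`, `reboot_all`, and the injected `reboot` and `ping` callables are hypothetical stand-ins. Each worker waits for its node to go quiet and then waits for it to answer again.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def reboot_and_wait(
    node: str,
    reboot: Callable[[str], None],
    ping: Callable[[str], bool],
    interval: float = 1.0,
    max_wait: float = 600.0,
) -> bool:
    """Reboot one node, wait for it to stop answering, then wait for it to return."""
    reboot(node)
    deadline = time.monotonic() + max_wait
    # Phase 1: wait until the node goes quiet (it has actually gone down).
    while ping(node):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    # Phase 2: wait until it answers again (it has come back up).
    while not ping(node):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def reboot_all(
    nodes: Iterable[str],
    reboot: Callable[[str], None],
    ping: Callable[[str], bool],
    max_workers: int = 16,
    **kwargs,
) -> Dict[str, bool]:
    """Run reboot_and_wait for every node concurrently; map node -> came back."""
    nodes = list(nodes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda n: reboot_and_wait(n, reboot, ping, **kwargs), nodes)
        return dict(zip(nodes, results))
```

    Because each node waits independently, the total time is roughly that of the slowest node rather than the sum over all nodes. Nodes that fail to come back show up as `False` in the result, so the next stage can skip or report them.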