• Leigh B. Stoller authored · 5ab15776
    Changes for setting up jailed nodes, which need checks similar to what
    real nodes get. Also, run a proper os_select on jailed nodes, *after*
    the OS for the physical node is set up, since otherwise stated will not
    be happy.
    Fixes for dealing with a failed os_load. Previously, if os_load
    failed, os_setup would wait for those nodes anyway, since it had no idea
    which nodes had failed (and we do not want to just quit from os_setup,
    since that might cause a lot of extra power cycles). Now, for each
    node that got an os_load, check its eventstate; it should be in ISUP
    immediately after os_load exits (since that's what os_load waited for),
    and if it's not, then mark that node as failed. Note though that failed
    loads no longer result in the node going into hwdown, since 99 percent
    of the time it's a busted user image, not a hardware problem. I figure
    we will catch real hw errors via the reload daemon, when it sends
    email about nodes not finishing.
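    The post-load check described above can be sketched roughly as
    follows. This is an illustration in Python, not the actual os_setup
    code; find_failed_loads, get_eventstate, and the non-ISUP state
    values are hypothetical stand-ins for the real database lookup.

```python
# Hypothetical sketch of the post-os_load check; not the real os_setup code.
ISUP = "ISUP"


def find_failed_loads(loaded_nodes, get_eventstate):
    """Return the nodes that are not in ISUP right after os_load exits.

    loaded_nodes:   iterable of node names that were handed to os_load
    get_eventstate: callable mapping a node name to its current eventstate
                    (assumed here; the real lookup would query the database)
    """
    failed = []
    for node in loaded_nodes:
        # os_load waits for ISUP, so any other state means the load failed.
        if get_eventstate(node) != ISUP:
            failed.append(node)
    return failed


# Stand-in state table; "BOOTING" is a placeholder for any non-ISUP state.
states = {"pc1": "ISUP", "pc2": "BOOTING", "pc3": "ISUP"}
failed = find_failed_loads(list(states), states.get)
print("failed loads:", failed)
```

    Nodes returned by such a check would be marked as failed and dropped
    from the wait list, rather than being moved into hwdown.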
    Do not bother with doing the vnode setup if any of the phys nodes
    failed to set up. Doing so leads to cascading errors and prolongs the
    agony by another few minutes. Might revisit this later.
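    A minimal sketch of that guard, again in Python for illustration
    only; setup_vnodes_if_safe, phys_results, and the setup_vnodes
    callback are hypothetical names, not functions from os_setup.

```python
# Hypothetical sketch: skip vnode setup when any physical node failed.
def setup_vnodes_if_safe(phys_results, setup_vnodes):
    """Run setup_vnodes() only if every physical node set up cleanly.

    phys_results: dict mapping physical node name -> True on success
    setup_vnodes: callable that performs the vnode setup (assumed)
    Returns True if vnode setup was run, False if it was skipped.
    """
    failed = sorted(name for name, ok in phys_results.items() if not ok)
    if failed:
        # Continuing would only produce cascading errors on the vnodes.
        print("Skipping vnode setup; failed physical nodes:", ", ".join(failed))
        return False
    setup_vnodes()
    return True
```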
    Remove local WaitTillAlive() function, and switch to using the version
    I put into libdb a couple of weeks ago.
    Fix up a bunch of print statements to be nicer.