Leigh B. Stoller authored
real nodes get. Also, run a proper os_select on jailed nodes, *after* the os for the physical node is set up, since otherwise stated will not be happy.

Fixes for dealing with failed os_load. Previously, if os_load failed, os_setup would wait for those nodes anyway, since it had no idea which nodes had failed (and we do not want to just quit from os_setup, since that might cause a lot of extra power cycles). Now, for each node that got an os_load, check its eventstate; it should be in ISUP immediately after os_load exits (since that's what os_load waited for), and if it's not, mark that node as failed.

Note though that failed loads no longer result in the node going into hwdown, since 99 percent of the time it's a busted user image, not a hardware problem. I figure we will catch real hw errors via the reload daemon, when it sends email about nodes not finishing.

Do not bother with doing the vnode setup if any of the phys nodes failed to set up. Doing so leads to cascading errors and prolongs the agony by another few minutes. Might revisit this later.

Remove local WaitTillAlive() function, and switch to using the version I put into libdb a couple of weeks ago.

Fix up a bunch of print statements to be nicer.
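
For illustration only, the failed-load check and the vnode-setup guard described above might look roughly like the sketch below. It is written in Python rather than the language of the actual scripts, and every name in it (partition_by_eventstate, should_setup_vnodes, the node names, and the non-ISUP state string) is hypothetical; only the ISUP state and the overall logic come from the message itself.

```python
# Hypothetical sketch of the check described in the commit message.
# None of these function names come from the real os_setup code.

ISUP = "ISUP"  # the state os_load is said to wait for


def partition_by_eventstate(loaded_nodes, get_eventstate):
    """Split nodes that just went through os_load into (ok, failed).

    get_eventstate is an assumed callable that returns the current
    eventstate string for a node name.
    """
    ok, failed = [], []
    for node in loaded_nodes:
        if get_eventstate(node) == ISUP:
            ok.append(node)
        else:
            # Mark as failed, but do not treat it as a hardware fault.
            failed.append(node)
    return ok, failed


def should_setup_vnodes(failed_phys_nodes):
    """Skip vnode setup entirely if any physical node failed."""
    return not failed_phys_nodes


# Made-up node names and states, purely for demonstration.
states = {"pc1": "ISUP", "pc2": "RELOADING", "pc3": "ISUP"}
ok, failed = partition_by_eventstate(["pc1", "pc2", "pc3"], states.get)
print("ok:", ok, "failed:", failed)
print("run vnode setup:", should_setup_vnodes(failed))
```

In this sketch a node that is not in ISUP is only recorded as failed; nothing moves it to hwdown, matching the note that most failures are bad user images rather than hardware problems.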
5ab15776