1. 15 Jan, 2003 5 commits
  2. 14 Jan, 2003 4 commits
  3. 13 Jan, 2003 4 commits
  4. 12 Jan, 2003 1 commit
  5. 10 Jan, 2003 8 commits
  6. 09 Jan, 2003 2 commits
  7. 08 Jan, 2003 8 commits
  8. 07 Jan, 2003 8 commits
    • Mac Newbold
    • Leigh B. Stoller · 5ab15776
      Changes for setting up jailed nodes, which need checks similar to what
      real nodes get. Also, run a proper os_select on jailed nodes, *after*
      the OS for the physical node is set up, since otherwise stated will not
      be happy.
      
      Fixes for dealing with failed os_load. Previously, if os_load
      failed, os_setup would wait for those nodes anyway, since it had no idea
      which nodes had failed (and we do not want to just quit from os_setup,
      since that might cause a lot of extra power cycles). Now, for each
      node that got an os_load, check its eventstate; it should be in ISUP
      immediately after os_load exits (since that's what os_load waited for),
      and if it's not, mark that node as failed. Note, though, that failed
      loads no longer send the node into hwdown, since 99 percent of the
      time it's a busted user image, not a hardware problem. I figure we
      will catch real hardware errors via the reload daemon, when it sends
      email about nodes not finishing.
      
      Do not bother doing the vnode setup if any of the physical nodes
      failed to set up. Doing so leads to cascading errors and prolongs the
      agony by another few minutes. Might revisit this later.
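
A minimal Perl sketch of the new os_setup logic described above, under stated assumptions: the per-node eventstate lookup is replaced by a plain hash, the non-ISUP state names are illustrative, and check_loaded_nodes() and should_setup_vnodes() are hypothetical names, not the actual os_setup code. A node that got an os_load counts as up only if its eventstate is ISUP right after os_load exits; anything else is marked failed (not hwdown), and the vnode setup is skipped if any physical node failed.

```perl
use strict;
use warnings;

# Hypothetical helper: split the nodes that got an os_load into those
# that reached ISUP and those that did not. $eventstate is a hashref
# standing in for the real per-node eventstate lookup.
sub check_loaded_nodes {
    my ($eventstate, @loaded) = @_;
    my (@ok, @failed);
    foreach my $node (@loaded) {
        my $state = $eventstate->{$node} // 'UNKNOWN';
        if ($state eq 'ISUP') {
            push @ok, $node;
        } else {
            # Mark as failed only; do not send to hwdown, since most
            # failed loads are a bad user image, not broken hardware.
            push @failed, $node;
        }
    }
    return (\@ok, \@failed);
}

# Hypothetical gate: only do the vnode (jail) setup when every
# physical node came up, to avoid cascading errors.
sub should_setup_vnodes {
    my ($failed) = @_;
    return @$failed == 0 ? 1 : 0;
}

# Illustrative states only.
my %eventstate = (pc1 => 'ISUP', pc2 => 'BOOTING', pc3 => 'ISUP');
my ($ok, $failed) = check_loaded_nodes(\%eventstate, qw(pc1 pc2 pc3));
print "ok: @$ok; failed: @$failed\n";
print "vnode setup: ", (should_setup_vnodes($failed) ? "yes" : "skipped"), "\n";
```

Treating an unknown eventstate as a failure is a conservative choice for the sketch: a node that never reported a state is not assumed to be up.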
      
      Remove local WaitTillAlive() function, and switch to using the version
      I put into libdb a couple of weeks ago.
      
      Fix up a bunch of print statements to be nicer.
    • Robert Ricci
    • Robert Ricci · bab0f654
    • Robert Ricci · 616601b5
      New script: readycount
      Simple command-line interface to the ready bits. Its primary
      purposes are:
      
      * Manually report ready for nodes that can't do it themselves
      * Get a list of which nodes are ready, so that you can figure out
        which one(s) aren't reporting in
      * Clear ready bits so you can use them again without restarting the
        experiment
      * Make it possible to poll ready bits on boss/ops
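
The four uses listed above can be modeled as operations on a per-node ready bit. This is a hypothetical sketch only: it keeps the bits in an in-memory hash and uses made-up sub names (set_ready, not_ready, ready_count, clear_ready); it does not reflect readycount's real command-line options or where the testbed actually stores the bits.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for per-node ready bits.
my %ready;
$ready{$_} = 0 foreach qw(pc1 pc2 pc3 pc4);

# Manually report ready for a node that can't do it itself.
sub set_ready { my ($node) = @_; $ready{$node} = 1; }

# List the nodes that have not reported in yet, sorted by name.
sub not_ready {
    my @waiting = grep { !$ready{$_} } keys %ready;
    return sort @waiting;
}

# Count how many nodes are currently ready.
sub ready_count { return scalar(grep { $ready{$_} } keys %ready); }

# Clear every bit so they can be reused without restarting the experiment.
sub clear_ready { $ready{$_} = 0 foreach keys %ready; }

set_ready($_) foreach qw(pc1 pc3);
printf "%d/%d ready; waiting on: %s\n",
    ready_count(), scalar(keys %ready), join(' ', not_ready());
clear_ready();
printf "after clear: %d ready\n", ready_count();
```

Polling from boss/ops would then amount to calling something like ready_count() in a loop until it matches the node count.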
    • Leigh B. Stoller · 85d30cce
      Minor fix to last revision.
    • Leigh B. Stoller · 541df535
    • Leigh B. Stoller · f7b3e7b7
      Remove hardwired 15-minute wait, and replace it with a hardwired
      calculation based on the size of the image file. Okay, to save all
      you folks from having to go look at what bit of dreck I came up with,
      here it is:
      
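          use File::stat;    # stat() then returns an object with ->size

          my $sb     = stat($imagepath);
          my $chunks = $sb->size / (1024 * 1024);    # image size in MB
          # presumably seconds: 25 per 100 MB, plus a 4-minute base
          $maxwait   = int((($chunks / 100.0) * 25) + (4 * 60));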
      
      Note the replacement of one hardwired number (15) with several dozen
      new ones!
      
      I like it anyway, because I hate waiting 2*15 minutes when a 60-second
      load fails.
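
To make the formula concrete, here is the same calculation wrapped in a hypothetical helper, max_wait_for_size(), that takes a byte count directly instead of calling stat() on $imagepath. It assumes the result is in seconds, which the 4 * 60 term suggests.

```perl
use strict;
use warnings;

# Hypothetical helper: same formula as the commit, but taking the image
# size in bytes. Result is assumed to be seconds: a 4-minute base plus
# 25 seconds per 100 MB, truncated to an integer.
sub max_wait_for_size {
    my ($bytes) = @_;
    my $chunks = $bytes / (1024 * 1024);    # size in MB
    return int((($chunks / 100.0) * 25) + (4 * 60));
}

foreach my $mb (0, 100, 400, 1000) {
    printf "%5d MB -> %4d s\n", $mb, max_wait_for_size($mb * 1024 * 1024);
}
```

With these numbers, a 400 MB image gets 340 seconds instead of the old 900 (15 minutes); the new limit only exceeds 15 minutes for images larger than (900 - 240) / 25 * 100 = 2640 MB.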