1. 28 Mar, 2006 2 commits
  2. 27 Mar, 2006 4 commits
    • Kirk Webb authored · 6adad0d6
      Minor bugfix.
    • Mike Hibler · ac33e3fd
    • Kevin Atkinson authored · d8625ddd
      Change the email going to testbed-errors from:
      
        From: user
        To: user
        Cc: testbed-errors
      
      to
      
        From: testbed-ops
        To: user
        Bcc: testbed-errors
        X-NetBed-Cc: testbed-errors
      
      This should cause all replies to these messages to go to testbed-ops
      instead of testbed-errors.  The only catch is that you need to be
      careful when replying: if you reply only to the sender, the reply
      will go to testbed-ops only and not to the user.  I was also thinking
      about changing the other swap* failure messages still going to
      testbed-ops in a similar way, but due to the reply issue for those on
      testbed-ops I will hold off on that for now unless someone else thinks
      it's a good idea.
      
      The addition of the X-NetBed-Cc header is to make filtering the email
      easier since the fact that it is going to testbed-errors will no
      longer be in the header.  This header is also present in the swap*
      failure messages still going to testbed-ops.
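The header layout described in this commit can be sketched with Python's standard `email` package. This is only an illustration of the From/To/Bcc/X-NetBed-Cc arrangement; the function name and the example.org addresses are placeholders and are not taken from the testbed code.

```python
from email.message import EmailMessage


def build_error_mail(user, ops, errors, subject, body):
    """Build a failure notice using the header layout from the commit message.

    All addresses are supplied by the caller; nothing here is taken
    from the testbed code itself.
    """
    msg = EmailMessage()
    msg["From"] = ops            # replies to the sender go to testbed-ops
    msg["To"] = user
    msg["Bcc"] = errors          # meant to be hidden from recipients
    msg["X-NetBed-Cc"] = errors  # visible marker that mail filters can match
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Placeholder addresses, for illustration only.
msg = build_error_mail(
    user="user@example.org",
    ops="testbed-ops@example.org",
    errors="testbed-errors@example.org",
    subject="Swap-in failure",
    body="Details of the failure go here.",
)
```

If such a message were handed to `smtplib.SMTP.send_message`, the Bcc header would be dropped before transmission while the Bcc address would still receive a copy, which is why a separate X-NetBed-Cc header is useful for filtering.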
    • Timothy Stack authored · 34392a13
      Source the nstb_compat.tcl from the ns2ir directory instead of
      installing a second one.
  3. 24 Mar, 2006 2 commits
    • Kirk Webb authored · 623b9be0
      Fix up plab renewal for current scheme.  We must now go out to each sliver
      individually and ask for it to be renewed.  I've added a new script to be
      run out of cron that will run through and attempt to renew all active slices.
      If a node cannot be renewed with its slice and comes sufficiently close to
      its recorded expiration (currently two weeks out), mail will be sent to
      testbed ops warning about this situation.  Note that there will only be
      one email message per slice, containing a list of all nodes at risk for
      expiration.  The plab renewal daemon will no longer run as a result of
      this change.  Note that this is sort of a hack.  The better way would be
      to have the daemon persistently try to renew nodes that have failed until
      success, but that will take more work, and I might farm it off to the
      plab monitoring daemon anyway.
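The warning rule described in this commit (warn once per slice, listing every node whose renewal failed and whose expiration is within two weeks) can be sketched as follows. The function names, the tuple layout, and the message wording are assumptions made for illustration; the actual cron script is not shown in this log.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# "currently two weeks out", per the commit message
WARN_WINDOW = timedelta(weeks=2)


def slices_needing_warning(failed, now, window=WARN_WINDOW):
    """Group at-risk nodes by slice.

    `failed` is an iterable of (slice_name, node_name, expiration)
    tuples for slivers whose renewal attempt failed (hypothetical layout).
    """
    at_risk = defaultdict(list)
    for slice_name, node, expires in failed:
        if expires - now <= window:
            at_risk[slice_name].append(node)
    return dict(at_risk)


def warning_messages(at_risk):
    """Produce exactly one message body per slice, listing all its nodes."""
    return {
        slice_name: "Nodes at risk of expiration in slice %s:\n%s"
        % (slice_name, "\n".join(sorted(nodes)))
        for slice_name, nodes in at_risk.items()
    }
```

Grouping before composing mail is what keeps the volume down to one message per slice rather than one per node.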
    • Kevin Atkinson authored · bcbd18aa
      Have swapexp/batchexp dump the error when the "-w" option is
      specified.  The error will look something like:
      
        ERROR:: <cause desc>
      
        <text of the error>
      
        Cause: <cause>
        Confidence: <confidence>
      
      This will be the last thing printed.  The "::" is there to make
      the error easy for scripts to recognize, since they can just look for
      "ERROR::".
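A script consuming this output might look for the last "ERROR::" line and read the fields that follow it. The sketch below assumes the layout shown in the commit message and tolerates leading whitespace; the sample text and its field values are invented for illustration.

```python
def parse_error_block(output):
    """Return the trailing ERROR:: block as a dict, or None if absent."""
    lines = [line.strip() for line in output.splitlines()]
    start = None
    for index, line in enumerate(lines):
        if line.startswith("ERROR::"):
            start = index  # keep the last occurrence
    if start is None:
        return None

    result = {
        "cause_desc": lines[start][len("ERROR::"):].strip(),
        "text": "",
        "cause": None,
        "confidence": None,
    }
    text_lines = []
    for line in lines[start + 1:]:
        if line.startswith("Cause:"):
            result["cause"] = line[len("Cause:"):].strip()
        elif line.startswith("Confidence:"):
            result["confidence"] = line[len("Confidence:"):].strip()
        else:
            text_lines.append(line)
    result["text"] = "\n".join(text_lines).strip()
    return result


# Invented sample output; the field values are placeholders.
sample = (
    "swapping in ...\n"
    "ERROR:: Not enough nodes\n"
    "\n"
    "assign could not map the topology\n"
    "\n"
    "Cause: temp\n"
    "Confidence: 0.9\n"
)
parsed = parse_error_block(sample)
```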
  4. 23 Mar, 2006 1 commit
    • Kevin Atkinson authored · d3ca9c2d
      Change @TBBASE@ to @TBDOCBASE@ when referring to KB entry.
  5. 22 Mar, 2006 1 commit
  6. 21 Mar, 2006 2 commits
    • Kevin Atkinson authored · 1fa07472
      Fix bug causing strange errors from snipit due to an invalid assumption
      about the __DIE__ handler in libtblog.pm.in.
    • Kevin Atkinson authored · d258dde6
      Changed format of email sent to user on errors.  The error will now
      appear instead of the generic message when I am confident it is
      accurate.  The subject line will also change to reflect the cause of
      an error.
      
      Avoid sending mail to testbed-ops during failed swap-related events
      in some cases.  It will instead be sent to a new mailing list,
      testbed-errors.
      
      Added a new row in the experiment info table "Last Error:" which
      states the cause of the error, and links to a new page displaying the
      error.
      
      Made some assign/assign_wrapper errors more informative.
      
      The error (as determined by tblog) is now stored in the database in a
      more structured fashion.  This includes adding a column for the session
      (in the log table) to testbed_stats to link each swap event with the
      logs and possibly the error.
      
      For other changes to the database, see sql/database-migrate.txt
  7. 20 Mar, 2006 1 commit
  8. 16 Mar, 2006 2 commits
  9. 15 Mar, 2006 2 commits
    • Kirk Webb authored · 3b7a67ce
      Pass experiment description through to Planetlab as the slice description
      when creating a slice.  Also mention the Planetlab AUP and importance of
      providing an accurate slice description in the documentation.  The ez
      interface also briefly mentions the importance of the slice description.
    • Leigh B. Stoller authored · 53f7c14d
      Bug fix; make sure the name of the mailman list is correct when removing
      the whole project.
  10. 13 Mar, 2006 4 commits
    • Kirk Webb authored · 99a5bd48
      Use a file to manage the list of ignored/allowed nodes instead of hard coding
      them into the source.  The files are:
      
      @prefix@/etc/plab/{IGNOREDNODES|ALLOWEDNODES}
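Reading such list files instead of hard-coding node names might look like the sketch below. The commit does not describe the file format or how the two lists interact, so both are assumptions here: one node name per line with "#" comments, ignored nodes always dropped, and a non-empty allowed list acting as a whitelist. The host names are placeholders.

```python
import tempfile
from pathlib import Path


def load_node_list(path):
    """Read node names from a file; a missing file yields an empty set.

    Assumed format: one name per line, '#' starts a comment.
    """
    path = Path(path)
    if not path.exists():
        return set()
    names = set()
    for line in path.read_text().splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            names.add(line)
    return names


def usable_nodes(candidates, allowed, ignored):
    """Assumed policy: drop ignored nodes; honor `allowed` only if non-empty."""
    return [
        node
        for node in candidates
        if node not in ignored and (not allowed or node in allowed)
    ]


# Demo using a throwaway directory in place of @prefix@/etc/plab.
with tempfile.TemporaryDirectory() as etc_dir:
    ignored_file = Path(etc_dir) / "IGNOREDNODES"
    ignored_file.write_text("# flaky disk\nplab2.example.org\n\n")
    ignored = load_node_list(ignored_file)
    allowed = load_node_list(Path(etc_dir) / "ALLOWEDNODES")  # not created

result = usable_nodes(
    ["plab1.example.org", "plab2.example.org", "plab3.example.org"],
    allowed,
    ignored,
)
```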
    • Kevin Atkinson authored · 6e488e77
      Added inline POD documentation of libtblog.
      
      TODO: Automatically generate an HTML page from it and have it installed
      with the other HTML docs.
    • Leigh B. Stoller authored · d8f8f9b4
      A set of changes to run "prepare" on a node just prior to an image
      being taken.
      
      The basic strategy is to have node_reboot (when the -p option is supplied)
      invoke a special command on the node that will cause the shutdown
      procedure to run prepare as it goes single user, but before the
      network is turned off and the machine rebooted. The output of the
      prepare run is captured and sent back via the tmcd BOOTLOG command and
      stored in the DB, so that create_image can dump that to the logfile
      (so that the person taking the image can know for certain that the
      prepare ran and finished okay).
      
      On Linux this is pretty easy to arrange since reboot is actually
      shutdown and shutdown runs the K scripts in /etc/rc.d/rc6.d, and at
      the end the node is basically in single user mode. I just added a new
      script to run prepare and send back the output.
      
      On FreeBSD this is a lot harder since there are no decent hooks.
      Instead, I had to hack up init (see tmcd/freebsd/init/{4,5,6}) with
      some simple code that looks for a command to run instead of going to a
      single user shell. The command (script) runs prepare, sends the output
      back to tmcd, and then does a real reboot.
      
      Okay, so how to get -p passed to node_reboot? I hacked up the
      libadminmfs code slightly to do that, with a new 'prepare' argument
      option. This may not be the best approach; might have to do this as a
      real state transition if problems develop. I will wait and see.
      
      Also, I changed www/loadimage.php3 to spew the output of the
      create_image to the browser.
    • Mike Hibler authored · 66dfc7a3
      Reduce power cycle/on batch size when booting into the admin MFS because:
       * admin MFS is larger and had more problems with simultaneous reboots
      
       * power command did not support batching anyway (only node_reboot), so
         power ons were performed en masse, exacerbating problems
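The idea of powering nodes on in small groups rather than all at once can be sketched generically as below. The function names, the `power_on` callback, and the pause between groups are hypothetical; the actual batch sizes and the power command's interface are not given in this log.

```python
import time


def batches(nodes, size):
    """Yield consecutive groups of at most `size` nodes."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(nodes), size):
        yield nodes[start:start + size]


def power_on_in_batches(nodes, size, power_on, pause=0.0):
    """Call `power_on(group)` once per batch, optionally pausing in between.

    `power_on` is a caller-supplied callback standing in for the real
    power command.
    """
    for index, group in enumerate(batches(nodes, size)):
        if index and pause:
            time.sleep(pause)  # give the previous batch time to boot
        power_on(group)
```

A smaller `size` lowers the number of nodes fetching the MFS at the same moment, at the cost of a longer total boot time.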
  11. 10 Mar, 2006 2 commits
    • Kirk Webb authored · c94fe76c
      Fix stupid little mistake in last commit.
    • Kirk Webb authored · de3525be
      Change version check code to throw exception if it can't contact the
      remote pl_conf.
  12. 07 Mar, 2006 4 commits
  13. 03 Mar, 2006 1 commit
    • Timothy Stack authored · 66ee32fc
      Clear out plab evproxy subscriptions when the event scheduler is stopped.
      
      	* event/sched/event-sched.c: Add an __ns_teardown sequence that
      	can be used to send events when the scheduler is stopped.
      
      	* event/tbgen/tevc.c: Add a timeout flag that can be used to bound
      	the time spent waiting for an event to complete.
      
      	* tbsetup/eventsys.proxy.in: When stopping/replaying, run the
      	__ns_teardown sequence and wait for it to complete.
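The pattern in this commit (trigger a teardown sequence, then wait for it to finish, but only for a bounded time) can be illustrated with a small sketch. This is not the tevc or event-sched code; the function name and the use of `threading.Event` as the completion signal are assumptions made for illustration.

```python
import threading


def stop_with_teardown(send_teardown, done, timeout):
    """Trigger teardown and wait at most `timeout` seconds for completion.

    `send_teardown` is a caller-supplied callable that starts the
    teardown sequence; `done` is a threading.Event that is set when the
    sequence completes. Returns True on completion, False on timeout.
    """
    send_teardown()
    return done.wait(timeout)
```

Bounding the wait means a teardown sequence that never completes cannot block the stop or replay operation indefinitely.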
  14. 02 Mar, 2006 1 commit
  15. 01 Mar, 2006 1 commit
  16. 23 Feb, 2006 2 commits
  17. 22 Feb, 2006 1 commit
  18. 21 Feb, 2006 1 commit
    • Leigh B. Stoller authored · 85488e5b
      Neuter the perltie stuff under perl 5.8 since it does not work properly.
      I got close to getting it to work by adding this:
      
      	sub FILENO  { my $this = shift; fileno($$this) }
      	sub CLOSE   { my $this = shift; close($$this) }
      
      	sub OPEN {
      	    my $this = shift;
      
      	    close($$this) if defined(fileno($$this));
      	    @_ == 1 ? open($$this, $_[0]) : open($$this, $_[0], $_[1]);
      	}
      
      But subprocesses were not seeing the right stdout/stderr after doing
      something like:
      
      	open(STDERR, ">> $logname");
      	open(STDOUT, ">> $logname");
      
      So, I will let Kevin work on it; I've spent too much time on it
      already!
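One plausible reading of the symptom above is that a language-level handle replacement does not change the operating-system file descriptors that child processes inherit. The commit does not confirm this as the cause. The sketch below shows the same distinction in Python rather than Perl, as an analogy only: swapping the `sys.stdout` object does not affect a child, while redirecting file descriptor 1 does. It assumes a POSIX-like system; all names are illustrative.

```python
import contextlib
import io
import os
import subprocess
import sys
import tempfile

CHILD = [sys.executable, "-c", "print('from child')"]


def child_output_seen_by_object_swap():
    """Swap only the Python-level stdout object, then run a child."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # The child inherits fd 1, not the StringIO object.
        subprocess.run(CHILD, check=True)
    return buf.getvalue()


def run_child_with_fd_redirect(logname):
    """Point fd 1 at a log file, run a child, then restore fd 1."""
    sys.stdout.flush()
    saved = os.dup(1)
    try:
        with open(logname, "ab") as log:
            os.dup2(log.fileno(), 1)
        subprocess.run(CHILD, check=True)
    finally:
        os.dup2(saved, 1)
        os.close(saved)


with tempfile.TemporaryDirectory() as tmp:
    logname = os.path.join(tmp, "log")
    run_child_with_fd_redirect(logname)
    with open(logname) as fh:
        logged = fh.read()

swapped = child_output_seen_by_object_swap()
```

Here `logged` holds the child's output because the descriptor itself was redirected, whereas `swapped` stays empty because only the in-process object was replaced.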
  19. 16 Feb, 2006 6 commits