1. 20 Oct, 2006 1 commit
    • Mike Hibler's avatar
      Wow, this should make me look important! · afa5e919
      Mike Hibler authored
      Two-day boondoggle to support "/scratch", an optional large, shared filesystem
      for users.  To do this, I needed to find all the instances where /proj is used
      and behave accordingly.  The boondoggle part was the decision to gather up all
      the hardwired instances of shared directory names ("/proj", "/users", etc.)
      so that they are set in a common place (via unexposed configure variables).
      This is a boondoggle because:
      
      1. I didn't change the client-side scripts.  They need a different mechanism
         (e.g., tmcd) to get the info, configure is the wrong way.
      
      2. Even if I had done #1 it is likely--no, certain--that something would
         fail if you tried to rename "/proj" to be "/mike".  These names are just
         too ingrained.
      
      3. We may not even use "/scratch" as it turns out.
      
      Note, I also didn't fix any of the .html documentation.  Anyway, it is done.
      To maintain my illusion in the future you should:
      
      1. Have perl scripts include "use libtestbed" and use the defined PROJROOT(),
         et.al. functions where possible.  If not possible, make sure they run
         through configure and use @PROJROOT_DIR@, etc.
      
      2. Use the configure method for python, C, php and other languages.
      
      3. There are perl (TBValidUserDir) and php (VALIDUSERPATH) functions which
         you should call to determine if an NS, template parameter, tarball or
         other file are in "an acceptable location."  Use these functions where
         possible.  They know about the optional "scratch" filesystem.  Note that
         the perl function is over-engineered to handles cases that don't occur
         in nature.
      afa5e919
  2. 03 Oct, 2006 1 commit
  3. 14 Sep, 2006 1 commit
    • Leigh Stoller's avatar
      Rework the event handling for the program agent so that both the · aead4580
      Leigh Stoller authored
      reload and halt events send proper completion events. This is required
      for stoprun and startrun to work correctly. On stoprun the logs are
      not collected until the programs have stopped, and on startrun we do
      not want to proceed until all the agents have reloaded their
      environments.
      aead4580
  4. 10 Sep, 2006 1 commit
    • Leigh Stoller's avatar
      The bulk of this commit adds the ability to run the program agent on ops · e8bb6bca
      Leigh Stoller authored
      so that users can schedule program events to run there. For example:
      
      	set myprog [new Program $ns]
      	$myprog set node "ops"
      	$myprog set command "/usr/bin/env >& /tmp/foo"
      
      	$ns at 10 "$myprog start"
      or
      	tevc -e pid/eid now myprog start
      
      Since the program agent cannot talk to tmcd from ops, there are new
      routines to create the config files that the program agent uses, in
      the expertment tbdata directory.
      
      I also rewrote the eventsys.proxy script that starts the event
      scheduler on ops; I rolled the startup of the program agent into this
      script, via new -a option which is passed over from boss when an ops
      program agent is detected in the virt topology. This keep the number
      of new processes on ops to a small number.
      
      Also part of the above rewrite is that we now catch when event
      scheduler (or the program agent) exits abnormally, sending email to
      tbops and the swapper of the experiment. We have been seeing abnormal
      exits of the scheduler and it would good to detect and see if we can
      figure out what is going wrong.
      
      Other small bug fixes in experiment run.
      e8bb6bca
  5. 08 Jun, 2006 1 commit
  6. 03 Mar, 2006 1 commit
    • Timothy Stack's avatar
      · 66ee32fc
      Timothy Stack authored
      Clear out plab evproxy subscriptions when the event scheduler is stopped.
      
      	* event/sched/event-sched.c: Add an __ns_teardown sequence that
      	can be used to send events when the scheduler is stopped.
      
      	* event/tbgen/tevc.c: Add a timeout flag that can be used to bound
      	the time spent waiting for an event to complete.
      
      	* tbsetup/eventsys.proxy.in: When stopping/replaying, run the
      	__ns_teardown sequence and wait for it to complete.
      66ee32fc
  7. 30 Nov, 2005 2 commits
  8. 26 Apr, 2005 1 commit
  9. 30 Aug, 2004 1 commit
    • Leigh Stoller's avatar
      The bulk of the event system changes. · 9aa6b5ca
      Leigh Stoller authored
      * The per-experiment event scheduler now runs on ops instead of boss.
        Boss still runs elvind and uses events internally, but the user part
        of the event system has moved.
      
      * Part of the guts of eventsys_control moved to new script, eventsys.proxy,
        which runs on ops and fires off the event scheduler. The only tricky part
        of this is that the scheduler runs as the user, but killing it has to be
        done as root since a different person might swap out the experiment. So,
        the proxy is a perl wrapper invoked from a root ssh from boss, which
        forks, writes the pid file into /var/run/emulab/evsched/$pid_$eid.pid,
        then flips to the user and execs the event scheduler (which is careful
        not to fork). Obviously, if the kill is done as root, the pid file has to
        be stored someplace the user is not allowed to write.
      
      * The event scheduler has been rewritten to use Tim's C++ interface to the
        sshxmlrpc server on boss. Actually, I reorg'ed the scheduler so that it
        can be built either as a mysql client, or as RPC client. Note that it can
        also be built to use the SSL version of the XMLRPC server, but that will
        not go live until I finish the server stuff up. Also some goo for dealing
        with building the scheduler with C++.
      
      * Changes to several makefiles to install the ops binaries over NFS to
        /usr/testbed/opsdir. Makes life easier, but only if boss and ops are
        running the same OS. For now, using static linking on the event scheduler
        until ops upgraded to same rev as boss.
      
      * All of the event clients got little tweaks for dealing with the new CNAME
        for the event system server (event-sever). Will need to build new images
        at some point. Old images and clients will continue to work cause of an
        inetd hack on boss that uses netcat to transparently redirect elvind
        connections to ops.
      
      * Note that eventdebug needs some explaining. In order to make the inetd
        redirect work, elvind cannot be listening on the standard port. So, the
        boss event system uses an alternate port since there are just a few
        subsystems on boss that use the server, and its easy to propogate changes
        on boss. Anyway, the default for eventdebug is to connect to the standard
        port on localhost, which means it will work as expected on ops, but will
        require -b argument on boss.
      
      * Linktest changes were slightly more involved. No longer run linktest on
        boss when called from the experiment swapin path, but ssh over to ops to
        fire it off. This is done as the user of course, and there are some
        tricks to make it possible to kill a running linktest and its ssh when
        experiment swapin is canceled (or from the command line) by forcing
        allocation of a tty. I will probably revisit this at some point, but I
        did not want to spend a bunch of time on linktest.
      
      * The upgrade path detailed in doc/UPDATING is necessarily complicated and
        bound to cause consternation at remote sites doing an upgrade.
      9aa6b5ca