1. 30 Aug, 2004 1 commit
    • Leigh B. Stoller's avatar
      The bulk of the event system changes. · 9aa6b5ca
      Leigh B. Stoller authored
      * The per-experiment event scheduler now runs on ops instead of boss.
        Boss still runs elvind and uses events internally, but the user part
        of the event system has moved.
      
      * Part of the guts of eventsys_control moved to new script, eventsys.proxy,
        which runs on ops and fires off the event scheduler. The only tricky part
        of this is that the scheduler runs as the user, but killing it has to be
        done as root since a different person might swap out the experiment. So,
        the proxy is a perl wrapper invoked from a root ssh from boss, which
        forks, writes the pid file into /var/run/emulab/evsched/$pid_$eid.pid,
        then flips to the user and execs the event scheduler (which is careful
        not to fork). Obviously, if the kill is done as root, the pid file has to
        be stored someplace the user is not allowed to write.
      
      * The event scheduler has been rewritten to use Tim's C++ interface to the
        sshxmlrpc server on boss. Actually, I reorg'ed the scheduler so that it
        can be built either as a mysql client, or as RPC client. Note that it can
        also be built to use the SSL version of the XMLRPC server, but that will
        not go live until I finish the server stuff up. Also some goo for dealing
        with building the scheduler with C++.
      
      * Changes to several makefiles to install the ops binaries over NFS to
        /usr/testbed/opsdir. Makes life easier, but only if boss and ops are
        running the same OS. For now, using static linking on the event scheduler
        until ops upgraded to same rev as boss.
      
      * All of the event clients got little tweaks for dealing with the new CNAME
        for the event system server (event-sever). Will need to build new images
        at some point. Old images and clients will continue to work cause of an
        inetd hack on boss that uses netcat to transparently redirect elvind
        connections to ops.
      
      * Note that eventdebug needs some explaining. In order to make the inetd
        redirect work, elvind cannot be listening on the standard port. So, the
        boss event system uses an alternate port since there are just a few
        subsystems on boss that use the server, and its easy to propogate changes
        on boss. Anyway, the default for eventdebug is to connect to the standard
        port on localhost, which means it will work as expected on ops, but will
        require -b argument on boss.
      
      * Linktest changes were slightly more involved. No longer run linktest on
        boss when called from the experiment swapin path, but ssh over to ops to
        fire it off. This is done as the user of course, and there are some
        tricks to make it possible to kill a running linktest and its ssh when
        experiment swapin is canceled (or from the command line) by forcing
        allocation of a tty. I will probably revisit this at some point, but I
        did not want to spend a bunch of time on linktest.
      
      * The upgrade path detailed in doc/UPDATING is necessarily complicated and
        bound to cause consternation at remote sites doing an upgrade.
      9aa6b5ca
  2. 29 Jun, 2004 1 commit
    • Leigh B. Stoller's avatar
      Some "improvements" to linktest ... · 159076bf
      Leigh B. Stoller authored
      * The linktest daemon (the one that runs on the nodes) no longer talks
        to boss directly, but instead contacts the local elvind; rc.linktest
        is changed to reflect that.
      
      * A bunch of signal handler changes to run_linktest.pl; do not rely on
        events to stop linktest when it is running on boss; when the user
        kills a running linktest make sure all the processes are killed
        explicitly.
      
      * New wrapper script (linktest_control) for use on boss, specifically
        when being called from the web interface. This script handles the DB
        part (getting linktest_level and linktest_pid), making sure that
        only one linktest is running at a time (on boss) and reseting the
        pid in the DB as needed. The -k option kills a running linktest, and
        is invoked from the web interface when the user wants to kill one in
        progress. This gets the pid from the DB and sends it a TERM signal,
        which sends a TERM to the run_linktest.pl script, which sends a TERM
        to the ltevent helper app.
      
        Note that this wrapper is also suitable for the XMLRPC interface,
        although I have not added it there yet.
      159076bf