1. 26 Dec, 2003 1 commit
  2. 09 Oct, 2003 1 commit
    • Leigh B. Stoller authored · 2641af4d
      Reorg of two aspects of node update.
      * install-rpm, install-tarfile, spewrpmtar.php3, spewrpmtar.in: Pumped
        up even more! The db file we store in /var/db now records both the
        timestamp (of the file, or if remote the install time) and the MD5
        of the file that was installed. Locally, we can get this info when
        accessing the file via NFS (copymode on or off). Remote, we use wget
        to get the file, and so pass the timestamp along in the URL request,
        and let spewrpmtar.in determine if the file has changed. If the
        timestamp it gets is >= the timestamp of the file, a status code
        of 304 (Not Modified) is returned. Otherwise the file is returned.
      
        If the timestamps are different (remote, server sends back an actual
        file), the MD5 of the file is compared against the value stored. If
        they are equal, update the timestamp in the db file to avoid
        repeated MD5s (or server downloads) in the future. If the MD5 is
        different, then reinstall the tarball or rpm, and update the db file
        with the new timestamp and MD5. Presto, we have auto update capability!
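
        The check described above can be sketched roughly as follows. This is
        a minimal illustration in Python, not the code in install-tarfile or
        spewrpmtar.in; the names server_status, DbEntry and apply_update are
        hypothetical, and the db file is modeled as an in-memory record.

```python
import hashlib
from dataclasses import dataclass

HTTP_OK = 200
HTTP_NOT_MODIFIED = 304


def server_status(client_timestamp: int, file_mtime: int) -> int:
    """Server side: 304 if the node's timestamp is >= the file's, else 200."""
    return HTTP_NOT_MODIFIED if client_timestamp >= file_mtime else HTTP_OK


@dataclass
class DbEntry:
    """Hypothetical stand-in for one record of the db file in /var/db."""
    timestamp: int
    md5: str


def apply_update(entry: DbEntry, new_timestamp: int, payload: bytes) -> str:
    """Node side: decide what to do with a file whose timestamp was checked."""
    if new_timestamp == entry.timestamp:
        return "unchanged"              # same timestamp, nothing to do
    new_md5 = hashlib.md5(payload).hexdigest()
    if new_md5 == entry.md5:
        entry.timestamp = new_timestamp  # avoid repeating the MD5 next time
        return "timestamp-updated"
    entry.timestamp = new_timestamp      # contents changed: reinstall
    entry.md5 = new_md5
    return "reinstalled"
```

        The actual reinstall step (unpacking the tarball or installing the
        rpm) is left out; only the bookkeeping is shown.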
      
        Caveat: I pass along the old MD5 in the URL, but it is currently
        ignored. I do not know if doing the MD5 on the server is a good
        idea, but obviously it is easy to add later. At the moment it
        happens on the node, which means wasted bandwidth when the timestamp
        has changed, but the file has not (probably not something that will
        happen in typical usage).
      
        Caveat: The timestamp used on remote nodes is the time the tarfile
        is installed (GM time of course). We could arrange to return the
        timestamp of the local file back to the node, but that would mean
        complicating the protocol (or using an http header) and I was not in
        the mood for that. In typical usage, I do not think that people will
        be changing tarfiles and rpms so rapidly that this will make a
        difference, but if it does, we can change it.
      
      * node_update.in, client side watchdog, and various web pages:
        Deflated node_update, removing all of the older ssh code. We now
        assume that all nodes will auto update on a periodic basis, via the
        watchdog that runs on all client nodes, including plab nodes.
      
        Changed the permission check to look for new UPDATE permission (used
        to be UPDATEACCOUNT). As before, it requires local_root or better.
        The reason for this is that node_update now implies more than just
        updating the accounts/mounts. The web pages have been changed to
        explain that in addition to mounts/accounts, rpms and tarfiles will
        also be updated. At the moment, this is still tied to a single
        variable (update_accounts) in the nodes table, but as Kirk requested
        at the meeting, it will probably be nice to split these out in the
        future.
      
        Added the ability to node_update a single node in an experiment (in
        addition to all nodes option on the showexp page). This has been
        added to the shownode webpage menu options.
      
        Changed locking code to use the newer wrapper states, and to move
        the experiment to RUNNING_LOCKED until the update completes. This is
        to prevent mayhem in the rest of the system (which could be dealt
        with, but is not worth the trouble; people have to wait until their
        initiated update is complete, before they can swap out the
        experiment).
      
        Added "short" mode to shownode routine, equiv to the recently added
        short mode for showexp. I use this on the confirmation page for
        updating a single node, giving the user a couple of pertinent (feel
        good) facts before they confirm.
  3. 06 Oct, 2003 1 commit
    • Leigh B. Stoller authored · 434a472a
      * New libtmcc.pm module that encapsulates the tmcc interface. Most of the
        code that was in libsetup has moved into this library, and underwent a
        giant cleaning and pumping up. The interface from your typical perl
        script now looks like this:
      
        use libtmcc;
      
        if (tmcc(TMCCCMD_STATUS, "optional arguments", \@tmccresults) < 0) {
            warn("*** WARNING: Could not get status from server!\n");
            return -1;
        }
        foreach my $me (@tmccresults) {
            print "bite $me";
        }
      
        The arguments and results are optional values. There is a fourth optional
        value that is a hash of config options (basically converted to command
        line switches passed to tmcc). For example, to set the timeout on an
        individual call, pass a fourth argument like:
      
      	("timeout" => 5)
      
        There is also a way to set global options so that all subsequent tmcc
        calls are affected:
      
      	configtmcc("timeout", 5);
      
        I'll probably clean this up a bit to avoid the direct strings.
      
        The result list is a list of strings. Since we are trending away from
        using tmcc to transfer large amounts of data, I think this is okay.
      
      * A new tmcc.pl which does little more than load libtmcc and use it.
        This will become the new tmcc, with the existing C version becoming a
        backend binary for it.
      
      * All of the perl scripts in tmcd have been changed to use the new
        library. I left the few uses of tmcc in shell scripts alone since they
        were of the simple variety (mostly "state" command).
      
      * And again, if you have read this far, you will learn why I bothered with
        all this. Well, the existing code was really bad and it was getting out
        of control. Sort of like a squid that was getting harder to control as
        its rotting tentacles slithered into more and more scripts. Anyway ...
      
        More important, my goal is to use the libtmcc library to add caching.  I
        have not worked out the details yet, but I am envisioning a configuration
        file, perhaps generated initially by tmcd, of all of the config
        values. If the library finds that file, it sucks the info out of the file
        instead of going to tmcd. Eventually, this config file would be generated
        as part of experiment swapping and stored in the DB, but that's a longer
        term project, and perhaps orthogonal (how we fill the cache is not as
        important as adding the ability to use a cache, right?).
      
        Note that certain operations (like "state" and "ready") are flagged by
        the library to always bypass the "cache".
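
        Since the caching details are not worked out yet, the following is
        only a sketch of the idea, written in Python rather than the Perl of
        libtmcc. The names NEVER_CACHED and cached_tmcc are hypothetical, the
        cache is modeled as a dictionary already loaded from the config file,
        and ask_server stands in for a real call to tmcd.

```python
from typing import Callable, Dict, List, Optional

# Commands that must always go to the server, per the note above.
NEVER_CACHED = {"state", "ready"}


def cached_tmcc(
    command: str,
    cache: Optional[Dict[str, List[str]]],
    ask_server: Callable[[str], List[str]],
) -> List[str]:
    """Return result lines for a command, using the cache when allowed."""
    if command not in NEVER_CACHED and cache is not None and command in cache:
        return cache[command]      # config file present and has the value
    return ask_server(command)     # no cache, a miss, or a bypassed command
```

        A missing config file is represented by passing None for the cache,
        in which case every command falls through to the server.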
  4. 03 Sep, 2003 1 commit
  5. 18 Aug, 2003 1 commit
  6. 05 Aug, 2003 1 commit
    • Leigh B. Stoller authored · 212cc781
      The rest of the sync server additions:
      * Parser: Added new tb command to set the name of the sync server:
      
      	tb-set-sync-server <node>
      
        This initializes the sync_server slot of the experiment entry to the
        *vname* of the node that should run the sync server for that
        experiment. In other words, the sync server is per-experiment, runs
        on a node in the experiment, and the user gets to choose which node
        it runs on.
      
      * tmcd and client side setup. Added new syncserver command which
        returns the name of the syncserver and whether the requesting node
        is the lucky one to run the daemon:
      
          SYNCSERVER SERVER='nodeG.syncserver.testbed.emulab.net' ISSERVER=1
      
        The name of the syncserver is written to /var/emulab/boot/syncserver
        on the nodes so that clients can easily figure out where the server
        is.
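
        For illustration, a reply line in the format shown above could be
        split into its fields like this. This is a Python sketch, not the
        client-side setup code; the function name parse_syncserver is
        hypothetical.

```python
import shlex
from typing import Tuple


def parse_syncserver(line: str) -> Tuple[str, bool]:
    """Split a SYNCSERVER reply into (server name, is this node the server)."""
    fields = shlex.split(line)          # also strips the quotes around SERVER
    if not fields or fields[0] != "SYNCSERVER":
        raise ValueError("not a SYNCSERVER reply: %r" % line)
    pairs = dict(field.split("=", 1) for field in fields[1:])
    return pairs["SERVER"], pairs.get("ISSERVER") == "1"
```

        The server name returned here is what would end up in
        /var/emulab/boot/syncserver.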
      
        Aside: The ready bits are now ignored (no DB accesses are made) for
        virtual nodes; they are forced to use the new sync server.
      
      * New os/syncd directory containing the daemon and the client. The
        daemon is pretty simple. It waits for TCP (and UDP, although that
        path is not complete yet) connections, and reads in a little
        structure that gives the name of the "barrier" to wait for, and an
        optional count of clients in the group (this would be used by the
        "master" who initializes barriers for clients). The socket is saved
        (no reply is made, so the client is blocked) until the count reaches
        zero. Then all clients are released by writing back to the
        sockets, and the sockets are closed. Obviously, the number of
        clients is limited by the number of FDs (open sockets), hence the
        need for a UDP variant, but that will take more work.
      
        The client has a simple command line interface:
      
          usage: emulab-sync [options]
          -n <name>         Optional barrier name; must be less than 64 bytes long
          -d                Turn on debugging
          -s server         Specify a sync server to connect to
          -p portnum        Specify a port number to connect to
          -i count          Initialize named barrier to count waiters
          -u                Use UDP instead of TCP
      
          The client figures out the server by looking for the file created
          above by libsetup (/var/emulab/boot/syncserver). If you do not
          specify a barrier "name", it uses an internal default. Yes, the
          server can handle multiple barriers (differently named of course)
          at once (non-overlapping clients obviously).
      
          Clients can wait before a barrier is "initialized." The count on
          the barrier just goes negative until someone initializes the
          barrier using the -i option, which increments the count by the
          given value. Therefore, the master does not have to arrange to get there
          "first." As an example, consider a master and one client:
      
      	nodeA> /usr/local/etc/emulab/emulab-sync -n mybarrier
      	nodeB> /usr/local/etc/emulab/emulab-sync -n mybarrier -i 1
      
          Node A waits until Node B initializes the barrier (gives it a
          count).  The count is the number of *waiters*, not including the
          master. The master is also blocked until all of the waiters have
          checked in.
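
          The counting rule can be modeled in a few lines. This is a Python
          sketch of one reading of the description above, not the syncd
          source: sockets are replaced by client names, and the class names
          Barrier and SyncServerModel are hypothetical. It assumes a waiter
          decrements the count by one, and that the master adds its -i count
          without decrementing.

```python
from typing import Dict, List, Optional


class Barrier:
    def __init__(self) -> None:
        self.count = 0                # may go negative before initialization
        self.blocked: List[str] = []  # clients whose reply is being withheld


class SyncServerModel:
    def __init__(self) -> None:
        self.barriers: Dict[str, Barrier] = {}

    def arrive(self, name: str, client: str,
               init_count: Optional[int] = None) -> List[str]:
        """Register an arrival; return the clients released, if any."""
        barrier = self.barriers.setdefault(name, Barrier())
        if init_count is None:
            barrier.count -= 1            # ordinary waiter checks in
        else:
            barrier.count += init_count   # master initializes with -i
        barrier.blocked.append(client)
        if barrier.count != 0:
            return []                     # everyone stays blocked
        del self.barriers[name]           # barrier is done; release all
        return barrier.blocked
```

          With the example above, nodeA arriving first leaves the count at
          -1, and nodeB arriving with -i 1 brings it to zero and releases
          both.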
      
          I have not made any provision for timeouts or crashed clients. Let's
          see how it goes.
  7. 05 Jun, 2003 1 commit
  8. 30 Jan, 2003 1 commit
  9. 06 Jan, 2003 1 commit
  10. 18 Dec, 2002 2 commits
    • Leigh B. Stoller authored · 482fb815
      Add new watchdog script, derived from the ron version (and will
      replace it eventually). Like the ron nodes, local nodes will now
      periodically (once every 5 minutes) send a udp packet to boss to
      indicate the node is alive and to see if it needs to check for account
      updates. This will replace the once every 5 minute fping we do from
      db/node_status (once I whack that script), and will simplify the
      existing problem of propagating accounts to nodes (nodes down, nodes
      in the swapping phase, etc).
    • Leigh B. Stoller authored · 77661f58
      group,master.password: Add sshd, smmsp, mailnull, and sfs.
      rc.conf: Remove fixed -p argument. Now set by mkjail.
      rc.local,jailctl: Update for client side path reorg and cleanup.
      jaildog.pl,mkjail.pl: Numerous fixes for jailed nodes.
  11. 26 Sep, 2002 1 commit
  12. 27 Aug, 2002 1 commit
  13. 15 Aug, 2002 1 commit
  14. 29 Jul, 2002 1 commit
    • Leigh B. Stoller authored · f066d2d9
      A wide array of little changes to improve the distribution of the
      client software to widearea nodes. Most of these changes were to
      reduce the embarrassment factor. At some point we need a proper
      autoconf and such, but for now there is a makefile in the src dir for
      creating the distribution.
      
      I've tested it on a local linux node and mostly on a freebsd node, but
      I've moved things around and so updating the RON nodes will require
      some hand intervention by me at some point.
  15. 10 Jul, 2002 1 commit
  16. 07 Jul, 2002 1 commit
  17. 02 Jul, 2002 2 commits
  18. 19 Jun, 2002 1 commit
    • Leigh B. Stoller authored · 6b75cf38
      Add a 60 second timer to tell Emulab that the node is alive and
      well. We use a UDP packet to keep it lightweight. If it does not get
      through, that's okay, obviously. The return value is just a yes/no flag
      that says an update needs to run. Right now, that's just accounts.
      This allows us to churn a little less on accounts.
      Other cleanups.
  19. 06 Jun, 2002 1 commit
  20. 31 May, 2002 1 commit
    • Leigh B. Stoller authored · 6c44b4a4
      New watchdog daemon for remote (RON) nodes. Okay, not much of a
      watchdog at the moment, but it will be. Right now it does boot time
      stuff; issues tmcc state event so the testbed knows (REBOOTED), does
      an account update to get any accounts missed while dead, then sets up
      any vnodes (tunnels and such) that were supposed to be running on
      the node, then issues a tmcc ISUP event.
      
      After that, goes into a loop doing periodic account update. At some
      point it would be good to look for stale vnodes (that could not be
      torn down because of network connectivity problems), but there are
      some race conditions that I need to work out first.