  1. 13 Oct, 2003 8 commits
    • David Anderson's avatar
      fixed a path · 7dd9853d
      David Anderson authored
    • David Anderson's avatar
      actual linktest perl script · 95f63c29
      David Anderson authored
    • David Anderson's avatar
      updated linktest script to test routing, link connectivity and bandwidth. · 76b3f1ad
      David Anderson authored
      also includes updated tb_compat.tcl include file and ns patch.
    • Leigh B. Stoller's avatar
    • Leigh B. Stoller's avatar
      Update. · ba0cf5b7
      Leigh B. Stoller authored
    • Leigh B. Stoller's avatar
      Aside from another round of cleanup, there is a significant change. · a70aef53
      Leigh B. Stoller authored
      I have implemented the suggestion Jay made a couple of weeks ago
      about allowing partial allocation in assign_wrapper, and retrying with a
      modified set of "fixed" nodes.
      My basic approach was to change nalloc to optionally allow partial
      allocations, returning the number of nodes that could not be allocated as
      its return value. In assign_wrapper, I determine which nodes we were able
      to get (in each loop), set their allocstate to INIT_DIRTY, augment the
      fixed_node set, and recreate the top file. Then I try again, up to the
      current number of maxtries. If assign fails with an unretryable error, or
      if we could not nalloc a user directed fixed node, then I stop right away
      since the experiment is not going to map (in the near term) if the fixed
      node list cannot be allocated.
      I am confident that this works okay, although testing is a little
      difficult. The main problem is how this interacts with experiment modify.
      Chad's implementation is that a modify can be reverted (recovered from)
      only as long as the DB is not modified by assign_wrapper. Well, a partial
      allocation, followed by failure, obviously modifies the DB, and so is
      deemed not recoverable. I am still trying to figure out the effects of
      this, and whether I can relax this requirement, but in the meantime
      let's install it and see what happens (won't affect many people).
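
      The retry scheme described in this message can be sketched roughly as
      follows. This is an illustration only, written in Python rather than the
      testbed's Perl; Pool, map_with_retries, run_assign and UnretryableError
      are hypothetical names, and the real nalloc and assign_wrapper interfaces
      may differ. Passing the growing fixed set to run_assign stands in for
      recreating the top file.

```python
from dataclasses import dataclass, field


class UnretryableError(Exception):
    """Hypothetical: raised by the mapper when retrying cannot help."""


@dataclass
class Pool:
    """Toy stand-in for the free-node pool that nalloc draws from."""
    free: set = field(default_factory=set)

    def nalloc(self, wanted, partial=True):
        """Reserve what we can; return (reserved nodes, number not allocated)."""
        got = [n for n in wanted if n in self.free]
        if not partial and len(got) < len(wanted):
            return [], len(wanted)          # all-or-nothing mode reserves nothing
        self.free -= set(got)
        return got, len(wanted) - len(got)


def map_with_retries(pool, run_assign, user_fixed, maxtries):
    """Retry mapping, carrying partially allocated nodes forward as fixed nodes."""
    fixed = set(user_fixed)
    # A user-directed fixed node that cannot be allocated ends the attempt at once.
    _, missing = pool.nalloc(sorted(user_fixed), partial=False)
    if missing:
        return None
    for _ in range(maxtries):
        try:
            proposed = run_assign(fixed)    # mapper sees the augmented fixed set
        except UnretryableError:
            return None
        new = [n for n in proposed if n not in fixed]
        got, missing = pool.nalloc(new)
        fixed |= set(got)   # keep what we got (the commit marks these INIT_DIRTY)
        if missing == 0:
            return fixed
    return None             # out of tries; releasing held nodes is omitted here
```

      Each pass keeps whatever it managed to reserve, so a later pass only
      needs to find replacements for the nodes that were missing.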
    • Leigh B. Stoller's avatar
      Add sfshostid to nodes table. We store that in the filesystem on ops, · 11fde1c3
      Leigh B. Stoller authored
      but it's nice to have it in the DB too so that we do not have to read
      that file!
    • Mac Newbold's avatar
      Rollback to prestatewait for now. · 3b210b7b
      Mac Newbold authored
  2. 10 Oct, 2003 7 commits
    • Mac Newbold's avatar
      Fix a nit for Mike. · b71f5f90
      Mac Newbold authored
    • Mac Newbold's avatar
    • Robert Ricci's avatar
    • Leigh B. Stoller's avatar
    • Mike Hibler's avatar
      Make sure it finds the renamed tftp-hpa · 4daef377
      Mike Hibler authored
    • Mac Newbold's avatar
      Add statewait changes · e8b47f26
      Mac Newbold authored
    • Mac Newbold's avatar
      New StateWait changes - the main point of all this is to move to our new · 2b2a306d
      Mac Newbold authored
      model of waiting for state changes. Before we were watching the database
      (which means we can only watch for terminal/stable/long-lived states, and
      have to poll the db). Now things that are waiting for states to change
      become event listeners, and watch the stream of events flow by, and don't
      have to do any polling. They can now watch for any state, and even
      sequences of states (i.e., a Shutdown followed by an Isup).
      To do this, there is now a cool StateWait.pm library that encapsulates the
      functionality needed. To use it, you call initStateWait before you start
      the chain of events (i.e., before you call node reboot). Then do your stuff,
      and call waitForState() when you're ready to wait. It can be told to
      return periodically with the results so far, and you can cancel waiting
      for things. An example program called waitForState is in
      testbed/event/stated/ , and can also be used nicely as a command line tool
      that wraps up the library functionality.
      This also required the introduction of a TBFAILED event that can be sent
      when a node isn't going to make it to the state that someone may be
      waiting for. That is, if it gets wedged coming up, and stated retries but
      eventually gives up on it, stated sends this to let listeners know that
      the node is hosed and won't ever come up.
      Another thing that is part of this is that node_reboot moves (back) to the
      fully-event-driven model, where users call node reboot, and it does some
      checks and sends some events. Then stated calls node_reboot in "real mode"
      to actually do the work, and handles doing the appropriate retries until
      the node either comes up or is deemed "failed" and stated gives up on it.
      This means stated is also the gatekeeper of when you can and cannot reboot
      a node. (See mail archives for extensive discussions of the details.)
      A big part of the motivation for this was to get uninformed timeouts and
      retries out of os_load/os_setup and put them in stated where we can make a
      wiser choice. So os_load and os_setup now use this new stuff and don't
      have to worry about timing out on nodes and rebooting. Stated makes sure
      that they either come up, get retried, or fail to boot. tbrestart also
      underwent a similar change.
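
      The idea of watching the event stream for a sequence of states can be
      sketched as below. This is not the StateWait.pm API; StateWaiter and its
      methods are hypothetical, the sketch is in Python rather than Perl, and
      the state names SHUTDOWN and ISUP are assumed spellings of the states
      mentioned above. Only TBFAILED is taken from the message.

```python
class StateWaiter:
    """Track, per node, progress through an expected sequence of states."""

    FAILED = "TBFAILED"   # event name taken from the commit message

    def __init__(self, nodes, sequence):
        self.sequence = list(sequence)
        self.position = {node: 0 for node in nodes}   # next index to match
        self.failed = set()

    def handle_event(self, node, state):
        """Feed one (node, state) event from the stream to the waiter."""
        if node not in self.position or node in self.failed:
            return                      # not watched, or already given up on
        idx = self.position[node]
        if idx == len(self.sequence):
            return                      # this node already finished
        if state == self.FAILED:
            self.failed.add(node)       # node will never reach the states
        elif state == self.sequence[idx]:
            self.position[node] = idx + 1

    def finished(self):
        """Nodes that have seen the whole sequence, in order."""
        return sorted(n for n, i in self.position.items()
                      if i == len(self.sequence))

    def pending(self):
        """Nodes still being waited on (neither finished nor failed)."""
        return sorted(n for n, i in self.position.items()
                      if i < len(self.sequence) and n not in self.failed)

    def cancel(self, node):
        """Stop waiting for a node."""
        self.position.pop(node, None)
        self.failed.discard(node)
```

      A caller would create the waiter before triggering the reboot, feed it
      events as they arrive, and periodically check finished(), pending() and
      failed, which mirrors the "return periodically with the results so far"
      behaviour described above.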
  3. 09 Oct, 2003 8 commits
    • Mike Hibler's avatar
      Rename tftp-hpa port to not conflict with official port. · ed3fcd05
      Mike Hibler authored
      We have a few more source changes than they do, so we cannot just use it.
    • Mike Hibler's avatar
      New node_list command. I opted to put it in this directory as it seemed · afb2350d
      Mike Hibler authored
      the closest match.
    • Mike Hibler's avatar
      Allow new node_list command. · acc97e75
      Mike Hibler authored
    • Leigh B. Stoller's avatar
    • Leigh B. Stoller's avatar
      Add commit message · 4e603cd1
      Leigh B. Stoller authored
    • Leigh B. Stoller's avatar
      Reorg of two aspects of node update. · 2641af4d
      Leigh B. Stoller authored
      * install-rpm, install-tarfile, spewrpmtar.php3, spewrpmtar.in: Pumped
        up even more! The db file we store in /var/db now records both the
        timestamp (of the file, or if remote the install time) and the MD5
        of the file that was installed. Locally, we can get this info when
        accessing the file via NFS (copymode on or off). Remote, we use wget
        to get the file, and so pass the timestamp along in the URL request,
        and let spewrpmtar.in determine if the file has changed. If the
        timestamp it gets is >= the timestamp of the file, a status code
        of 304 (Not Modified) is returned. Otherwise the file is returned.
        If the timestamps are different (remote, server sends back an actual
        file), the MD5 of the file is compared against the value stored. If
        they are equal, update the timestamp in the db file to avoid
        repeated MD5s (or server downloads) in the future. If the MD5 is
        different, then reinstall the tarball or rpm, and update the db file
        with the new timestamp and MD5. Presto, we have auto update capability!
        Caveat: I pass along the old MD5 in the URL, but it is currently
        ignored. I do not know if doing the MD5 on the server is a good
        idea, but obviously it is easy to add later. At the moment it
        happens on the node, which means wasted bandwidth when the timestamp
        has changed, but the file has not (probably not something that will
        happen in typical usage).
        Caveat: The timestamp used on remote nodes is the time the tarfile
        is installed (GM time of course). We could arrange to return the
        timestamp of the local file back to the node, but that would mean
        complicating the protocol (or using an http header) and I was not in
        the mood for that. In typical usage, I do not think that people will
        be changing tarfiles and rpms so rapidly that this will make a
        difference, but if it does, we can change it.
      * node_update.in, client side watchdog, and various web pages:
        Deflated node_update, removing all of the older ssh code. We now
        assume that all nodes will auto update on a periodic basis, via the
        watchdog that runs on all client nodes, including plab nodes.
        Changed the permission check to look for new UPDATE permission (used
        to be UPDATEACCOUNT). As before, it requires local_root or better.
        The reason for this is that node_update now implies more than just
        updating the accounts/mounts. The web pages have been changed to
        explain that in addition to mounts/accounts, rpms and tarfiles will
        also be updated. At the moment, this is still tied to a single
        variable (update_accounts) in the nodes table, but as Kirk requested
        at the meeting, it will probably be nice to split these out in the
        future.
        Added the ability to node_update a single node in an experiment (in
        addition to all nodes option on the showexp page). This has been
        added to the shownode webpage menu options.
        Changed locking code to use the newer wrapper states, and to move
        the experiment to RUNNING_LOCKED until the update completes. This is
        to prevent mayhem in the rest of the system (which could be dealt
        with, but is not worth the trouble; people have to wait until their
        initiated update is complete, before they can swap out the
        experiment).
        Added "short" mode to shownode routine, equiv to the recently added
        short mode for showexp. I use this on the confirmation page for
        updating a single node, giving the user a couple of pertinent (feel
        good) facts before they confirm.
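
      The timestamp-then-MD5 check described in the first item of this message
      (install-rpm, install-tarfile, spewrpmtar) can be sketched as follows.
      This is a Python illustration under assumptions, not the actual scripts:
      server_response, client_update and the record layout are hypothetical
      names, and the real db file format in /var/db is not shown in the
      message.

```python
import hashlib

NOT_MODIFIED = 304


def server_response(client_stamp, file_mtime, file_bytes):
    """Server side: answer 304 if the client's stamp is not older than the file."""
    if client_stamp >= file_mtime:
        return NOT_MODIFIED, None
    return 200, file_bytes


def client_update(record, status, body, now):
    """Client side: decide what to do with the server's answer.

    record is the per-file entry kept on the node: {"stamp": ..., "md5": ...}.
    Returns "unchanged", "touched" (stamp refreshed only) or "reinstalled".
    """
    if status == NOT_MODIFIED:
        return "unchanged"
    digest = hashlib.md5(body).hexdigest()
    record["stamp"] = now          # remote nodes record the install time
    if digest == record["md5"]:
        return "touched"           # same content: avoid repeating the MD5 later
    record["md5"] = digest
    return "reinstalled"           # caller reinstalls the tarball or rpm
```

      The "touched" case is the wasted-bandwidth situation the first caveat
      mentions: the file was downloaded, but only the stored timestamp changes.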
    • Mac Newbold's avatar
    • Mac Newbold's avatar
      tbsetup/node_reboot.in · 4bc03e0b
      Mac Newbold authored
  4. 08 Oct, 2003 1 commit
  5. 07 Oct, 2003 9 commits
  6. 06 Oct, 2003 7 commits
    • Robert Ricci's avatar
      Make the regexps for switch version less annoyingly restrictive, and · 475b98a4
      Robert Ricci authored
      take into account the fact that it now works for the 55xx series.
    • Leigh B. Stoller's avatar
      Quickie change to experiment with client side caching; Added a new · 9fb9ef8f
      Leigh B. Stoller authored
      "fullconfig" command to cycle through the list of existing commands
      and spit out a section for each. Not all of them of course; just a
      subset that makes sense. I did note that mounts are a bit of a problem
      because of the USESFS argument. Not sure what to do yet.
    • Robert Ricci's avatar
      Don't make bad lines from switchmac fatal - there are circumstances · 1b9ad406
      Robert Ricci authored
      where they are okay.
    • Leigh B. Stoller's avatar
      * New libtmcc.pm module that encapsulates the tmcc interface. Most of the · 434a472a
      Leigh B. Stoller authored
        code that was in libsetup has moved into this library, and underwent a
        giant cleaning and pumping up. The interface from your typical perl
        script now looks like this:
        use libtmcc;
        my @tmccresults;    # filled in by tmcc() with the result strings
        if (tmcc(TMCCCMD_STATUS, "optional arguments", \@tmccresults) < 0) {
            warn("*** WARNING: Could not get status from server!\n");
            return -1;
        }
        foreach my $me (@tmccresults) {
            print "bite $me";
        }
        The arguments and results are optional values. There is a fourth optional
        value that is a hash of config options (basically converted to command
        line switches passed to tmcc). For example, to set the timeout on an
        individual call, pass a fourth argument like:
      	("timeout" => 5)
        There is also a way to set global options so that all subsequent tmcc
        calls are affected:
      	configtmcc("timeout", 5);
        I'll probably clean this up a bit to avoid the direct strings.
        The result list is a list of strings. Since we are trending away from
        using tmcc to transfer large amounts of data, I think this is okay.
      * A new tmcc.pl which does little more than load libtmcc and use it.
        This will become the new tmcc, with the existing C version becoming a
        backend binary for it.
      * All of the perl scripts in tmcd have been changed to use the new
        library. I left the few uses of tmcc in shell scripts alone since they
        were of the simple variety (mostly "state" command).
      * And again, if you have read this far, you will learn why I bothered with
        all this. Well, the existing code was really bad and it was getting out
        of control. Sort of like a squid that was getting harder to control as
        its rotting tentacles slithered into more and more scripts. Anyway ...
        More important, my goal is to use the libtmcc library to add caching.  I
        have not worked out the details yet, but I am envisioning a configuration
        file, perhaps generated initially by tmcd, of all of the config
        values. If the library finds that file, it sucks the info out of the file
        instead of going to tmcd. Eventually, this config file would be generated
        as part of experiment swapping and stored in the DB, but that's a longer
        term project, and perhaps orthogonal (how we fill the cache is not as
        important as adding the ability to use a cache, right?).
        Note that certain operations (like "state" and "ready") are flagged by
        the library to always bypass the "cache".
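
      Since the message says the caching details are not worked out yet, the
      following is only a guess at the shape of the idea, in Python rather
      than Perl. The cache file layout ("*** command" headers followed by
      result lines), parse_cache, cached_tmcc and ask_server are all
      hypothetical; only the rule that "state" and "ready" bypass the cache
      comes from the message.

```python
NEVER_CACHED = {"state", "ready"}   # named in the commit as cache-bypassing


def parse_cache(text):
    """Parse a hypothetical cache file into {command: [result lines]}."""
    cache, current = {}, None
    for line in text.splitlines():
        if line.startswith("*** "):
            current = line[4:].strip()      # start of a new command section
            cache[current] = []
        elif current is not None:
            cache[current].append(line)
    return cache


def cached_tmcc(command, cache, ask_server):
    """Return result lines from the cache when allowed, else ask the server."""
    if command not in NEVER_CACHED and command in cache:
        return cache[command]
    return ask_server(command)
```

      Keeping the bypass list inside the lookup function means callers do not
      need to know which commands are safe to cache.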
    • Robert Ricci's avatar
      Lower the time between packets to 30 seconds - looks like some · fd279db8
      Robert Ricci authored
      switches have some really low MAC timeouts!
    • Robert Ricci's avatar
      Fix a bug in the mapping precheck that was allowing nodes with too · 1153a1fb
      Robert Ricci authored
      many links to slip by unnoticed.
    • Leigh B. Stoller's avatar