1. 29 Jul, 2004 11 commits
    • Leigh B. Stoller authored
      Rework TBGetSiteVar() slightly. Add optional second parameter $rptr to
      store the result in. When called this new way, the value goes into
      $rptr, and exit status is returned to caller instead. In addition,
      when called this way, all errors are non-fatal; it is up to the caller
      to decide what to do.
      03403a55
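The two calling styles described in this commit can be illustrated with a rough Python analogue. This is only a sketch, not the Perl code from the commit: the function name `get_site_var`, the `_SITE_VARS` table, the `result` dictionary, and the 0 / -1 status codes are all assumptions made for the example.

```python
import sys

# Hypothetical stand-in for the site variables table; contents are invented.
_SITE_VARS = {"general/example": "42"}


def get_site_var(name, result=None):
    """Python analogue of the two calling styles described above.

    One argument: return the value; a failed lookup is fatal.
    Two arguments: store the value in result["value"] and return a status
    (0 on success, -1 on failure; these codes are an assumption).
    """
    try:
        value = _SITE_VARS[name]
    except KeyError:
        if result is None:
            # Old style: errors are fatal.
            sys.exit(f"*** unknown site variable: {name}")
        # New style: report failure and let the caller decide what to do.
        return -1

    if result is None:
        return value
    result["value"] = value
    return 0
```

A caller using the second form checks the returned status and then reads `result["value"]`, which keeps error handling in the caller's hands.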
    • Leigh B. Stoller authored
      Two unrelated bug fixes (with some related cleanups and tweaks)
      * The first involves swapmod. When a swapmod on an active experiment fails,
        tbswap will reswap the experiment back to the original configuration. The
        problem is that it is reswapping it with the *new* virtual state of the
        experiment in the DB. It is not until later when control returns to
        swapexp that the virtual state is restored. This is plainly wrong, and in
        fact was causing the event scheduler grief because it was starting up and
        reading the virtual topo, which was different, wrong, and about to be
        blown away.
      
        I reorganized the modify section of swapexp so that virtual state is
        restored only when it's a swapmod on a swapped experiment. On an active
        experiment, I moved that code down into tbswap, which now does all
        of the virtual and physical state restore before it does the reswap back
        to the original experiment. Just for kicks, it's also done if tbswap
        decides to swap the experiment because of a fatal error.
      
        Cleanups: I changed $NoRecover to $CanRecover. My feeble brain cannot
        deal with !$NoRecover. I know, two knots make a wright for most people.
      
        Renderer: I was annoyed by the fact that we rerun the renderer on a
        failed swapmod. The original reason is that the renderer runs in the
        background and so vis_nodes cannot be saved with the rest of the virtual
        state tables because the renderer might still be running when the user
        fires off the swapmod. Well, the hell with that. We lock the vis_nodes
        table anyway in the renderer during update, so we are certain to get a
        consistent snapshot. We store the renderer pid in the experiments table,
        so if the renderer was running, just fire off another one; mostly this is
        not going to happen. In addition, tbprerun no longer starts a new
        renderer when doing the swapmod; I start the new renderer later after
        swapmod succeeds. I might end up tweaking this a bit depending on what
        people notice as being different.
      
      * Termination changes to batchexp and swapexp: I've rearranged the
        termination code using an END block so that any uncontrolled exit from
        either batchexp or swapexp will go through the cleanup code, and
        hopefully insert a stats record, as well as not leave the experiment in
        some in-between state. I've set the max DB retry count to zero in both
        cases, which means infinite retry. I've also added SIGTERM handlers to
        both so that again, we can kill a hung batch/swap and have it clean up
        things more or less. Note that END blocks are not run when a signal
        causes the program to die; you have to catch the signal and then die()
        so that the END block is executed.
      
        Eventually, we need to clean up the various libraries so that we do not
        use DBQueryFatal(), but rather use DBQueryWarn(), and look for failure.
        Ditto for event system interface.
      9f4edbba
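The point about END blocks and signals has a close analogue in Python, where `atexit` handlers are also skipped when an unhandled signal kills the process. The sketch below is an illustration of that pattern, not the batchexp/swapexp code; `install_cleanup` is a hypothetical helper name.

```python
import atexit
import signal
import sys


def install_cleanup(cleanup):
    """Arrange for cleanup() to run on normal exit and on SIGTERM."""
    atexit.register(cleanup)

    def on_sigterm(signum, frame):
        # Turn the signal into an ordinary exit so the atexit hooks run,
        # much like catching the signal and calling die() in Perl.
        sys.exit(1)

    signal.signal(signal.SIGTERM, on_sigterm)
```

Without the handler, a SIGTERM would end the process immediately and the registered cleanup would never run.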
    • Leigh B. Stoller authored
      Set $libdb::DBQUERY_MAXTRIES = 0, which means infinite retry.
      That will show the devil who means business. Right on.
      719a65c4
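The "zero means infinite retry" convention can be sketched as follows. This is a Python illustration under assumptions, not libdb itself: `query_with_retry`, `run_query`, and the use of `ConnectionError` as the retryable failure are all hypothetical.

```python
import time


def query_with_retry(run_query, max_tries=0, delay=0.0):
    """Call run_query() until it succeeds.

    max_tries == 0 means retry forever, mirroring the convention in the
    commit message; any positive value caps the number of attempts.
    Returns the query result, or None once the attempts are used up.
    """
    attempt = 0
    while max_tries == 0 or attempt < max_tries:
        attempt += 1
        try:
            return run_query()
        except ConnectionError:
            time.sleep(delay)  # wait before the next attempt
    return None
```

With `max_tries=0` the loop only ends on success, so a caller that needs a bound on total time has to impose it separately.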
    • Mike Hibler authored
      01639240
    • Mike Hibler authored
      Make sure we don't produce vnode IP addresses with leading zeros in the octets.
      This is apparently a no-no and was happening at DETER.
      d8e85f02
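One likely reason leading zeros are a problem is that `inet_aton`-style parsers read an octet with a leading zero as octal, so "010" becomes 8 rather than 10. The following is a minimal Python sketch of emitting octets without padding; it is not the code from the commit, and both function names are hypothetical.

```python
def format_ipv4(octets):
    """Join four integer octets into dotted-quad form without zero padding."""
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"bad IPv4 octets: {octets!r}")
    return ".".join(str(int(o)) for o in octets)


def strip_leading_zeros(address):
    """Rewrite a zero-padded address, reading each octet as decimal."""
    return format_ipv4([int(part, 10) for part in address.split(".")])
```

Formatting from integers rather than from fixed-width strings avoids the padding in the first place.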
    • Mike Hibler authored
      Take imagezip out of the client build/install.
      It isn't needed on clients and it requires the linuxthreads package
      which is not otherwise needed.
      ab8f2d4c
  2. 28 Jul, 2004 7 commits
  3. 27 Jul, 2004 5 commits
    • Kirk Webb authored
      Wrote new abstract and outline for WORLDS paper based on today's paper
      meeting.
      4c68e20f
    • Timothy Stack authored
      Many tweaks to the feedback stuff and more comments. Update
      webfeedback to talk to the newer version of canaryd.  Add feedback
      "estimate" stuff so that if we have no data (because of an overloaded
      node) to work with, we can make some sort of "reasonable" guesstimate
      on every iteration.
      ead90a69
    • Timothy Stack authored
      Updated canaryd, ended up starting fresh and pulling things in, rather
      than jamming more stuff into the old one.  Most of the code came from
      the previous version of canaryd, the cpu broker (process accounting),
      and the janosvm (network interface accounting).  It's missing some
      features of the old one, but those can be incorporated without too
      much trouble.
      
      Changes:
      
        Designed to permanently run on the pnodes:  it waits for START events
          before it begins recording.  However, I haven't done the work
          necessary to have it always start up on the pnodes.
      
        No more exec'ing: process stuff is taken from "/proc", and network
          interface stats are pulled from getifaddrs(3).
      
        Fixed some minor bugs: a typo caused the real-time priority to not
          be set, and it now uses setitimer instead of sleep to get more
          accurate spacing between samples.
      62b5e3a8
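The setitimer change can be illustrated with a short Python sketch (Unix only, main thread only). A periodic interval timer fires on a fixed schedule regardless of how long each sample takes, whereas a sleep-based loop drifts by the sampling time on every iteration. This is an illustration of the technique, not canaryd's C code; `collect_samples` and its parameters are hypothetical.

```python
import signal
import time


def collect_samples(count, interval, sample):
    """Call sample() once per timer tick until count samples are gathered."""
    ticks = []

    def on_alarm(signum, frame):
        ticks.append(sample())

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    # First expiry after `interval` seconds, then every `interval` seconds.
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    try:
        while len(ticks) < count:
            signal.pause()  # sleep until the next signal arrives
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, old_handler)
    return ticks[:count]
```

For example, `collect_samples(5, 0.1, time.monotonic)` gathers five timestamps spaced roughly 100 ms apart.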
    • Leigh B. Stoller authored
      9a237b4a
    • Leigh B. Stoller authored
      Fix up imageid edit page so that you can reset the frisbee_pid
      at the same time you clear the load address. Admin people only.
      0c1b9535
  4. 26 Jul, 2004 7 commits
  5. 23 Jul, 2004 3 commits
  6. 22 Jul, 2004 1 commit
  7. 21 Jul, 2004 3 commits
  8. 20 Jul, 2004 3 commits