1. 19 Oct, 2005 1 commit
  2. 30 Sep, 2005 1 commit
  3. 22 Sep, 2005 1 commit
  4. 13 Jul, 2005 1 commit
  5. 31 May, 2005 1 commit
  6. 27 May, 2005 1 commit
  7. 18 May, 2005 1 commit
  8. 19 Apr, 2005 1 commit
  9. 22 Feb, 2005 1 commit
  10. 05 Nov, 2004 1 commit
  11. 30 Aug, 2004 1 commit
  12. 29 Jul, 2004 1 commit
      Two unrelated bug fixes (with some related cleanups and tweaks) · 9f4edbba
      Leigh B. Stoller authored
      * The first involves swapmod. When a swapmod on an active experiment fails,
        tbswap will reswap the experiment back to the original configuration. The
        problem is that it is reswapping it with the *new* virtual state of the
        experiment in the DB. It is not until later when control returns to
        swapexp that the virtual state is restored. This is plainly wrong, and in
        fact was causing the event scheduler grief cause it was starting up,
        reading the virtual topo, which was different, wrong, and about to be
        blown away.
      
        I reorganized the modify section of swapexp so that virtual state is
        restored only when it's a swapmod on a swapped experiment. On an active
        experiment, I moved that code down into tbswap, which now does all
        of the virtual and physical state restore before it does the reswap back
        to the original experiment. Just for kicks, it's also done if tbswap
        decides to swap the experiment because of a fatal error.
      
        Cleanups: I changed $NoRecover to $CanRecover. My feeble brain cannot
        deal with !$NoRecover. I know, two knots make a wright for most people.
      
        Renderer: I was annoyed by the fact that we rerun the renderer on a
        failed swapmod. The original reason is that the renderer runs in the
        background and so vis_nodes cannot be saved with the rest of the virtual
        state tables cause the renderer might still be running when the user
        fires off the swapmod. Well, the hell with that. We lock the vis_nodes
        table anyway in the renderer during update, so we are certain to get a
        consistent snapshot. We store the renderer pid in the experiments table,
        so if the renderer was running, just fire off another one; mostly this is
        not going to happen. In addition, tbprerun no longer starts a new
        renderer when doing the swapmod; I start the new renderer later after
        swapmod succeeds. I might end up tweaking this a bit depending on what
        people notice as being different.
      
      * Termination changes to batchexp and swapexp: I've rearranged the
        termination code using an END block so that any uncontrolled exit from
        either batchexp or swapexp will go through the cleanup code, and
        hopefully insert a stats record, as well as not leave the experiment in
        some inbetween state. I've set the max DB retry count to zero in both
        cases, which means infinite retry. I've also added SIGTERM handlers to
        both so that again, we can kill a hung batch/swap and have it clean up
        things more or less. Note that END blocks are not run when an uncaught
        signal kills the program; you have to catch the signal and then die() so
        that the END block is executed.
      
        Eventually, we need to clean up the various libraries so that we do not
        use DBQueryFatal(), but rather use DBQueryWarn(), and look for failure.
        Ditto for event system interface.
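
The END-block-plus-signal-handler pattern described above has a close analogue in Python, sketched here for illustration only (the real scripts are Perl). Python's `atexit` hooks, like Perl's END blocks, are skipped when an unhandled signal kills the process, so the handler turns SIGTERM into an ordinary exit. The names `cleanup`, `cleanup_log`, `on_sigterm`, and `install_cleanup` are hypothetical, and the cleanup body is a stand-in for inserting a stats record.

```python
import atexit
import signal
import sys

# Hypothetical stand-in for "insert a stats record and fix up experiment state".
cleanup_log = []


def cleanup():
    """Exit hook: runs on any exit that goes through normal interpreter shutdown."""
    cleanup_log.append("cleanup ran")


def on_sigterm(signum, frame):
    # An unhandled SIGTERM would kill the process without running exit hooks.
    # Raising SystemExit here plays the role of die() in the Perl handler:
    # the exit becomes a controlled one, so the registered hook still fires.
    sys.exit(128 + signum)


def install_cleanup():
    """Register the exit hook and the SIGTERM handler (main thread only)."""
    atexit.register(cleanup)
    signal.signal(signal.SIGTERM, on_sigterm)


if __name__ == "__main__":
    install_cleanup()
```

With this in place, `kill <pid>` on a hung process still reaches `cleanup()`; SIGKILL, of course, cannot be caught in either language.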
  13. 28 Jul, 2004 2 commits
      Fix merge error in last revision. · 55575967
      Leigh B. Stoller authored
      Fix rather serious indexing bug that was causing experiment indices · 95ad01c1
      Leigh B. Stoller authored
      to be reused if the DB is dropped and recreated, since when that
      happens, auto_increment history is lost and it will go back to using
      the latest highest index in the table. Usually not a problem, but
      since we cross index three other tables using the experiment index,
      this causes quite a bit of grief.
      
      So, my solution is to do my own auto_increment using the
      experiment_stats table (locked of course), which we never delete
      entries from without deleting all entries from the other cross
      referenced tables.
      
          DBQueryFatal("select MAX(exptidx) from experiment_stats");
      
      I also added a sanity check to make sure the new index is not
      currently in use in any of the tables. I also cleaned up the
      error path when something goes wrong.
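
A minimal sketch of the allocation scheme described above, written in Python against SQLite purely so it is self-contained (the real code is Perl against MySQL with table locks). The table `experiment_stats` and column `exptidx` come from the commit message; the function name `next_exptidx`, the cross-referenced table `experiments` having an `exptidx` column, and the use of `BEGIN IMMEDIATE` as the lock are assumptions made for illustration.

```python
import sqlite3


def next_exptidx(conn, other_tables=("experiments",)):
    """Allocate the next experiment index from experiment_stats.

    Assumes `conn` was opened with isolation_level=None so that the
    transaction boundaries below are fully under our control.
    """
    cur = conn.cursor()
    cur.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX()
    try:
        (highest,) = cur.execute(
            "SELECT MAX(exptidx) FROM experiment_stats"
        ).fetchone()
        idx = (highest or 0) + 1

        # Sanity check: the new index must not be in use in any other table.
        for table in other_tables:
            (count,) = cur.execute(
                f"SELECT COUNT(*) FROM {table} WHERE exptidx = ?", (idx,)
            ).fetchone()
            if count:
                raise RuntimeError(f"exptidx {idx} already in use in {table}")

        cur.execute("INSERT INTO experiment_stats (exptidx) VALUES (?)", (idx,))
        cur.execute("COMMIT")
        return idx
    except Exception:
        cur.execute("ROLLBACK")  # leave the stats table untouched on failure
        raise
```

Because rows are never deleted from `experiment_stats` on their own, `MAX(exptidx)` keeps growing even after the database is dropped and reloaded, which is the property the auto_increment counter lacked.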
  14. 15 Jul, 2004 1 commit
  15. 29 Jun, 2004 1 commit
  16. 21 May, 2004 1 commit
  17. 17 May, 2004 1 commit
  18. 29 Apr, 2004 1 commit
      Add prelim support for using linktest. Because of problems, this is · 6cdccbd2
      Leigh B. Stoller authored
      currently available to only people with stud=1 status in the DB.
      
      * www/tbauth.php3: Add a STUDLY() function to check that bit.
      
      * www/linktest.php3: New page to run linktest on the fly. The level
        defaults to the current level in the experiments table, but you can
        override that via the form on the page.
      
      * www/showexp.php3: Add link to aforementioned page. STUDLY() only.
      
      * www/beginexp_form.php3: Add an option (selection) to set the linktest
        level for create/swapin. Defaults to 0 (no linktest). STUDLY() only.
      
      * www/editexp.php3: Add an option to edit the default linktest level
        for an experiment. STUDLY() only.
      
      * tbsetup/batchexp.in and tbsetup/swapexp.in: Add code to optionally run
        the linktest, sending email if it fails (exits with non-zero status).
        Failure does not affect the swapin.
  19. 07 Apr, 2004 1 commit
  20. 15 Mar, 2004 1 commit
  21. 09 Mar, 2004 1 commit
      Clean up of the web to batchexp interface: · b6a9b9c2
      Leigh B. Stoller authored
      * Add proper check_slot() calls to all of the user input that is going into
        the DB (already had taint checking), since batchexp is now available for
        interactive use from ops.
      
      * Remove separate DB insertions of noswap/noidleswap reasons from web
        script, and pass on the command line from web to batchexp. Now inserted
        in the backend script so that they can be provided on the command line
        when batchexp is used interactively.
      
      * Change defaults in backend script; experiments now default to swappable
        and idleswap; previously defaulted to not swappable and no idleswap.
      
      * Remove [-s] (swappable) and add [-S <reason>] option. -S sets experiment to
        not swappable, with supplied reason (text string).
      
      * Add [-L <reason>] option. -L sets experiment to no idleswap, with
        supplied reason (text string).
      
      * Add several missing table_regex entries for experiments table.
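
The option semantics listed above (swappable and idleswap on by default, each turned off by supplying a reason) can be sketched as follows. This is a Python illustration of the behavior only; the actual script is Perl, and the names `parse_args`, `noswap_reason`, and `noidleswap_reason` are hypothetical.

```python
import argparse


def parse_args(argv):
    """Parse the two reason options; both features default to enabled."""
    parser = argparse.ArgumentParser(prog="batchexp")
    # -S <reason>: mark the experiment not swappable, recording why.
    parser.add_argument("-S", metavar="reason", dest="noswap_reason", default=None)
    # -L <reason>: turn off idleswap, recording why.
    parser.add_argument("-L", metavar="reason", dest="noidleswap_reason", default=None)
    opts = parser.parse_args(argv)

    # A feature stays on unless a reason for disabling it was given.
    opts.swappable = opts.noswap_reason is None
    opts.idleswap = opts.noidleswap_reason is None
    return opts
```

Tying each "off" switch to a mandatory reason string means the backend can insert the reason itself, which is what lets the web script stop doing its own DB insertions.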
  22. 20 Feb, 2004 1 commit
      Hmm, looks to me like I got distracted while merging startexp into · 7d9be6de
      Leigh B. Stoller authored
      batchexp, and forgot to finish the changes! The result was a fairly
      broken batch system, which is now hopefully fixed!
      
      Took the opportunity to remove the -x (expires) and -l (priority)
      options which are no longer referenced anyplace.
      
      Fix up email message so that idle/auto swap times are in hours not
      minutes.
      
      Provide a proper usage() function that describes the morass of
      options (for interactive use from ops).
  23. 12 Feb, 2004 1 commit
      * Removed startexp, and merged its contents into batchexp. There has been · aef08532
      Leigh B. Stoller authored
        no reason for the separation for a long time, and it made maintenance more
        difficult cause of duplication between batchexp and startexp (batch was
        the sole user of startexp). Cleaner solution.
      
      * Check argument processing for batchexp, swapexp, endexp to make sure the
        taint checks are correct. All three of these scripts will now be
        available from ops. I especially watch the filename processing, which was
        pretty loose before and could allow someone to grab a file on boss by trying
        to use it as an NS file (the scripts all run as the user of course). The web
        interface generates filenames that are hard to guess, so rather than
        wrapping these scripts when invoked from ops, just allow the usual paths
        (/proj, /groups, /users) but also /tmp/$uid-XXXXXX.nsfile pattern, which
        should be hard enough to guess that users will not be able to get
        anything they are not supposed to.
      
      * Add -w (waitmode) options to all three scripts. In waitmode, the backend
        detaches, but the parent remains waiting for the child to finish so it
        can exit with the appropriate status (for scripting). The user can
        interrupt (^C), but it has no effect on the backend; it just kills the
        parent side that is waiting (backend is in a new session ID). Log output
        still goes to the file (available from web page) and is emailed.
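
The waitmode behavior in the last item can be sketched in Python as below, assuming a POSIX system. The function name `run_backend` is hypothetical, and output is sent to `DEVNULL` here only as a stand-in for the log file the real backend writes.

```python
import subprocess
import sys


def run_backend(argv, wait=False):
    """Start the backend in its own session; optionally wait for its status."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,  # stand-in for the experiment log file
        stderr=subprocess.DEVNULL,
        start_new_session=True,     # new session: terminal ^C never reaches it
    )
    if not wait:
        return 0  # detached; the caller returns immediately
    try:
        return proc.wait()  # pass the backend's exit status back for scripting
    except KeyboardInterrupt:
        # ^C only stops this waiting parent; the backend keeps running.
        return 130
```

For example, `run_backend([sys.executable, "-c", "import sys; sys.exit(3)"], wait=True)` returns the child's exit status, which is what makes the scripts usable from shell scripts on ops.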
  24. 09 Feb, 2004 1 commit
  25. 02 Dec, 2003 1 commit
  26. 18 Nov, 2003 1 commit
  27. 17 Nov, 2003 1 commit
      Merge the two state machines (batchstate and state) into a single · 2025e0bd
      Leigh B. Stoller authored
      state machine (state). All of the stuff that was previously handled by
      using batchstate is now embedded into the one state machine. Of
      course, these mostly overlapped, so it's not that much of a change,
      except that we also redid the machine, adding more states (for
      example, modify phases are now explicit). To get a picture of the
      actual state machine, on boss:
      
      		stategraph -o newstates EXPTSTATE
      		gv newstates.ps
      
      Things to note:
      
      * The "batchstate" slot of the experiments table is now used solely to
        provide a lock for batch daemon. A secondary change will be to
        change the slot name to something more appropriate, but it can
        happen anytime after this new stuff is installed.
      
      * I have left expt_locked for now, but another later change will be to remove
        expt_locked, and change it to active_busy or some such new state name in
        the state machine. I have removed most uses of expt_locked, except those
        that were necessary until there is a new state to replace it.
      
      * These new changes are an implementation of the new state machine,
        but I have not done anything fancy. Most of the code is the same as
        it was before.
      
      * I suspect that there are races with the batch daemon now, but they
        are going to be rare, and the end result is probably that a
        cancelation is delayed a little bit.
  28. 05 Nov, 2003 1 commit
      Frontend and parser portion of two event system changes: · 091a0b62
      Leigh B. Stoller authored
      * Generate a shared secret key for the event system. This key is
        stored into the DB, and passed to the node via tmcd. It is also
        stashed into a file in the experiment directory (can be accessed
        only by the project/group members). The key is used to attach an
        HMAC (hash-based message authentication code) to each event, which is checked
        by the receivers to ensure that the event is not bogus. More details
        on this later when I commit the event library/client changes.
      
      * Added "virt_programs" table to store info about each program object
        defined by the user. The intent is to no longer send the command
        string in the event, but to fix it in the DB, and transfer it via
        tmcd. This removes our "remote execution facility" which was always
        a bad idea (we have ssh for that, and that is a lot more secure than
        the event system!).
      
        Note that for the time being we need to continue to send the command in
        the event because of old images, but the new images will now ignore
        that part of the event.
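
The shared-secret scheme in the first item can be illustrated with Python's standard `hmac` module. This is a sketch of the general technique, not the event library's actual wire format: the function names, the 32-byte key length, and the choice of SHA-256 are all assumptions (the commit message does not name a hash function).

```python
import hashlib
import hmac
import secrets


def make_key():
    """Generate a per-experiment shared secret (length is an assumption)."""
    return secrets.token_bytes(32)


def sign_event(key, payload):
    """Sender side: compute the HMAC tag to attach to an event payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_event(key, payload, tag):
    """Receiver side: accept the event only if the tag matches the payload."""
    expected = sign_event(key, payload)
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(expected, tag)
```

A receiver that holds the key rejects any event whose payload or tag was altered, or that was signed with a different experiment's key.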
  29. 01 Oct, 2003 1 commit
  30. 30 Sep, 2003 1 commit
      Up to now we have had two state variables associated with an experiment, · 4269dad1
      Leigh B. Stoller authored
      plus a lock field. The lock field was a simple "experiment locked, go away"
      slot that is easy to use when you do not care about the actual state that
      an experiment is in, just that it is in "transition" and should not be
      messed with.
      
      The other two state variables are "state" and "batchstate". The former
      (state) is the original variable that Chris added, and was used by the tb*
      scripts to make sure that the experiment was in the state each particular
      script wanted them to be in. But over time (and with the addition of so
      much wrapper goo around them), "state" has leaked out all over the place to
      determine what operations on an experiment are allowed, and if/when it
      should be displayed in various web pages. There are a set of transition
      states in addition to the usual "active", "swapped", etc like "swapping"
      that make testing state a pain in the butt.
      
      I added the other state variable ("batchstate") when I did the batch
      system, obviously! It was intended as a wrapper state to control access to
      the batch queue, and to prevent batch experiments from being messed with
      except when it was really okay (for example, it's okay to terminate a
      swapped out batch experiment, but not a swapped in batch experiment since
      that would confuse the batch daemon). There are fewer of these states, plus
      one additional state for "modifying" experiments.
      
      So what I have done is change the system to use "batchstate" for all
      experiments to control entry into the swap system, from the web interface,
      from the command line, and from the batch daemon. The other state variable
      still exists, and will be brutally pushed back under the surface until it's
      just a vague memory, used only by the original tb* scripts. This will
      happen over time, and the "batchstate" variable will be renamed once I am
      convinced that this was the right thing to do and that my changes actually
      work as intended.
      
      Only people who have bothered to read this far will know that I also added
      the ability to cancel experiment swapin in progress. For that I am using
      the "canceled" flag (ah, this one was named properly from the start!), and
      I test that at various times in assign_wrapper and tbswap. A minor downside
      right now is that a canceled swapin looks too much like a failed swapin,
      and so tbops gets email about it. I'll fix that at some point (sometime
      after the boss complains).
      
      I also cleaned up various bits of code, replacing direct calls to exec
      with calls to the recently improved SUEXEC interface. This removes
      some cruft from each script that calls an external script.
      
      Cleaned up modifyexp.php3 quite a bit, reformatting and indenting.
      Also fixed to not run the parser directly! This was very wrong; should
      call nscheck instead. Changed to use "nobody" group instead of group
      flux (made the same change in nscheck).
      
      There is a script in the sql directory called newstates.pl. It needs
      to be run to initialize the batchstate slot of the experiments table
      for all existing experiments.
  31. 23 Sep, 2003 1 commit
  32. 11 Sep, 2003 1 commit
  33. 17 Jul, 2003 1 commit
      Wow. Mountain of changes for the new begin experiment form. · 080feb1b
      Mac Newbold authored
      Lots of changes to the form, both functional and aesthetic. See the
      testbed ops mail logs for a list of all of them, and the rationale.
      
      Corresponding updates to the showexp "edit meta-data" stuff, so that it
      gets all the same error checks as the real form.
      
      Also some backend changes in batchexp to pass through all the new form
      values.
  34. 10 Jul, 2003 1 commit
      Two sets of minor changes. · 7d0f59e6
      Leigh B. Stoller authored
      * In the parser, make -n (impotent) and -a (anonymous) more
        independent. Used to be that -n required -a, but that makes the
        preparse less useful, since it cannot catch project related errors
        (like bad osids, or node type permissions), and so the user does not
        get that until the email message later. That's so annoying, even Mike
        whined about it.
      
        Note that impotent mode is sorta misnamed now, since the parse never
        operates on the DB. Rather, impotent mode now skips doing the XML
        output phase (still aptly named updateDB!).
      
      * Add -p (pass) option. I added this for my script that was parsing
        all the old NS files to get renderings. In this case, I do not want
        -n or -a; I want to upload the results into the DB, but the project
        related checks are obviously going to fail since I was doing it
        inside the testbed project. So, -p turns on some of the anon checks,
        and later might be used to turn off certain features that are no longer
        supported, since all we really care about is the topology (some NS
        files failed on old features and syntax).
      
      Upon reflection I think these three options could probably be rolled
      into just two, by cleaning up the current impotent and anonymous
      flags.
  35. 09 Jul, 2003 2 commits
  36. 30 Jun, 2003 1 commit
      Make the new parser live on mini. New parser ssh'es over to ops to · 2202fc5a
      Leigh B. Stoller authored
      do the actual parse. The parser now spits out XML instead of DB
      queries, and the wrapper on boss converts that to DB insertions after
      verification. There are some makefile changes as well to install the
      new parser on ops via NFS, since otherwise the parser could be
      intolerably out of date on ops!
  37. 09 Jun, 2003 1 commit
  38. 31 May, 2003 1 commit