1. 20 Oct, 2006 1 commit
    • Wow, this should make me look important! · afa5e919
      Mike Hibler authored
      Two-day boondoggle to support "/scratch", an optional large, shared filesystem
      for users.  To do this, I needed to find all the instances where /proj is used
      and behave accordingly.  The boondoggle part was the decision to gather up all
      the hardwired instances of shared directory names ("/proj", "/users", etc.)
      so that they are set in a common place (via unexposed configure variables).
      This is a boondoggle because:
      
      1. I didn't change the client-side scripts.  They need a different mechanism
         (e.g., tmcd) to get the info; configure is the wrong way.
      
      2. Even if I had done #1 it is likely--no, certain--that something would
         fail if you tried to rename "/proj" to be "/mike".  These names are just
         too ingrained.
      
      3. We may not even use "/scratch" as it turns out.
      
      Note, I also didn't fix any of the .html documentation.  Anyway, it is done.
      To maintain my illusion in the future you should:
      
      1. Have perl scripts include "use libtestbed" and use the defined PROJROOT(),
         et al. functions where possible.  If not possible, make sure they run
         through configure and use @PROJROOT_DIR@, etc.
      
      2. Use the configure method for python, C, php and other languages.
      
      3. There are perl (TBValidUserDir) and php (VALIDUSERPATH) functions which
         you should call to determine if an NS, template parameter, tarball or
         other file is in "an acceptable location."  Use these functions where
         possible.  They know about the optional "scratch" filesystem.  Note that
         the perl function is over-engineered to handle cases that don't occur
         in nature.
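
A minimal sketch of the kind of "acceptable location" check described in item 3, written in Python purely for illustration. The real functions are TBValidUserDir (perl) and VALIDUSERPATH (php), and their actual logic is not shown in this log. The function name `valid_user_dir`, the hardwired root values, and the `scratch_root` parameter are all assumptions; in the real tree the roots come from configure.

```python
import posixpath

# Hypothetical values; the real ones are set via configure (@PROJROOT_DIR@, etc.).
PROJROOT = "/proj"
USERROOT = "/users"
SCRATCHROOT = "/scratch"  # pass None where the optional scratch filesystem is absent


def valid_user_dir(path, scratch_root=SCRATCHROOT):
    """Return True if path lies under one of the shared directory roots."""
    roots = [PROJROOT, USERROOT]
    if scratch_root:
        roots.append(scratch_root)
    # Collapse ".." components first so "/proj/../etc/passwd" is rejected.
    # Note: this does not resolve symlinks.
    norm = posixpath.normpath(path)
    return any(norm.startswith(root + "/") for root in roots)
```

Keeping the list of roots in one place is the point: callers ask "is this path acceptable?" and never spell out "/proj" themselves.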
  2. 07 Sep, 2006 1 commit
    • Some changes to how log files are handled; this took way too long to do! · c01f7b3e
      Leigh Stoller authored
      
      The original operation was to save up every log file forever in the
      work directory, and copy that out to both the user directory and the
      info directory (long term archive). When I cleaned /proj on ops
      yesterday of all this old cruft, I recovered 17GB of disk space. Yow!
      
      So, the new operation is:
      
      * Only files that end in .log are copied to the user directory. No
        longer copying out .top, .ptop, and a couple of other logs; 99% of
        users never look at these things. We still have them available to us
        though, on boss.
      
      * At the beginning of each swap operation, clean out the work
        directory of all the old log files. These are named a variety of
        ways, so I use some pattern matches to do this.
      
      * Jigger the names a little so that we do not name things in the form
        "$$.log", to avoid copying out differently named files to the user
        directory each time; instead link the .log file to the real output
        file so that it gets overwritten each time, while still getting the
        per-swap files for long term storage.
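
A rough sketch of the naming scheme in the last bullet, in Python for illustration only; the real code lives in the testbed's own scripts and is not shown here. The function names, the `swap_id` suffix format, and the choice of a hard link (the message only says "link") are assumptions.

```python
import os


def publish_log(workdir, basename, swap_id):
    """Create a per-swap output file and repoint the stable .log name at it."""
    real = os.path.join(workdir, f"{basename}.{swap_id}")  # kept for long term storage
    open(real, "a").close()
    stable = os.path.join(workdir, f"{basename}.log")      # the name users see
    if os.path.lexists(stable):
        os.remove(stable)       # drop the link left over from the previous swap
    os.link(real, stable)       # assumption: hard link; a symlink would also work
    return real, stable


def files_to_copy(workdir):
    """Only names ending in .log go out to the user directory."""
    return sorted(f for f in os.listdir(workdir) if f.endswith(".log"))
```

Because the stable name is reused, the user directory ends up with one file per kind of log, however many swaps have happened.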
  3. 17 Jul, 2006 1 commit
  4. 12 May, 2006 1 commit
    • Redo the entire template library. · 78503406
      Leigh Stoller authored
      I've been meaning to use perl "objects" and this was a good
      opportunity to see if they are useful and easy enough to use. Yep
      they are; the code is much cleaner with many fewer utility functions
      to get at stuff. I recommend this approach from now on.
      
      The problem is the php side, which ends up duplicating some stuff, but
      in the old style. This is not so bad for the template code since I
      have made it a point not to do anything but display functions in php;
      all modifications are handled in the backend.
  5. 05 May, 2006 1 commit
  6. 08 Jan, 2006 1 commit
  7. 12 Dec, 2005 1 commit
    • Check for batchstate!='locked' when considering experiments to act on. · b31c0e8d
      Leigh Stoller authored
      A failed swapout/endexp will leave the experiment in its original
      state (which is what we want), but also leave the batchstate locked,
      so use that to make sure we do not try to continually swapout/endexp
      an experiment that failed its swapout/endexp.
      
      The other possibility is to move the experiment out of the batch queue
      on such a failure. Might do that later.
      
      Note that we should probably put a check in db/audit for experiments
      with the batchstate='locked'.
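
As an illustration of the check, here is a small Python sketch that splits experiments into those safe to act on and those a db/audit pass would want to report. The record layout and the `eid` field are hypothetical; only the `batchstate` column and its 'locked' value come from this message.

```python
def split_by_lock(experiments):
    """Separate experiments to act on from those left locked by a failure."""
    actionable = [e for e in experiments if e["batchstate"] != "locked"]
    # A failed swapout/endexp leaves batchstate locked; an audit could list these.
    locked = [e for e in experiments if e["batchstate"] == "locked"]
    return actionable, locked
```

Skipping locked rows keeps the daemon from retrying the same failed swapout/endexp on every pass.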
  8. 12 Sep, 2005 1 commit
  9. 31 May, 2005 1 commit
  10. 14 Feb, 2005 1 commit
  11. 29 Jul, 2004 1 commit
  12. 22 Jul, 2004 1 commit
  13. 18 Jun, 2004 1 commit
  14. 26 Apr, 2004 1 commit
    • Changes to exit status stuff to reflect recent changes by Rob to how assign exits (exit codes). · 1c4a613c
      Leigh Stoller authored
      
      * in assign_wrapper, no longer return any status from assign to the
        caller. This was pointless. Instead, return 0 on success, 1 on
        controlled error, and -1 on uncontrolled error (die() called
        someplace). Add in CANRECOVER bit whenever the wrapper exits, even
        if uncontrolled, by putting in an END block to catch the die. This
        should prevent certain cases where a swapmod error would be flagged
        as not recoverable.
      
      * Remove most of the assign output processing since we no longer
        return its codes. Still print a portion of it to the log though.
      
      * Change call to fatal() in assign_wrapper; do not pass an exitcode
        since in every case it was the same damn thing!
      
      * Change tbswap to no longer carry assign_wrapper exit code to its
        exit.
      
      * Change the batch daemon to treat all errors as continuable (keep
        batch queued) unless exit code is -1. We will need to revisit this a
        bit perhaps, when Rob adds precheck code.
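
A sketch of how the batch daemon's decision could look under the convention above (0 success, 1 controlled error, -1 uncontrolled error), in Python for illustration. The function names are hypothetical, the CANRECOVER bit is not modeled because its value is not given here, and treating success as "not requeued" is an assumption.

```python
def as_signed(status):
    """Map an 8-bit process exit status back to a signed value (255 -> -1)."""
    return status - 256 if status > 127 else status


def keep_queued(exit_code):
    """True if the batch should stay queued for another try.

    All errors are treated as continuable unless the code is -1.
    Assumption: a successful run (0) needs no retry.
    """
    return exit_code not in (0, -1)
```

The `as_signed` helper exists because a parent process sees an exit of -1 as status 255, so the daemon has to convert before comparing.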
  15. 20 Feb, 2004 1 commit
  16. 11 Dec, 2003 1 commit
  17. 17 Nov, 2003 1 commit
    • Merge the two state machines (batchstate and state) into a single state machine (state). · 2025e0bd
      Leigh Stoller authored
      All of the stuff that was previously handled by using batchstate is
      now embedded into the one state machine. Of course, these mostly
      overlapped, so it's not that much of a change, except that we also
      redid the machine, adding more states (for example, modify phases
      are now explicit). To get a picture of the actual state machine, on
      boss:
      
      		stategraph -o newstates EXPTSTATE
      		gv newstates.ps
      
      Things to note:
      
      * The "batchstate" slot of the experiments table is now used solely to
        provide a lock for the batch daemon. A secondary change will be to
        change the slot name to something more appropriate, but it can
        happen anytime after this new stuff is installed.
      
      * I have left expt_locked for now, but another later change will be to remove
        expt_locked, and change it to active_busy or some such new state name in
        the state machine. I have removed most uses of expt_locked, except those
        that were necessary until there is a new state to replace it.
      
      * These new changes are an implementation of the new state machine,
        but I have not done anything fancy. Most of the code is the same as
        it was before.
      
      * I suspect that there are races with the batch daemon now, but they
        are going to be rare, and the end result is probably that a
        cancelation is delayed a little bit.
  18. 02 Oct, 2003 1 commit
  19. 30 Sep, 2003 1 commit
    • Up to now we have had two state variables associated with an experiment, plus a lock field. · 4269dad1
      Leigh Stoller authored
      The lock field was a simple "experiment locked, go away" slot that is
      easy to use when you do not care about the actual state that an
      experiment is in, just that it is in "transition" and should not be
      messed with.
      
      The other two state variables are "state" and "batchstate". The former
      (state) is the original variable that Chris added, and was used by the tb*
      scripts to make sure that the experiment was in the state each particular
      script wanted them to be in. But over time (and with the addition of so
      much wrapper goo around them), "state" has leaked out all over the place to
      determine what operations on an experiment are allowed, and if/when it
      should be displayed in various web pages. There is a set of transition
      states in addition to the usual "active", "swapped", etc., like "swapping",
      that make testing state a pain in the butt.
      
      I added the other state variable ("batchstate") when I did the batch
      system, obviously! It was intended as a wrapper state to control access to
      the batch queue, and to prevent batch experiments from being messed with
      except when it was really okay (for example, it's okay to terminate a
      swapped out batch experiment, but not a swapped in batch experiment since
      that would confuse the batch daemon). There are fewer of these states, plus
      one additional state for "modifying" experiments.
      
      So what I have done is change the system to use "batchstate" for all
      experiments to control entry into the swap system, from the web interface,
      from the command line, and from the batch daemon. The other state variable
      still exists, and will be brutally pushed back under the surface until it's
      just a vague memory, used only by the original tb* scripts. This will
      happen over time, and the "batchstate" variable will be renamed once I am
      convinced that this was the right thing to do and that my changes actually
      work as intended.
      
      Only people who have bothered to read this far will know that I also added
      the ability to cancel experiment swapin in progress. For that I am using
      the "canceled" flag (ah, this one was named properly from the start!), and
      I test that at various times in assign_wrapper and tbswap. A minor downside
      right now is that a canceled swapin looks too much like a failed swapin,
      and so tbops gets email about it. I'll fix that at some point (sometime
      after the boss complains).
      
      I also cleaned up various bits of code, replacing direct calls to exec
      with calls to the recently improved SUEXEC interface. This removes
      some cruft from each script that calls an external script.
      
      Cleaned up modifyexp.ph3 quite a bit, reformatting and indenting.
      Also fixed to not run the parser directly! This was very wrong; should
      call nscheck instead. Changed to use "nobody" group instead of group
      flux (made the same change in nscheck).
      
      There is a script in the sql directory called newstates.pl. It needs
      to be run to initialize the batchstate slot of the experiments table
      for all existing experiments.
  20. 29 Sep, 2003 1 commit
  21. 26 Sep, 2003 1 commit
  22. 29 Jul, 2003 3 commits
  23. 22 May, 2003 1 commit
    • Reorg the batch system slightly as per Eric's request that batch mode experiments look more like regular experiments. · da97ba35
      Leigh Stoller authored
      Batch mode experiments can now be preloaded and swapped. When
      preloaded, they go into a "Pause" state. Swapping a batch mode
      experiment in puts it into the "posted" state so the batch daemon
      will see it. Swapping out a batch mode experiment does the expected;
      it puts it back into the Pause state. Terminating a batch mode
      experiment does the expected; it's gone. When a batch mode
      experiment finishes normally, it goes back into the Pause state,
      which allows batches to be reinjected as many times as Eric likes.
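
The transitions described in this message can be written down as a small table. This is a Python sketch for illustration, not the daemon's code: the event names are hypothetical labels for the operations named above, the state spellings are lowercased, and any intermediate states the real system may use are left out.

```python
# Event -> resulting state, as described in the commit message.
NEXT_STATE = {
    "preload": "pause",    # preloaded batch experiments wait in the pause state
    "swapin": "posted",    # posted experiments are visible to the batch daemon
    "swapout": "pause",
    "finish": "pause",     # normal completion allows the batch to be reinjected
    "terminate": None,     # the experiment is gone
}


def apply_events(events):
    """Return the state a batch experiment is in after a sequence of events."""
    state = None
    for event in events:
        state = NEXT_STATE[event]
    return state
```

For example, preloading, swapping in, finishing, and swapping in again leaves the experiment posted for the daemon a second time.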
  24. 30 Apr, 2003 2 commits
    • Some batch mode changes. · 0197f41d
      Leigh Stoller authored
      In the early days we did not have such fancy tb tools! I've changed
      the batch system to "preload" the experiment in foreground mode
      (results of parse spit back to user directly). The batch daemon now
      uses swapexp instead of startexp. Upon failure, the experiment goes
      back to the "swapped" state; previously its virt state was blasted
      and reentered on the next try. This is nice because you can actually
      look at the batch experiment (vis, virt tables, etc.) while it is
      posted and not running.
      
      Not sure if all the Ts are crossed. Will find out ...
    • Add batch/retry_wait sitevar, defaulted to 900 seconds between retries. · ba8103b0
      Leigh Stoller authored
      Change batch daemon to check that variable each loop.
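
A sketch of one daemon pass that re-reads the retry interval every time, in Python for illustration. Only the sitevar name batch/retry_wait and the 900 second default come from this message; `get_sitevar`, `daemon_pass`, and the per-experiment "last try" bookkeeping are assumptions.

```python
DEFAULT_RETRY_WAIT = 900  # seconds, the default named in the commit message


def daemon_pass(last_try, now, get_sitevar):
    """Return the experiments due for another attempt on this pass.

    last_try maps experiment id -> time of the previous attempt (seconds).
    get_sitevar(name, default) is a hypothetical lookup function.
    """
    # Read the sitevar on every pass so a change takes effect without a restart.
    retry_wait = int(get_sitevar("batch/retry_wait", DEFAULT_RETRY_WAIT))
    return [eid for eid, t in last_try.items() if now - t >= retry_wait]
```

Reading the value inside the pass, rather than once at startup, is what lets an admin tune the interval on a running daemon.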
  25. 16 Apr, 2003 1 commit
  26. 30 Jan, 2003 1 commit
  27. 16 Sep, 2002 1 commit
  28. 11 Jul, 2002 1 commit
  29. 07 Jul, 2002 1 commit
  30. 29 Apr, 2002 1 commit
  31. 12 Feb, 2002 1 commit
  32. 12 Nov, 2001 1 commit
  33. 08 Nov, 2001 1 commit
  34. 25 Oct, 2001 1 commit
  35. 17 Oct, 2001 1 commit
    • Rework of the batch experiment code. · 4d420b21
      Leigh Stoller authored
      Unified it with the immediate experiment code. No longer uses another
      table. Rather, the experiment record contains a couple of extra
      fields for the batch system. Also combined some of the backend code
      (no longer a killbatch script). Also added scriptable experiments;
      the batchexp program in the bin directory can start an experiment
      from the command line, and in fact is used from the web page for both
      batch experiments and immediate experiments (-i option). All of the
      DB code that was in the web interfaces was moved to batchexp.
  36. 16 Oct, 2001 1 commit
  37. 26 Sep, 2001 1 commit