1. 11 Feb, 2008 1 commit
  2. 28 Nov, 2007 1 commit
  3. 26 Nov, 2007 1 commit
  4. 05 Nov, 2007 3 commits
  5. 02 Nov, 2007 1 commit
  6. 06 Aug, 2007 1 commit
  7. 02 Aug, 2007 1 commit
  8. 19 Jun, 2007 1 commit
    • Big update to the stats gathering code ... · 495f6803
      Leigh B. Stoller authored
      This change attempts to make the stats gathering code more reliable by
      not relying on the testbed_stats records to reconstruct usage
      statistics.  The main source of errors and total confusion in the
      current stats code is that testbed_stats includes all the errors and
      transitions, from which I have to reconstruct what happened in order
      to determine usage by a project or user.
      
      The new stats code still generates the testbed_stats records, but actual
      usage is recorded as it happens, in the experiment_resources table, as
      swapins, swapouts, and swapmods occur. It's also much faster to compute
      the data for the tables in the web interface, not having to scan a
      zillion testbed_stats records in php.
      
      There is a time consuming update to the records that takes place with
      a lot of tables locked.
  9. 15 May, 2007 1 commit
    • Checkpoint changes that have been discussed in the last few weeks: · c4f53202
      Leigh B. Stoller authored
      * Records are now "held open" when a run is stopped. When the next run
        is started, a check is made to see if the files
        (/project/$pid/exp/$eid) have changed, and if so a new version of the
        archive is committed before the next run is started.
      
      * Change the way swapmod is handled within an instance. There is a new
        option on the ShowExp page called Modify Resources. The intent is to
        allow an instance to be modified without having to start and stop runs,
        which tends to clutter things up, according to our user base. So, if
        you are within a run, that run is reset (reused) after the swapmod is
        finished. You can do this as many times as you like. If you are
        between runs (last operation was a stoprun), do the swapmod and then
        "speculatively" start a new run. Subsequent modifies reuse that
        run again, as above.
      
        I think this is what Kevin was after ... there are some UI issues
        that may need to be resolved, will wait to hear what people have to
        say.
      
      * Revising a record is now supported. Export, change in place, and
        then use the Revise link on the ShowRun page. Currently this has to
        happen from the export directory on ops, but eventually an upload
        will be allowed (to correspond to downloaded exports).
      
      * Check to see if export already exists, and give warning. Added a
        checkbox that allows user to overwrite the export.
      
      * A bunch of minor UI changes to the various template pages.
  10. 13 Mar, 2007 1 commit
  11. 22 Jan, 2007 1 commit
  12. 10 Jan, 2007 1 commit
  13. 09 Jan, 2007 2 commits
  14. 08 Dec, 2006 2 commits
  15. 04 Dec, 2006 1 commit
  16. 06 Nov, 2006 1 commit
    • libaudit related changes: · e89ee617
      Kevin Atkinson authored
        - Added "LIBAUDIT_FANCY" option to AuditStart.  When this option is
          used libaudit will send a different email than it normally sends,
          and on error call tblog_find_error() to determine the error.
      
        - Also add audit function AddAuditInfo, which adds additional
          information for libaudit to use in SendAuditMail when AUDIT_FANCY
          is set.
      
        - Modify template_swapin, template_instantiate, and template_create
          to use the new audit functionality.
      
        - Suppress calling tblog_find_error and sending the error email
          when auditing in swapexp and batchexp.
      
      tblog changes:
      
        - Shorten the message sent to the user when the error is unknown.
          Remove all parts about lack of free nodes as it no longer really
          applies, since tblog now correctly identifies those errors and
          handles them separately.  The message is now just "Please look at
          the log below to see what happened."
      
        - Improve the algorithm used to determine the other error when
          canceled.  It now works by removing all errors related to the cancel
          request and then essentially rerunning tblog_find_error.  If the
          cause of the error is still canceled, repeat until the cause is
          something other than canceled or no errors are left.
      
        - Refactor tblog_find_error, which involves creating new internal
          functions: tblog_determine_single_error, tblog_store_error,
          tblog_dump_error
      
        - Add section on Primary vs Secondary Errors to the inline POD
          documentation.
      
        - Other minor enhancements and bug fixes.
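
The cancel handling described in the tblog changes above can be sketched roughly as follows. This is a Python illustration only, not the Perl code in libtblog: the record layout (dicts with `cause` and `cancel_request` keys) and the `find_error` stand-in are assumptions made for the example.

```python
from typing import Optional


def find_error(errors: list[dict]) -> Optional[dict]:
    """Hypothetical stand-in for tblog_find_error: pick the last logged error."""
    return errors[-1] if errors else None


def find_underlying_error(errors: list[dict]) -> Optional[dict]:
    """Return the first cause that is not 'canceled', or None if nothing is left.

    Assumes every error with cause 'canceled' carries a 'cancel_request' id
    that ties it to the cancel request it belongs to.
    """
    remaining = list(errors)
    while remaining:
        primary = find_error(remaining)
        if primary["cause"] != "canceled":
            return primary
        # Drop every error tied to this cancel request, then look again.
        request = primary["cancel_request"]
        remaining = [e for e in remaining if e.get("cancel_request") != request]
    return None
```

Each pass removes at least the error that was just selected, so the loop always terminates.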
  17. 20 Oct, 2006 1 commit
    • Wow, this should make me look important! · afa5e919
      Mike Hibler authored
      Two-day boondoggle to support "/scratch", an optional large, shared filesystem
      for users.  To do this, I needed to find all the instances where /proj is used
      and behave accordingly.  The boondoggle part was the decision to gather up all
      the hardwired instances of shared directory names ("/proj", "/users", etc.)
      so that they are set in a common place (via unexposed configure variables).
      This is a boondoggle because:
      
      1. I didn't change the client-side scripts.  They need a different mechanism
         (e.g., tmcd) to get the info, configure is the wrong way.
      
      2. Even if I had done #1 it is likely--no, certain--that something would
         fail if you tried to rename "/proj" to be "/mike".  These names are just
         too ingrained.
      
      3. We may not even use "/scratch" as it turns out.
      
      Note, I also didn't fix any of the .html documentation.  Anyway, it is done.
      To maintain my illusion in the future you should:
      
      1. Have perl scripts include "use libtestbed" and use the defined PROJROOT(),
         et al. functions where possible.  If not possible, make sure they run
         through configure and use @PROJROOT_DIR@, etc.
      
      2. Use the configure method for python, C, php and other languages.
      
      3. There are perl (TBValidUserDir) and php (VALIDUSERPATH) functions which
         you should call to determine if an NS, template parameter, tarball or
         other file is in "an acceptable location."  Use these functions where
         possible.  They know about the optional "scratch" filesystem.  Note that
         the perl function is over-engineered to handle cases that don't occur
         in nature.
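
To illustrate the kind of "acceptable location" check mentioned in item 3, here is a minimal Python sketch. It is not a port of TBValidUserDir or VALIDUSERPATH; the function name and the default root list (taken from the directory names mentioned in this message) are assumptions, and a real check would get its roots from the configured values rather than a hardwired tuple.

```python
import posixpath


def is_acceptable_location(path, roots=("/proj", "/users", "/scratch")):
    """Return True if an absolute path lies strictly inside one of the roots.

    The roots default to names mentioned in the commit message; this is an
    illustrative assumption, not the testbed's actual configuration.
    """
    if not path.startswith("/"):
        return False
    # Collapse "." and ".." so "/proj/../etc/passwd" is not accepted.
    # Note: this is purely lexical and does not resolve symlinks.
    normalized = posixpath.normpath(path)
    return any(normalized.startswith(root + "/") for root in roots)
```

For example, `is_acceptable_location("/scratch/pid/data.tar.gz")` is accepted, while `"/projx/foo"` is rejected because the comparison includes the trailing slash.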
  18. 18 Oct, 2006 1 commit
  19. 04 Oct, 2006 1 commit
  20. 26 Sep, 2006 4 commits
  21. 07 Sep, 2006 1 commit
    • Some changes to how log files are handled; this took way too long to · c01f7b3e
      Leigh B. Stoller authored
      do!
      
      The original operation was to save up every log file forever in the
      work directory, and copy that out to both the user directory and the
      info directory (long term archive). When I cleaned /proj on ops
      yesterday of all this old cruft, I recovered 17GB of disk space. Yow!
      
      So, the new operation is:
      
      * Only files that end in .log are copied to the user directory. No
        longer copying out .top, .ptop, and a couple of other logs; 99% of
        users never look at these things. We still have them available to us
        though, on boss.
      
      * At the beginning of each swap operation, clean out the work
        directory of all the old log files. These are named a variety of
        ways, so I use some pattern matches to do this.
      
      * Jigger the names a little so that we do not name things in the form
        "$$.log", to avoid copying out different named files to the user
        directory each time; instead link the .log file to the real output
        file so that it gets overwritten each time, while still getting the
        per-swap files for long term storage.
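
The three steps above might look something like the following Python sketch. The function names, file names, and glob patterns are hypothetical, and it assumes the ".log" link is a symbolic link; the message does not say which kind of link is used.

```python
import shutil
from pathlib import Path


def clean_old_logs(workdir: Path, patterns=("*.log", "*.out")) -> None:
    """Remove old log files from the work directory (patterns are assumed)."""
    for pattern in patterns:
        for path in list(workdir.glob(pattern)):
            if path.is_file() or path.is_symlink():
                path.unlink()


def link_log(per_swap_file: Path, stable_name: Path) -> None:
    """Point a fixed-name .log link at this swap's real output file."""
    if stable_name.is_symlink() or stable_name.exists():
        stable_name.unlink()
    stable_name.symlink_to(per_swap_file)


def copy_user_logs(workdir: Path, userdir: Path) -> None:
    """Copy only files ending in .log to the user directory."""
    for path in workdir.glob("*.log"):
        # shutil.copy follows the link, so the user gets the file contents.
        shutil.copy(path, userdir / path.name)
```

Because the link name stays the same across swaps, the copy in the user directory is overwritten each time, while the per-swap files remain available for long-term storage.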
  22. 16 Aug, 2006 1 commit
    • - Added tbreport database schema (added three tables), storage for · 9c5d3308
      Kevin Atkinson authored
        tbreport errors & context.
      
      - Modified fatal() in swapexp, batchexp, and tbprerun, and die_noretry()
        in os_setup to pass hash parameter to tblog functions.
      
      - Added tbreport error & context information for select errors in
        swapexp, tbswap, assign_wrapper2, snmpit_lib, snmpit, batchexp,
        assign_wrapper, os_setup, parse-ns, & tbprerun.
      
      - Added assign error parser in assign_wrapper2.
      
      - Added parse.tcl error parser in parse-ns.
      
      - Added severity constants for tbreport in libtblog_simple.
      
      - Added tbreport() function & context table mapping for reporting
        discrete error types to libtblog.
  23. 26 Jul, 2006 1 commit
    • · cbf3c5d4
      Kevin Atkinson authored
      swapexp: The previous commit, which added a message about the recovery
      action when a swap-modify failed to the top of the email, did not
      catch all of the possible cases.  Added the case when the experiment is
      not swapped in.
      
      os_setup: Refactored/rewrote os_setup error summary code.  Distinguish
      the case when nodes fail to properly load the os and when they don't
      boot after loading the os.
  24. 20 Jul, 2006 1 commit
    • · 5710c340
      Kevin Atkinson authored
      Various tblog changes:
      
      Added message about recovery action when a swap-modify failed to the
      top of the email.
      
      Fine tuned os_setup summary error.  Added a (possibly partial) list of
      nodes that fail; if a large number fail, only show as many as will
      fit on a single line.  Other tweaks.
      
      Flagged assign_wrapper errors of an Invalid OS as user errors.
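
As an illustration of the "only as many as will fit on a single line" rule for the failed-node list, here is a small Python sketch. The function name, the 72-column default, and the "..." marker are assumptions; the actual os_setup code may format the list differently.

```python
def node_summary(nodes: list[str], width: int = 72) -> str:
    """Join node names with ', ', truncating with '...' to fit within width."""
    full = ", ".join(nodes)
    if len(full) <= width:
        return full
    shown = []
    used = len("...")  # space reserved for the truncation marker
    for name in nodes:
        cost = len(name) + 2  # the name plus the ", " that follows it
        if used + cost > width:
            break
        shown.append(name)
        used += cost
    return ", ".join(shown + ["..."])
```

With nine nodes named pc1 through pc9 and a width of 20, this yields `pc1, pc2, pc3, ...`.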
  25. 05 Jul, 2006 1 commit
    • · 183040de
      Kevin Atkinson authored
      Many changes to tblog code.  Database update needed:
      
      1) Added summary of failed nodes in os_setup.  The cause of the error is now
      classified as "user" if it is only user images that failed and the user
      image failed on every pc of a particular type.  Otherwise I leave the cause
      as "unknown" since it is really hard to tell what the real cause is.
      
      2) Raised the confidence threshold for most errors so that they will appear
      on the top.
      
      3) Added a special error when an experiment is canceled.  The cause is
      "canceled" and testbed-ops won't see these errors.
      
      4) Fixed a bug in assign_wrapper where it will incorrectly report "This
      experiment cannot be instantiated on this testbed..." when really the user
      canceled the swapin.
      
      5) Fixed a bug where os_setup errors were being incorrectly reported as
      assign errors.  This happens when os_setup fails for some reason and
      tbswap tries again, but the second time around there are not enough nodes.
      So the last error is coming from assign even though the true cause of the
      error is due to failed nodes.  The fix for this involved adding a new column
      to the log table, "attempt", which will be 1 for the first attempt and then
      incremented for each new attempt.  tblog_find_error will then simply ignore
      any errors with "attempt > 1".
      
      6) Also fixed a potential problem when there is an error during the cleanup
      phase by adding another column "cleanup".  tblog_find_error will
      also ignore any errors with the cleanup bit set.
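
The filtering described in items 5 and 6 can be sketched in Python as below. The `attempt` and `cleanup` keys mirror the new columns named above; the dict-based row format, the function name, and the other keys used in the example are assumptions for illustration, not the actual tblog_find_error code.

```python
def candidate_errors(log_rows: list[dict]) -> list[dict]:
    """Keep only errors from the first attempt that were not logged in cleanup."""
    return [
        row
        for row in log_rows
        if not row["attempt"] > 1 and not row["cleanup"]
    ]


# Example: os_setup fails on the first attempt, tbswap retries, and assign
# then fails for lack of nodes. Only the os_setup error should be considered.
rows = [
    {"script": "os_setup", "attempt": 1, "cleanup": 0},
    {"script": "assign", "attempt": 2, "cleanup": 0},
    {"script": "tbswap", "attempt": 1, "cleanup": 1},
]
survivors = candidate_errors(rows)
```

Here `survivors` contains only the os_setup row, so the later assign failure no longer masks the real cause.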
  26. 03 Jul, 2006 1 commit
  27. 15 Jun, 2006 1 commit
  28. 14 Jun, 2006 1 commit
    • The template "datastore" ... · fe9aa6a4
      Leigh B. Stoller authored
      Each template has a datastore, which is really just a subdirectory that can
      be populated with files, and committed to the subversion archive.  Note,
      the datastore is specific to the template itself. The Template Archive link
      on the Show Template page takes you to the subdirectory, which by
      convention I am calling "datastore".
      
      The directory actually lives in /proj/pid/exp/eid/TGUID-VERS ... but that
      path is printed out for you on the archive page.
      
      Anyway, put stuff in the datastore directory, and then commit the template
      archive so there is a tag associated with it.
      
      When an instance is created, a checkout of the datastore is placed in the
      experiment directory (/proj/pid/eid/exp/template_datastore). The current
      tag (from above) is stored with the instance so that we can later recreate
      the environment for the instance, say for rerun.
      
      Tarfiles and rpms in the datastore can be referenced as xxx://foo.rpm (in
      your NS file).  tarfiles_setup transforms those when the instance is
      swapped in, sorta like it does other URLs, only it does not actually fetch
      them; it just needs to rewrite the paths so they reference the datastore.
      
      The program agent gets another environment variable so you can refer to the
      datastore without hardwiring paths ($DATASTORE). Eventually I want to move
      the checkout someplace else, but it was easy to drop it into the experiment
      directory for now.
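
The path rewriting for datastore references described above could be sketched like this in Python. It is not the tarfiles_setup code: the function name is hypothetical, the `xxx` scheme is kept exactly as the placeholder used in this message, and the datastore directory is passed in by the caller rather than guessed.

```python
import posixpath


def rewrite_datastore_ref(ref: str, datastore_dir: str, scheme: str = "xxx") -> str:
    """Rewrite '<scheme>://file' to a path inside the datastore checkout.

    References with any other scheme are returned unchanged; nothing is
    fetched, only the path is rewritten.
    """
    prefix = scheme + "://"
    if not ref.startswith(prefix):
        return ref
    relative = ref[len(prefix):].lstrip("/")
    return posixpath.join(datastore_dir, relative)
```

For instance, with a datastore checked out at a hypothetical `/tmp/datastore`, `xxx://foo.rpm` becomes `/tmp/datastore/foo.rpm`.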
  29. 01 Jun, 2006 1 commit
  30. 30 May, 2006 2 commits
    • Add an export option to the record listing. A new button on the Template · 2cfe4630
      Leigh B. Stoller authored
      Record page lets you export the contents of the archive that corresponds
      to that record, along with an XML file that describes the various DB bits
      for the template and instance.
      
      This is just a first cut so that Mike can start playing around. Subject to
      change, I'm sure.
      
      The archive is dumped to /proj/$pid/exports/$guid/$vers/$exptidx, which
      is basically the last commit of the instance when it was terminated.
      
      The xml file is called export.xml and is placed in the top level directory
      of the above directory. The file is created with XML::Simple, and a typical
      XML file might look like:
      
      <instance>
        <bindings>
          <name>NodeCount</name>
          <description>Number of nodes!</description>
          <value>1</value>
        </bindings>
        <bindings>
          <name>OS</name>
          <description></description>
          <value>RHL90-STD</value>
        </bindings>
        <bindings>
          <name>ScriptArgs</name>
          <description></description>
          <value>-b</value>
        </bindings>
        <eid>NewOne-V2</eid>
        <guid>10149/2</guid>
        <metadata>
          <name>M1</name>
          <guid>10162/1</guid>
          <value>Some metadata</value>
        </metadata>
        <pid>testbed</pid>
        <runs>
          <name>1</name>
          <archive_tag>T20060526-082533-172_endexp</archive_tag>
          <description></description>
          <exptidx>110</exptidx>
          <idx>1</idx>
          <runid>NewOne-V2</runid>
          <start_time>2006-05-26 08:23:02</start_time>
          <stop_time>2006-05-26 08:25:16</stop_time>
        </runs>
        <uid>stoller</uid>
      </instance>
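
For anyone consuming the export, a file shaped like the sample above can be read with a few lines of Python. This is only a sketch based on that sample: the `read_export` helper is hypothetical, the embedded XML is an abbreviated copy of the example, and since the format is subject to change the element names may not stay the same.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample export.xml shown above.
SAMPLE = """\
<instance>
  <bindings>
    <name>NodeCount</name>
    <description>Number of nodes!</description>
    <value>1</value>
  </bindings>
  <bindings>
    <name>OS</name>
    <description></description>
    <value>RHL90-STD</value>
  </bindings>
  <eid>NewOne-V2</eid>
  <pid>testbed</pid>
  <runs>
    <name>1</name>
    <archive_tag>T20060526-082533-172_endexp</archive_tag>
    <exptidx>110</exptidx>
  </runs>
</instance>
"""


def read_export(xml_text: str) -> dict:
    """Collect pid, eid, parameter bindings, and runs from an export.xml string."""
    root = ET.fromstring(xml_text)
    bindings = {
        b.findtext("name"): b.findtext("value", default="")
        for b in root.findall("bindings")
    }
    runs = [
        {child.tag: (child.text or "") for child in run}
        for run in root.findall("runs")
    ]
    return {
        "pid": root.findtext("pid"),
        "eid": root.findtext("eid"),
        "bindings": bindings,
        "runs": runs,
    }


info = read_export(SAMPLE)
```

Repeated `<bindings>` and `<runs>` elements are gathered with `findall`, so an instance with several runs produces one dict per run.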
    • Leigh B. Stoller's avatar
      a12df9b3
  31. 23 May, 2006 2 commits