1. 07 Mar, 2007 1 commit
  2. 06 Mar, 2007 1 commit
  3. 21 Feb, 2007 1 commit
  4. 20 Feb, 2007 2 commits
  5. 16 Feb, 2007 1 commit
  6. 14 Feb, 2007 3 commits
  7. 22 Jan, 2007 1 commit
  8. 09 Jan, 2007 1 commit
  9. 05 Jan, 2007 1 commit
    • Kevin Atkinson authored · 43eac695
      Location of datastore is now "exp/datastore", not "datastore",
      in exparchive.  Update Template::Instance::CopyDataStore to reflect this.
  10. 08 Dec, 2006 1 commit
    • Leigh Stoller authored · b898a8cc
      As discussed in meetings and email ... this commit changes what is
      archived.  Rather than a special "archive" directory in the experiment
      directory, we now archive the entire experiment directory.
      
      This change should be backwards compatible, but let me know if not.
      
      Note that the nsdata directory is gone; the nsfile comes from the
      tbdata, but I now place a copy in nsfile.ns so that the name is well
      known.
  11. 09 Nov, 2006 1 commit
  12. 06 Nov, 2006 1 commit
  13. 03 Nov, 2006 1 commit
  14. 20 Oct, 2006 1 commit
    • Leigh Stoller authored · 4d4a27e1
      Add compression option to sync option of loghole. When turned on, any file
      greater than 512K is automatically compressed with gzip. Might need to
      make this number bigger; we shall see.
      
      If you run emacs, put this in your .emacs file.
      
      	(load "jka-compr")
      	(jka-compr-install)
      
      and any time you visit a file that ends in one of the standard compression
      extensions, emacs will automatically do the uncompress for you on the data
      in the buffer (not the actual disk file of course). Very convenient.
      
      You can also get your browser to do the same, but I leave that as an
      exercise for the reader.
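
The size-based compression described in this commit can be sketched as follows. This is only an illustration, not loghole's actual code: the function name `compress_large_files` and its parameters are hypothetical, and only the 512K cutoff and the use of gzip come from the commit message.

```python
import gzip
import os
import shutil

THRESHOLD = 512 * 1024  # 512K, the cutoff named in the commit message


def compress_large_files(root, threshold=THRESHOLD):
    """Gzip every regular file under root that is larger than threshold bytes.

    Hypothetical helper for illustration; returns the list of new .gz paths.
    """
    compressed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(".gz") or os.path.islink(path):
                continue  # already compressed, or not a regular file
            if os.path.getsize(path) <= threshold:
                continue  # at or below the cutoff: leave untouched
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)  # keep only the compressed copy
            compressed.append(path + ".gz")
    return compressed
```

Making the threshold a parameter reflects the note that the number might need to grow.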
  15. 18 Oct, 2006 1 commit
  16. 13 Oct, 2006 1 commit
  17. 12 Oct, 2006 1 commit
    • Leigh Stoller authored · bb996961
      By popular demand, give user a choice of where to get the next set of
      (initial) parameters for a new run. Three choices right now; from the
      template itself, from the instance, or from the previous run. On the
      web interface this is presented as three buttons. On ops, it is
      the -y option, which takes one of template,instance,lastrun as its
      argument (you can of course combine the -y option with an XML file to
      override specific params).
      
      At present, there is no default. Let's give it a chance to sink in
      before I pick something that will annoy 50% of the people 75% of the
      time.
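
The selection rule described in this commit can be sketched as below. The function `initial_params` and the dictionary layout are assumptions made for illustration; only the three source names (template, instance, lastrun), the XML override, and the absence of a default come from the commit message.

```python
VALID_SOURCES = ("template", "instance", "lastrun")


def initial_params(source, candidates, overrides=None):
    """Pick the starting parameters for a new run.

    source:     one of VALID_SOURCES (the argument given to -y)
    candidates: dict mapping each source name to its parameter dict
    overrides:  optional dict standing in for params from an XML file
    """
    if source not in VALID_SOURCES:
        # The commit states there is no default, so refuse to guess.
        raise ValueError("no default: choose one of " + ",".join(VALID_SOURCES))
    params = dict(candidates[source])  # copy so the source is not mutated
    params.update(overrides or {})     # specific params win over the source
    return params
```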
  18. 09 Oct, 2006 2 commits
  19. 08 Oct, 2006 3 commits
  20. 05 Oct, 2006 1 commit
    • Leigh Stoller authored · e9607a77
      More work on "recording" template events.
      * New version of template_record just for ops, since so much is
        different about ops, not bothering to maintain a single version.
      
      * Various fixes to how the recorded events are stored and reconstituted.
        The big fix is to wrap them in a sequence so that they get fired
        properly (waiting for completion of previous event in recording).
      
      * New buttons to Pause and Continue event time, which is used when
        adding recorded events. This allows users to pause time while they
        "think" so when an event is recorded, the thinking time is not actually
        in the timeline. Eventually hope to figure this out automatically, but
        that will take some real, uh, thinking.
      
      * Add a new event editor (linked off the template page) that allows
        you to delete and change the recordings. Note that you can only edit
        the events at the template level; you cannot edit the events of an
        instance (swapped in experiment), and you can only edit the recorded
        events, not any other events. Not sure it's useful to be able to do
        either of these yet, but probably not too hard to add at some point.
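
The Pause and Continue behaviour described in this commit amounts to a clock that excludes paused intervals from the timeline. A minimal sketch, assuming a hypothetical `PausableClock` class (not part of the actual implementation):

```python
class PausableClock:
    """Event-time clock whose paused ("thinking") intervals do not count."""

    def __init__(self, now):
        self._now = now            # callable returning wall-clock seconds
        self._start = now()
        self._paused_at = None     # wall-clock time of the current pause
        self._paused_total = 0.0   # seconds spent paused so far

    def pause(self):
        if self._paused_at is None:
            self._paused_at = self._now()

    def resume(self):
        if self._paused_at is not None:
            self._paused_total += self._now() - self._paused_at
            self._paused_at = None

    def elapsed(self):
        # While paused, event time is frozen at the moment of the pause.
        end = self._paused_at if self._paused_at is not None else self._now()
        return end - self._start - self._paused_total
```

An event recorded after a resume would be stamped with `elapsed()`, so the thinking time never appears in the timeline.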
  21. 03 Oct, 2006 1 commit
    • Leigh Stoller authored · 89c9208b
      Two small changes.
      * Copy template datastore to instance "datastore" directory, instead of
        "template_datastore" ... avoid mass confusion.
      
      * Change xxx:// mapping from template datastore to the instance datastore.
  22. 29 Sep, 2006 2 commits
  23. 27 Sep, 2006 1 commit
  24. 26 Sep, 2006 3 commits
    • Leigh Stoller authored · 4b6d1df5
    • Leigh Stoller authored · b98aa4ec
      Fix minor typo.
    • Leigh Stoller authored · a8631011
      * A bit more support for swapmod from Start Run. Mostly bookkeeping
        info so we have a record of it.
      
      * First attempt at dealing with nodes that do not respond to the
        synchronous events that are sent from start and stop run. Rather
        than failing, attempt to figure out which nodes are actually dead,
        and save some state in the DB associated with the run. The current
        method for figuring out which nodes are dead is the node_status
        table, since the event scheduler is the only thing that knows what
        nodes did not respond. Will probably revisit this very soon.
      
      * Bug fixes of course.
      
      * Start implementing a Run object to replace some of the code in the
        Instance object.
  25. 20 Sep, 2006 1 commit
    • Leigh Stoller authored · b9161642
      By popular demand, you can now force a swap modify to be done when
      doing a Start Run. On the web page, there is a new checkbox, and
      on ops, template_startrun takes a new -m option.
      
      Caveat: You cannot specify a new NS file, yet. The original file is
      reparsed, and the idea is that a change in the template parameters
      will result in a change to the topology. I will add the ability to
      specify a new NS file in the next revision of this change.
      
      If you really really want to change the NS file, go to
      /proj/$pid/exp/$eid/archive/nsdata and edit nsfile.ns ...
      
      In addition, DATASTORE is now defined while parsing the NS file. This
      turned out to be quite the headache!
  26. 12 Sep, 2006 1 commit
    • Leigh Stoller authored · cbdc4178
      This started out as a simple little hack to add a StopRun "ns" event, but
      it got more complicated as it progressed.
      
      The bulk of the change was changing template_exprun so that it can take a
      pid/eid as an alternative to eid/guid. This is a big convenience since it's
      easy to find the template from a running experiment, and it makes it
      possible to invoke from the event scheduler, which has never heard of a
      template before (and it's not something I wanted to teach it about).  It's
      also easier on users.
      
      Anyway, back to the stoprun event. You can now do this:
      
      	$ns at 100 "$ns stoprun"
      or
      	tevc -e pid/eid now ns stoprun
      
      You can add the -w option to wait for the completion event that is sent,
      but this brings me to the glaring problems with this whole thing.
      
      * First, the scheduler has to fire off the stoprun in the background,
        since if it waits, we get deadlock. Why? Cause the implementation of
        stoprun uses the event system (SNAPSHOT event, other things), and if
        the scheduler is sitting and waiting, nothing happens.
      
        Okay, the solution to this was to generate a COMPLETION event from
        template_exprun once the stop operation is complete. This brings me
        to the second problem ...
      
      * Worse, is that the "ns" events that are sent to implement stoprun (like
        snapshot) send their own completion events, and that confuses anyone
        waiting on the original stoprun event (it returns early).
      
        So what to do about this? There is a "token" field in the completion
        event structure, which I presume is to allow you to match things up.  But
        there is no way to set this token using tevc (and then wait for it), and
        besides, the event scheduler makes them up anyway and sticks them into
        the event. So, the seeds of a fix are already germinating in my mind, but
        I wanted to get this commit in so that Mike would have fun reading this
        commit log.
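
The early-return problem described in this commit, and the idea of matching on the completion "token", can be illustrated with the sketch below. It is not the fix the author had in mind, which the message does not spell out. The event representation (dicts with "type" and "token" keys) and the function `wait_for_completion` are assumptions made purely for illustration.

```python
def wait_for_completion(events, token):
    """Return the first COMPLETION event that carries our token.

    events: iterable of dicts with hypothetical keys "type" and "token".
    Returns None if the stream ends without a matching completion.
    """
    for event in events:
        if event.get("type") != "COMPLETION":
            continue  # not a completion at all
        if event.get("token") == token:
            return event
        # Completion of a nested event (for example the snapshot that
        # stoprun triggers): ignore it instead of returning early.
    return None
```

Without the token comparison, the waiter would return on the first completion it sees, which is the confusion the commit message describes.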
  27. 10 Sep, 2006 1 commit
    • Leigh Stoller authored · e8bb6bca
      The bulk of this commit adds the ability to run the program agent on ops
      so that users can schedule program events to run there. For example:
      
      	set myprog [new Program $ns]
      	$myprog set node "ops"
      	$myprog set command "/usr/bin/env >& /tmp/foo"
      
      	$ns at 10 "$myprog start"
      or
      	tevc -e pid/eid now myprog start
      
      Since the program agent cannot talk to tmcd from ops, there are new
      routines to create the config files that the program agent uses, in
      the experiment tbdata directory.
      
      I also rewrote the eventsys.proxy script that starts the event
      scheduler on ops; I rolled the startup of the program agent into this
      script, via a new -a option which is passed over from boss when an ops
      program agent is detected in the virt topology. This keeps the number
      of new processes on ops to a small number.
      
      Also part of the above rewrite is that we now catch when the event
      scheduler (or the program agent) exits abnormally, sending email to
      tbops and the swapper of the experiment. We have been seeing abnormal
      exits of the scheduler and it would be good to detect them and see if
      we can figure out what is going wrong.
      
      Other small bug fixes in experiment run.
  28. 05 Sep, 2006 1 commit
    • Leigh Stoller authored · 087dbfff
      A bunch of template changes resulting from meetings last week.
      * Add XMLRPC interface for template swapin,stoprun,startrun,swapout and
        add the appropriate wrappers to the script_wrapper on ops.
      
      * Allow parameter descriptions in NS files. This is probably not in its
        final form since it's a bit confusing as to what has priority; something
        in the NS file or a metadata item. Anyway, you can do this in your NS
        file:
      
      	$ns define-template-parameter GUID "0/0" "The GUID to be analyzed"
      
        The rules are currently that the NS file description has priority and
        is copied to child templates, unless the user has modified a description
        via the web interface, in which case the NS file description is ignored.
        I know, sounds awful, but for the most part people are going to use the
        NS file anyway.
      
      * Add "clear" option when starting a new experiment run; the per
        experiment DB and the logholes are cleared. Note that this is *not* the
        default behaviour; you have to either check the checkbox on the web form
        or use the -c option to the script wrapper, or clear=yes if talking
        directly to the XMLRPC server.
      
      * Fix up how email is generated for template_swapin and template_create,
        so that Kevin can debug tblog/tbreport stuff, but also so that we maintain
        mail logs as before. I have made some improvements to libaudit so as to
        centralize the mail goo, and avoid duplicating all that stuff.
      
      * Minor fixes to the program agent so that the new environment strings are
        sent before the program agent exits and reloads them!
      
      * Other minor little things.
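
The parameter-description priority rule stated in this commit can be written out as a small function. The name `effective_description` and its arguments are hypothetical. Only the rule itself comes from the commit message: the NS file wins unless the user has edited the description via the web interface.

```python
def effective_description(ns_description, web_description, user_modified):
    """Choose which description of a template parameter to show.

    ns_description:  text from define-template-parameter, or None
    web_description: text stored as a metadata item, or None
    user_modified:   True if the user edited it via the web interface
    """
    if user_modified:
        # A web edit makes the NS file description irrelevant.
        return web_description
    if ns_description is not None:
        return ns_description  # the NS file has priority by default
    return web_description
```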
  29. 14 Aug, 2006 1 commit
    • Leigh Stoller authored · 9d021a07
      Checkpoint my dynamic event stuff, crude as it is. The idea for this first
      draft is that the user will, at the end of an experiment run, log into one
      of his nodes and perform some analysis which is intended to be repeated at
      the end of the next run, and in future instantiations of the template.
      
      A new table called experiment_template_events holds the dynamic events for
      the template. Right now I am supporting just program events, but it will be
      easy to support arbitrary events later. As an absurd example:
      
      	node6> /usr/local/bin/template_analyze ~/data_analyze arg arg ...
      
      The user is currently responsible for making sure the output goes into a
      file in the archive. I plan to make the template_analyze wrapper handle
      that automatically later, but for now what you really want is to invoke a
      script that encapsulates that, redirecting output to $ARCHIVE (this
      variable is installed in the environment by template_analyze).
      
      The wrapper script will save the current time, and then run the program.
      If the program terminates with a zero exit status, it will ssh over to ops
      and invoke an xmlrpc routine to tell boss to add a program event to both
      the eventlist for the current instance, and to the template_eventlist for
      future instances. The time of the event is the relative start time that was
      saved above (remember, each experiment run replays the event stream from
      time zero).
      
      For the future, we want to allow this to be done on ops as well, but
      that will take more infrastructure, to run "program agents" on ops.
      
      It would be nice to install the ssl xmlrpc client side on our images so
      that we do not have to ssh to ops to invoke the client.
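
The wrapper flow described in this commit (save the relative start time, run the program, and record an event only on a zero exit status) can be sketched as follows. This is not the real template_analyze. The function `run_and_record` is hypothetical, and the `record_event` callback stands in for the ssh-to-ops and xmlrpc step, which is not modelled here.

```python
import subprocess
import time


def run_and_record(argv, run_start, record_event, clock=time.time):
    """Run argv; on success, report it as a program event.

    run_start:    wall-clock time at which the current run began
    record_event: callback taking (relative_time, argv); a stand-in for
                  the xmlrpc call that adds the event on boss
    """
    started = clock() - run_start  # relative start time, saved before running
    status = subprocess.call(argv)
    if status == 0:
        record_event(started, argv)  # only successful programs are recorded
        return True
    return False
```

Recording the time relative to the run start matches the note that each run replays the event stream from time zero.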
  30. 10 Aug, 2006 1 commit
    • Leigh Stoller authored · 0c1b1a23
      Okay, now we can view graphs from the historical data (template record).
      A couple of things to note:
      
      * When requesting a graph, we have to have a checkout of the archive
        (the DB dump file) so that we can create a temporary DB with the data.
        This is done on demand, and the DB is left in place since it's a
        fairly time consuming operation to do the checkout and the dbload.
        I do not delete the DBs though; we will need to age them out as needed.
      
      * Even so, when returning to a page we end up getting the graphs
        again, and that still takes more time than I like to wait. Perhaps
        add a refresh button so that the user has to force a redraw. Might
        need to add a time/date stamp to the graph.
  31. 08 Aug, 2006 1 commit