1. 20 Sep, 2006 1 commit
    • Leigh B. Stoller's avatar
      By popular demand, you can now force a swap modify to be done when · b9161642
      Leigh B. Stoller authored
      doing a Start Run. On the web page, there is a new checkbox, and
      on ops, template_startrun takes a new -m option.
      
      Caveat: You cannot specify a new NS file, yet. The original file is
      reparsed, and the idea is that a change in the template parameters
      will result in a change to the topology. I will add the ability to
      specify a new NS file in the next revision of this change.
      
      If you really really want to change the NS file, go to
      /proj/$pid/exp/$eid/archive/nsdata and edit nsfile.ns ...
      
      In addition, DATASTORE is now defined while parsing the NS file. This
      turned out to be quite the headache!
  2. 19 Sep, 2006 5 commits
  3. 18 Sep, 2006 1 commit
    • Mike Hibler's avatar
      Exclude "ipv4" protocol LANs from the "__all_lans" group. Why? Read on... · bd0fbe25
      Mike Hibler authored
      Rob added code to let you put plab nodes in a LAN (with protocol "ipv4")
      so that the LAN could be traced.
      
      Meanwhile, Leigh has been adding completion events for various event types
      and creating various combo event groups for addressing all elements of a LAN,
      all LANs, etc.
      
      For pelab we have a link "reset" event which is sent to the "__all_lans"
      event group to reset the shaping of all links/lans.  Unfortunately, plab
      nodes don't run a delay agent, so link events sent there fall on deaf ears.
      
      Ergo, the event scheduler waits for completions from all agents it sends
      the "reset" to, but the plab nodes will never respond since there are no
      link agents.
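
The effect of the exclusion can be illustrated with a minimal sketch. This is not the assign wrapper or scheduler code; the function names, the LAN names, and the "ethernet" protocol string are assumptions made up for the example. Only the "ipv4" protocol and the "__all_lans" group name come from the message above.

```python
def build_all_lans_group(lans):
    """Return names of LANs that belong in the "__all_lans" event group.

    `lans` maps LAN name -> protocol string. LANs with protocol "ipv4"
    have no delay agent behind them, so they are left out.
    """
    return sorted(name for name, proto in lans.items() if proto != "ipv4")


def pending_completions(group, responded):
    """Agents the scheduler is still waiting on after some have responded."""
    return [name for name in group if name not in responded]


# Hypothetical topology: two shaped LANs/links and one plab "ipv4" LAN.
lans = {"lan0": "ethernet", "link1": "ethernet", "plablan": "ipv4"}
group = build_all_lans_group(lans)

# Only agents that actually exist can respond to the "reset", so once
# both delay agents answer there is nothing left to wait for.
still_waiting = pending_completions(group, responded={"lan0", "link1"})
```

Had "plablan" stayed in the group, `still_waiting` would never become empty, which is the hang described above.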
  4. 16 Sep, 2006 1 commit
  5. 14 Sep, 2006 3 commits
    • Leigh B. Stoller's avatar
      Rework the event handling for the program agent so that both the · aead4580
      Leigh B. Stoller authored
      reload and halt events send proper completion events. This is required
      for stoprun and startrun to work correctly. On stoprun the logs are
      not collected until the programs have stopped, and on startrun we do
      not want to proceed until all the agents have reloaded their
      environments.
    • Mike Hibler's avatar
      fix a minor typo · 2af56958
      Mike Hibler authored
    • Leigh B. Stoller's avatar
      Add completion event to pcapper SNAPSHOT so that the caller (say, loghole) · 53c3944c
      Leigh B. Stoller authored
      knows when the logfiles are actually rolled.
      
      Event groups complicated things a bit. To make this work properly, we
      no longer subscribe to the link-tracemon event, but instead use a
      real event group, created by assign wrapper for all of the linktrace
      agents. So, you can now do things like this:
      
      	tevc -w -e testbed/TT now link0_tracemon snapshot
      or
      	tevc -w -e testbed/TT now __all_tracemon snapshot
      
      where __all_tracemon is a group of all tracemon agents for all links and
      lans. I plan to change loghole to use this.
  6. 13 Sep, 2006 4 commits
  7. 12 Sep, 2006 2 commits
    • Kirk Webb's avatar
      · 52dcfd48
      Kirk Webb authored
      Added secondary logging for node setup/teardown success/failure.  Also log
      node pool membership changes in this log.
    • Leigh B. Stoller's avatar
      This started out as a simple little hack to add a StopRun "ns" event, but · cbdc4178
      Leigh B. Stoller authored
      it got more complicated as it progressed.
      
      The bulk of the change was changing template_exprun so that it can take a
      pid/eid as an alternative to eid/guid. This is a big convenience since it's
      easy to find the template from a running experiment, and it makes it
      possible to invoke from the event scheduler, which has never heard of a
      template before (and it's not something I wanted to teach it about).  It's
      also easier on users.
      
      Anyway, back to the stoprun event. You can now do this:
      
      	$ns at 100 "$ns stoprun"
      or
      	tevc -e pid/eid now ns stoprun
      
      You can add the -w option to wait for the completion event that is sent,
      but this brings me to the glaring problems with this whole thing.
      
      * First, the scheduler has to fire off the stoprun in the background,
        since if it waits, we get deadlock. Why? Cause the implementation of
        stoprun uses the event system (SNAPSHOT event, other things), and if
        the scheduler is sitting and waiting, nothing happens.
      
        Okay, the solution to this was to generate a COMPLETION event from
        template_exprun once the stop operation is complete. This brings me
        to the second problem ...
      
      * Worse, is that the "ns" events that are sent to implement stoprun (like
        snapshot) send their own completion events, and that confuses anyone
        waiting on the original stoprun event (it returns early).
      
        So what to do about this? There is a "token" field in the completion
        event structure, which I presume is to allow you to match things up.  But
        there is no way to set this token using tevc (and then wait for it), and
        besides, the event scheduler makes them up anyway and sticks them into
        the event. So, the seeds of a fix are already germinating in my mind, but
        I wanted to get this commit in so that Mike would have fun reading this
        commit log.
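
To make the second problem concrete, here is a minimal sketch of what matching completions on the "token" field could look like. It is not the fix alluded to above and not the event library's API; `send_event`, `wait_for_completion`, and the dictionary layout are all hypothetical.

```python
import itertools

_tokens = itertools.count(1)  # hypothetical token source


def send_event(obj, event):
    """Pretend to send an event; return a record with the token stamped in."""
    return {"object": obj, "event": event, "token": next(_tokens)}


def wait_for_completion(sent, completions):
    """Return the first completion whose token matches the event we sent.

    Completions from nested events (e.g. the snapshot that stoprun
    triggers) carry other tokens and are skipped instead of ending the
    wait early.
    """
    for comp in completions:
        if comp["token"] == sent["token"]:
            return comp
    return None


stoprun = send_event("ns", "stoprun")
snapshot = send_event("ns", "snapshot")  # sent internally by stoprun

# The nested snapshot completes first, then the stoprun itself.
stream = [
    {"object": "ns", "event": "snapshot", "token": snapshot["token"]},
    {"object": "ns", "event": "stoprun", "token": stoprun["token"]},
]
matched = wait_for_completion(stoprun, stream)
```

A waiter that ignored the token would return on the snapshot completion, which is the early return described in the second bullet.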
  8. 11 Sep, 2006 1 commit
    • Kirk Webb's avatar
      · aa446875
      Kirk Webb authored
      plab logging enhancements.
      
      timing information for various RPCs is now logged to
      /usr/testbed/log/plabtiming.log.  This info will be useful for extracting
      trends for the various plab nodes, and in calculating reliability and
      timing metrics.  These could be used, e.g., to pick nodes that tend to
      come up more quickly.
      
      This update also squelches much of the python backtrace noise when plab nodes
      fail to set up correctly (can be turned on with debug flag).  Instead, failures
      are summarized on a single line.
      
      Oh, and pay no attention to the aspect behind the curtain!  Yes, you may
      groan and moan if you wish - I'm using aspects to help do the logging.  I
      find this to be a really slick way of wrapping several functions!
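
For readers unfamiliar with the idea, the sketch below shows the same wrapping pattern using a plain Python decorator rather than an aspect library. It is an illustration only: the logger name, `timed`, and `create_node` are hypothetical and are not taken from the plab code.

```python
import functools
import logging
import time

log = logging.getLogger("plabtiming")  # logger name is illustrative


def timed(func):
    """Wrap func so each call logs its duration and outcome on one line."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            # One summary line instead of a full backtrace.
            log.error("%s failed after %.3fs: %s",
                      func.__name__, time.time() - start, exc)
            raise
        log.info("%s ok in %.3fs", func.__name__, time.time() - start)
        return result
    return wrapper


@timed
def create_node(name):  # hypothetical stand-in for an RPC
    return "created " + name
```

The wrapped functions stay free of logging code, which is the appeal of the approach: the timing concern is applied from the outside to several functions at once.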
  9. 10 Sep, 2006 1 commit
    • Leigh B. Stoller's avatar
      The bulk of this commit adds the ability to run the program agent on ops · e8bb6bca
      Leigh B. Stoller authored
      so that users can schedule program events to run there. For example:
      
      	set myprog [new Program $ns]
      	$myprog set node "ops"
      	$myprog set command "/usr/bin/env >& /tmp/foo"
      
      	$ns at 10 "$myprog start"
      or
      	tevc -e pid/eid now myprog start
      
      Since the program agent cannot talk to tmcd from ops, there are new
      routines to create the config files that the program agent uses, in
      the experiment tbdata directory.
      
      I also rewrote the eventsys.proxy script that starts the event
      scheduler on ops; I rolled the startup of the program agent into this
      script, via a new -a option which is passed over from boss when an ops
      program agent is detected in the virt topology. This keeps the number
      of new processes on ops to a small number.
      
      Also part of the above rewrite is that we now catch when the event
      scheduler (or the program agent) exits abnormally, sending email to
      tbops and the swapper of the experiment. We have been seeing abnormal
      exits of the scheduler and it would be good to detect them and see if we
      can figure out what is going wrong.
      
      Other small bug fixes in experiment run.
  10. 08 Sep, 2006 2 commits
    • Leigh B. Stoller's avatar
      Two small changes: · 77d2e17c
      Leigh B. Stoller authored
      * Handle cancelation of instantiation.
      
      * Call out to template_exprun instead of inlining most of what it does.
    • Kirk Webb's avatar
      · 3a3c95fb
      Kirk Webb authored
      Parallelize the setup of plab vnodes alongside the loading of local
      physical nodes.  We fork vnode_setup to operate on the plab vnodes just
      before firing off local reload/reboot/reconfig operations.  The status
      of the plab vnode setup is checked just before firing off vnode_setup
      for any local vnodes.  The ISUP wait for plab vnodes continues to fall
      within the same stage as waiting for local vnodes.  New arguments have been
      added to vnode_setup to tell it to only operate on specific vnode types:
      '-j' for local jail nodes, and '-p' for plab nodes.  If neither is
      specified, the default is to operate on all types.
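
The ordering described above can be sketched roughly as follows. The real code forks vnode_setup; this sketch swaps in a thread pool to keep the example short, and every function name here is a hypothetical stand-in.

```python
from concurrent.futures import ThreadPoolExecutor


def setup_plab_vnodes(nodes):      # hypothetical stand-in for "vnode_setup -p"
    return {n: "ok" for n in nodes}


def reload_local_nodes(nodes):     # hypothetical stand-in for local reloads
    return {n: "reloaded" for n in nodes}


def swap_in(plab_nodes, local_nodes):
    """Start plab setup in the background, then do the local work."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        plab_future = pool.submit(setup_plab_vnodes, plab_nodes)
        local_status = reload_local_nodes(local_nodes)
        # Check on the plab setup only once the local operations are done.
        plab_status = plab_future.result()
    return plab_status, local_status
```

The point is only the overlap: the slow plab setup runs while the local reload/reboot/reconfig operations proceed, and its status is collected afterwards.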
  11. 07 Sep, 2006 3 commits
    • Leigh B. Stoller's avatar
      Minor bugfix. · befb3434
      Leigh B. Stoller authored
    • Mike Hibler's avatar
      Another instance of the last typo · 6e421b37
      Mike Hibler authored
    • Leigh B. Stoller's avatar
      Some changes to how log files are handled; this took way too long to · c01f7b3e
      Leigh B. Stoller authored
      do!
      
      The original operation was to save up every log file forever in the
      work directory, and copy that out to both the user directory and the
      info directory (long term archive). When I cleaned /proj on ops
      yesterday of all this old cruft, I recovered 17GB of disk space. Yow!
      
      So, the new operation is:
      
      * Only files that end in .log are copied to the user directory. No
        longer copying out .top, .ptop, and a couple of other logs; 99% of
        users never look at these things. We still have them available to us
        though, on boss.
      
      * At the beginning of each swap operation, clean out the work
        directory of all the old log files. These are named a variety of
        ways, so I use some pattern matches to do this.
      
      * Jigger the names a little so that we do not name things in the form
        "$$.log", to avoid copying out different named files to the user
        directory each time; instead link the .log file to the real output
        file so that it gets overwritten each time, while still getting the
        per-swap files for long term storage.
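
The third point (a stable .log name linked to a per-swap file) can be sketched as below. This is an illustration under assumed names, not the actual swap scripts: `publish_log`, `logs_for_user`, and the `<name>.<pid>` naming of the per-swap file are hypothetical.

```python
import os
import tempfile


def publish_log(workdir, name, pid):
    """Write a per-swap log and point a stable <name>.log link at it."""
    real = os.path.join(workdir, "%s.%d" % (name, pid))  # per-swap file
    with open(real, "w") as fh:
        fh.write("output of swap %d\n" % pid)
    stable = os.path.join(workdir, name + ".log")
    if os.path.lexists(stable):
        os.remove(stable)          # drop the previous swap's link
    os.link(real, stable)          # hard link: same contents, fixed name
    return stable


def logs_for_user(workdir):
    """Only files ending in .log are copied to the user directory."""
    return sorted(f for f in os.listdir(workdir) if f.endswith(".log"))


work = tempfile.mkdtemp()
publish_log(work, "swapin", 1001)
publish_log(work, "swapin", 1002)
user_files = logs_for_user(work)
```

After two swaps the user directory would still receive a single, overwritten `swapin.log`, while both per-swap files remain available for long-term storage.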
  12. 06 Sep, 2006 1 commit
    • Leigh B. Stoller's avatar
      Okay, this is a nasty little hack ... Add support for a global delays · d4881005
      Leigh B. Stoller authored
      reset. I've done this with an event group cause otherwise I was going
      to get sucked into the event system and spit out the other end. You can
      reset the delays in your experiment either from the ns file:
      
      	$ns at 100 "$ns reset-lans"
      
      or from the command line:
      
      	tevc -e foo/bar now all_lans reset
      
      and yes, "all_lans" is a magic token.
      
      It would be nice to support per-link or lan reset, but that is going
      to require reorganizing the delay start up scripts on the delay nodes,
      since right now a single delay agent operates for multiple links and lans.
  13. 05 Sep, 2006 2 commits
    • Leigh B. Stoller's avatar
      A bunch of template changes resulting from meetings last week. · 087dbfff
      Leigh B. Stoller authored
      * Add XMLRPC interface for template swapin,stoprun,startrun,swapout and
        add the appropriate wrappers to the script_wrapper on ops.
      
      * Allow parameter descriptions in NS files. This is probably not in its
        final form since it's a bit confusing as to what has priority; something
        in the NS file or a metadata item. Anyway, you can do this in your NS
        file:
      
      	$ns define-template-parameter GUID "0/0" "The GUID to be analyzed"
      
        The rules are currently that the NS file description has priority and
        is copied to child templates, unless the user has modified a description
        via the web interface, in which case the NS file description is ignored.
        I know, sounds awful, but for the most part people are going to use the
        NS file anyway.
      
      * Add "clear" option when starting a new experiment run; the per
        experiment DB at the logholes are cleared. Note that this is *not* the
        default behaviour; you have to either check the checkbox on the web form
        or use the -c option to the script wrapper, or clear=yes if talking
        directly to the XMLRPC server.
      
      * Fix up how email is generated for template_swapin and template_create,
        so that Kevin can debug tblog/tbreport stuff, but also so that we maintain
        mail logs as before. I have made some improvements to libaudit so as to
        centralize the mail goo, and avoid duplicating all that stuff.
      
      * Minor fixes to the program agent so that the new environment strings are
        sent before the program agent exits and reloads them!
      
      * Other minor little things.
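
The priority rule for parameter descriptions given above (NS file wins unless the user edited the description on the web) amounts to the following. The function and argument names are hypothetical; only the rule itself comes from the message.

```python
def effective_description(ns_description, web_description=None,
                          web_modified=False):
    """Pick the description shown for a template parameter.

    The NS file description has priority unless the user has modified
    the description through the web interface, in which case the NS
    file description is ignored.
    """
    if web_modified:
        return web_description
    return ns_description
```

For example, `effective_description("The GUID to be analyzed")` returns the NS file text, while passing `web_modified=True` returns whatever was entered on the web form.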
    • Leigh B. Stoller's avatar
      Bug fix for daemon mode, which was preventing the child from really · a4d8a2a5
      Leigh B. Stoller authored
      detaching from the parent.
      
      Also improve the logonly mode by adding a nodelete option, to retain
      the logfile after the email is sent.
      
      Minor improvements to the interface.
  14. 31 Aug, 2006 3 commits
    • Mike Hibler's avatar
      Fix a typo spotted by Keith S. · c641b15f
      Mike Hibler authored
    • Kevin Atkinson's avatar
      · 964b8d11
      Kevin Atkinson authored
      Add patch to modify Mysql.pm to allow setting the "InactiveDestroy" in
      the underlying DB handle.  Also avoid disconnecting the file handle
      explicitly on DESTROY as that will be taken care of in the DESTROY
      method for the DB handle.
      
      Override perl version of fork() to set InactiveDestroy in all open
      database handles in the child so that it won't send a disconnect when
      the handle is destroyed as this will also close the database handle
      for the parent.  It will also call tblog_new_child_process in the
      child process to properly inform tblog of the new process. This will
      be a NoOp if the libtblog module is not loaded.
    • Leigh B. Stoller's avatar
      * Finish up the Commit From Template support. · 3327ba01
      Leigh B. Stoller authored
      * Export the above via the XMLRPC interface and add a wrapper function
        to the script_wrapper. This allows you to do this on ops:
      
      	cd /proj/testbed/templates/10023/1
      	Edit some files
      	template_commit
      
        Which creates a new template, using the current directory to infer
        the template. Otherwise, provide the template GUID on the command line.
        Hmm, maybe this should be called template_modify? Either way, the
        name does not quite match.
      
      * Export template_export via the XMLRPC wrapper. This allows you to
        export a template (instance) record from the command line on ops.
      
      
      	cd /proj/testbed/templates/10023/1
      	template_export -i 12
      	Exported to /proj/testbed/export/10000/3/12
      
        Which exports the template record for instance number 12. Again, the
        GUID is inferred, but you can specify one on the command line. The export
        directory is printed so you know where it went. Note that export does
        *not* populate a DB on ops with the old DB data.
  15. 30 Aug, 2006 1 commit
    • Kirk Webb's avatar
      · 210d1a85
      Kirk Webb authored
      A node update bugfix and a change to the way nodes with more than two
      changed attributes are handled.  A single message is now sent detailing
      which nodes need to be looked at, and such nodes do not stop the rest from
      updating normally during that run.  Previously the nodes with multiple changes
      had to be handled first, then the update script had to be run after that to
      catch everything.
  16. 28 Aug, 2006 3 commits
    • Kirk Webb's avatar
      · 3bd367f1
      Kirk Webb authored
      It would help if I actually commit the libplabmon library module...
    • Kirk Webb's avatar
      · 37f4392e
      Kirk Webb authored
      Updates to the plab monitor.  Fixed a couple of bugs and created a
      separate libplabmon library module.
    • Mike Hibler's avatar
      Make ns verify pass work for clouds · 4496419f
      Mike Hibler authored
  17. 25 Aug, 2006 2 commits
    • Leigh B. Stoller's avatar
      Add support for dynamic registration of ports on experimental nodes so · 73102ef8
      Leigh B. Stoller authored
      that clients and servers can avoid using hardwired ports on those
      experimental nodes. I have added the following tmcd operation:
      
      	tmcc portregister <service> [<port>]
      
      where we assume it's the control network IP (from the DB), and the pid/eid
      of the experiment the node belongs to. The given port is entered into
      the port_registration table for the experiment, using the service as the
      tag. Supplying port=0 clears the registration from the table.
      
      When called like:
      
      	tmcc portregister <service>
      
      we return the registered port, or nothing.
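
The register/clear/lookup semantics described above can be modeled with a small in-memory sketch. This is not tmcd or the port_registration table itself; the class, the "syncd" service name, and the port number are made up for illustration.

```python
class PortRegistry:
    """In-memory stand-in for the per-experiment port_registration table."""

    def __init__(self):
        self._ports = {}  # (experiment, service) -> port

    def register(self, experiment, service, port):
        """Record a port for a service; port 0 clears the registration."""
        key = (experiment, service)
        if port == 0:
            self._ports.pop(key, None)
        else:
            self._ports[key] = port

    def lookup(self, experiment, service):
        """Return the registered port, or None when nothing is registered."""
        return self._ports.get((experiment, service))


reg = PortRegistry()
reg.register("testbed/TT", "syncd", 16534)   # hypothetical service and port
found = reg.lookup("testbed/TT", "syncd")
reg.register("testbed/TT", "syncd", 0)       # port=0 clears the entry
cleared = reg.lookup("testbed/TT", "syncd")
```

Keying on the experiment as well as the service mirrors the statement that the registration is entered per experiment, with the service name as the tag.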
      
      I hacked up a little C library module in libtb so that there is something
      that looks like a C interface to this:
      
       	int
       	PortRegister(char *service, int port);
      
       	int
       	PortLookup(char *service, char *hostname, int namelen, int *port);
      
      The above routines call out to tmcc of course.
      
      Lastly, I changed the sync server and client to use the new port
      registration, via the library calls above.
      
      There are other emulab services that need to be changed as well, but
      they can be done on an as needed basis.
    • Kevin Atkinson's avatar
      · 312021d4
      Kevin Atkinson authored
      More tbreport changes from Mike Kasick <mkasick@andrew.cmu.edu>:
      
      - Added tblog support to nscheck.
      
      - Added ns_parse_failed error to nscheck.
      
      - Added invocation column to report_errors to differentiate between assign
        runs in infeasible resource assignments.
  18. 24 Aug, 2006 1 commit
  19. 22 Aug, 2006 1 commit
  20. 21 Aug, 2006 2 commits
    • Kirk Webb's avatar
      · af0d6629
      Kirk Webb authored
      Some bugfixes and updates to the monitor.
      
      * Added load average monitoring and initial test startup randomization
      
      The load the monitor was exerting, especially at startup, was pretty high.
      This change appears to have brought that under control.
      
      * Fixed window size bug(s)
      
      There were a few bugs related to tracking the outstanding child process
      window that are corrected by this checkin.
    • Kevin Atkinson's avatar
      · 9b718661
      Kevin Atkinson authored
      Avoid counting planetlab vnodes twice.