1. 08 Dec, 2006 1 commit
    • Leigh B. Stoller authored · b898a8cc
      As discussed in meetings and email ... this commit changes what is
      archived.  Rather than a special "archive" directory in the experiment
      directory, we now archive the entire experiment directory.
      
      This change should be backward compatible, but let me know if not.
      
      Note that the nsdata directory is gone; the nsfile comes from the
      tbdata directory, but I now place a copy in nsfile.ns so that the name
      is well known.
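      A minimal Python sketch of the new layout: copy the NS file out of
      tbdata to the well-known name nsfile.ns, then archive the whole
      experiment directory. The function name, its parameters and the use of
      a gzipped tarball are assumptions for illustration; the commit does not
      say which archiving mechanism is used.

```python
import os
import shutil
import tarfile


def archive_experiment(expdir: str, nsfile_src: str, archive_path: str) -> list[str]:
    """Hypothetical helper: archive the whole experiment directory.

    archive_path must lie outside expdir, or the tarball would try to
    include itself.
    """
    # Put a copy of the NS file under a fixed, well-known name.
    shutil.copyfile(nsfile_src, os.path.join(expdir, "nsfile.ns"))
    # Archive everything under the experiment directory, not a special
    # "archive" subdirectory.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(expdir, arcname=os.path.basename(expdir))
    # Return the member names so a caller can check what was captured.
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())
```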
  2. 14 Sep, 2006 1 commit
  3. 10 Sep, 2006 1 commit
    • Leigh B. Stoller authored · e8bb6bca
      The bulk of this commit adds the ability to run the program agent on ops
      so that users can schedule program events to run there. For example:
      
      	set myprog [new Program $ns]
      	$myprog set node "ops"
      	$myprog set command "/usr/bin/env >& /tmp/foo"
      
      	$ns at 10 "$myprog start"
      or
      	tevc -e pid/eid now myprog start
      
      Since the program agent cannot talk to tmcd from ops, there are new
      routines to create the config files that the program agent uses, in
      the experiment tbdata directory.
      
      I also rewrote the eventsys.proxy script that starts the event
      scheduler on ops; I rolled the startup of the program agent into this
      script, via a new -a option which is passed over from boss when an ops
      program agent is detected in the virt topology. This keeps the number
      of new processes on ops small.
      
      Also part of the above rewrite is that we now catch when the event
      scheduler (or the program agent) exits abnormally, sending email to
      tbops and the swapper of the experiment. We have been seeing abnormal
      exits of the scheduler and it would be good to detect them and see if
      we can figure out what is going wrong.
      
      Other small bug fixes in experiment run.
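      The abnormal-exit detection described above can be sketched in Python
      as follows. The real eventsys.proxy is a Perl script; the function name
      supervise, its parameters and the example addresses are hypothetical,
      and actually sending the mail is left out.

```python
import os
import subprocess
from email.message import EmailMessage
from typing import Optional


def supervise(argv: list[str], tbops: str, swapper: str) -> Optional[EmailMessage]:
    """Run a daemon in the foreground and build a report if it dies abnormally.

    Sending the message (for example with smtplib) is left to the caller.
    """
    rc = subprocess.run(argv).returncode
    if rc == 0:
        return None  # clean exit: nothing to report
    # subprocess reports death by signal N as a return code of -N on POSIX.
    how = f"killed by signal {-rc}" if rc < 0 else f"exited with status {rc}"
    msg = EmailMessage()
    msg["To"] = ", ".join([tbops, swapper])
    msg["Subject"] = f"{os.path.basename(argv[0])} {how}"
    msg.set_content(f"Command {' '.join(argv)} {how}.\n")
    return msg
```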
  4. 08 Jun, 2006 1 commit
  5. 16 Feb, 2006 1 commit
  6. 19 Dec, 2005 1 commit
    • Kevin Atkinson authored · 45f997fd
      Updates to Error Logging API Code.
      
      You should start seeing much better error messages coming from my
      system.  Errors coming from parse.proxy and assign (the two most
      frequent sources of errors) should now be concise and to the point.
      Errors coming from libosload/libreboot (the next most frequent source
      of errors) should now also be much better, but not perfect.  Getting
      perfect errors will likely require a rework of how errors are handled in
      libosload/libreboot; just adding tberror/tbwarn/tbnotice calls is not
      enough.  I can do this at a later date if necessary.
      
      A few minor database changes.
      
      Some changes to the API.  A few bug fixes. Lots of tberror/tbwarn/tbnotice
      added to scripts.
      
      Since assign is a C program, and at this time my API is Perl only, I wrote a
      second wrapper around assign, assign_wrapper2.  When assign fails, errors are
      now parsed in assign_wrapper2, sent to stderr and logged.  This means that
      RunAssign() just returns when assign fails rather than echoing some of the
      assign.log output and then quitting.  The output to the activity log remains
      unchanged.
      
      Since "parse.proxy" is run from ops, I couldn't use my API in it, even though
      it is a Perl program.  Instead I parse the errors coming from it in
      parse-ns.
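      A rough Python sketch of what a wrapper like assign_wrapper2 does with
      a failed run: keep only the concise error lines from assign's verbose
      output, echo them to stderr and record them. The "***" error marker,
      the function names and the list standing in for the log are all
      assumptions; the commit does not show assign's output format.

```python
import re
import sys

# Hypothetical error marker; the commit does not show assign's log format.
ERROR_RE = re.compile(r"^\*\*\*\s*(.+)$")


def extract_errors(assign_output: str) -> list[str]:
    """Keep only the concise error lines from assign's verbose output."""
    errors = []
    for line in assign_output.splitlines():
        m = ERROR_RE.match(line.strip())
        if m:
            errors.append(m.group(1).strip())
    return errors


def report_errors(errors: list[str], log: list[str]) -> None:
    """Echo each error to stderr and record it; `log` stands in for the DB."""
    for err in errors:
        print(f"*** {err}", file=sys.stderr)
        log.append(err)
```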
  7. 04 Nov, 2005 1 commit
  8. 09 Feb, 2005 1 commit
    • Timothy Stack authored · ea206d40
      Robot-related improvements based on feedback from Sid:
      
      	* event/sched/event-sched.h, event/sched/event-sched.c,
      	event/sched/rpc.h, event/sched/rpc.cc: Send a notification of the
      	start of event time and timelines.  Add a '-r' option that forces
      	it to use the default rpc path (so Dan can use a custom
      	scheduler/rmcd/etc on ops).
      
      	* robots/GNUmakefile.in, robots/emc/GNUmakefile.in,
      	robots/mtp/GNUmakefile.in, robots/primotion/GNUmakefile.in,
      	robots/rmcd/GNUmakefile.in, robots/vmcd/GNUmakefile.in,
      	robots/tbsetdest/tbsetdest.cc: Add a control-install target.
      
      	* robots/primotion/garcia-pilot.cc: Take out some experimental
      	code.
      
      	* robots/primotion/dashboard.hh, robots/primotion/dashboard.cc,
      	robots/primotion/pilotClient.cc, robots/primotion/wheelManager.hh,
      	robots/primotion/wheelManager.cc: When doing a COMMAND_STOP, make
      	sure we only send one update packet, an idle or an abort.  Also
      	send back the actual distance travelled/pivoted.
      
      	* robots/tbsetdest/tbsetdest.cc: Don't output setdest events with
      	a speed of zero.
      
      	* tbsetup/eventsys_control.in: When replaying/stopping, clear the
      	start of event time for the experiment.
      
      	* tbsetup/power_mail.pm.in: Decrease the threshold used to tell if
      	a node has been power cycled "recently".
      
      	* tbsetup/ns2ir/node.tcl, tbsetup/ns2ir/sim.tcl.in: Bah, back out
      	the last change; orientation is supposed to be in degrees at these
      	points.  Also, when automatically picking the sync server, don't
      	use nodes that are subnodes.
      
      	* tmcd/common/bootsubnodes: Send an ISUP for motes.
      
      	* www/moteleds.php3: Show the mote node names instead of the
      	robots.
      
      	* www/powertime.php3: Add option to mark nodes as powered off.
      
      	* www/robotmap.php3: Display the elapsed time for the overall
      	event time and for each active timeline.
      
      	* www/showexp.php3: Only display the blinky lights menu item when
      	there are motes.
      
      	* www/shownode.php3: Change "Update Power Time" -> "Update Power
      	State".
      
      	* www/tutorial/mobilewireless.php3: Fix some nits.
      
      	* xmlrpc/emulabserver.py.in: Add event_time_start method for
      	storing/getting/clearing the start of event time for an
      	experiment/timeline.
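      Several entries above revolve around a per-experiment, per-timeline
      "start of event time": the scheduler announces it, eventsys_control
      clears it on replay/stop, robotmap shows the elapsed time, and
      emulabserver.py gains event_time_start. The in-memory class below is a
      hypothetical Python stand-in for that store; its name, methods and
      keying by (pid, eid, timeline) are assumptions, not the XML-RPC API.

```python
import time
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (pid, eid, timeline); "" means the whole experiment


class EventTimeStart:
    """In-memory stand-in for a per-experiment/timeline start-time store."""

    def __init__(self) -> None:
        self._starts: Dict[Key, float] = {}

    def set(self, pid: str, eid: str, timeline: str = "",
            when: Optional[float] = None) -> float:
        """Record when event time started (defaults to now)."""
        start = time.time() if when is None else when
        self._starts[(pid, eid, timeline)] = start
        return start

    def get(self, pid: str, eid: str, timeline: str = "") -> Optional[float]:
        return self._starts.get((pid, eid, timeline))

    def elapsed(self, pid: str, eid: str, timeline: str = "",
                now: Optional[float] = None) -> Optional[float]:
        """Seconds since the start, as a map page might display it."""
        start = self.get(pid, eid, timeline)
        if start is None:
            return None
        return (time.time() if now is None else now) - start

    def clear(self, pid: str, eid: str) -> None:
        """Forget the experiment's start time and every timeline's, as on replay."""
        for key in [k for k in self._starts if k[:2] == (pid, eid)]:
            del self._starts[key]
```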
  9. 08 Sep, 2004 1 commit
  10. 30 Aug, 2004 1 commit
    • Leigh B. Stoller authored · 9aa6b5ca
      The bulk of the event system changes.
      * The per-experiment event scheduler now runs on ops instead of boss.
        Boss still runs elvind and uses events internally, but the user part
        of the event system has moved.
      
      * Part of the guts of eventsys_control moved to a new script, eventsys.proxy,
        which runs on ops and fires off the event scheduler. The only tricky part
        of this is that the scheduler runs as the user, but killing it has to be
        done as root since a different person might swap out the experiment. So,
        the proxy is a perl wrapper invoked from a root ssh from boss, which
        forks, writes the pid file into /var/run/emulab/evsched/$pid_$eid.pid,
        then flips to the user and execs the event scheduler (which is careful
        not to fork). Obviously, if the kill is done as root, the pid file has to
        be stored someplace the user is not allowed to write.
      
      * The event scheduler has been rewritten to use Tim's C++ interface to the
        sshxmlrpc server on boss. Actually, I reorg'ed the scheduler so that it
        can be built either as a mysql client, or as RPC client. Note that it can
        also be built to use the SSL version of the XMLRPC server, but that will
        not go live until I finish the server stuff up. Also some goo for dealing
        with building the scheduler with C++.
      
      * Changes to several makefiles to install the ops binaries over NFS to
        /usr/testbed/opsdir. Makes life easier, but only if boss and ops are
        running the same OS. For now, we use static linking on the event scheduler
        until ops is upgraded to the same rev as boss.
      
      * All of the event clients got little tweaks for dealing with the new CNAME
        for the event system server (event-server). Will need to build new images
        at some point. Old images and clients will continue to work because of an
        inetd hack on boss that uses netcat to transparently redirect elvind
        connections to ops.
      
      * Note that eventdebug needs some explaining. In order to make the inetd
        redirect work, elvind cannot be listening on the standard port. So, the
        boss event system uses an alternate port since there are just a few
        subsystems on boss that use the server, and it's easy to propagate changes
        on boss. Anyway, the default for eventdebug is to connect to the standard
        port on localhost, which means it will work as expected on ops, but will
        require the -b argument on boss.
      
      * Linktest changes were slightly more involved. We no longer run linktest on
        boss when called from the experiment swapin path, but ssh over to ops to
        fire it off. This is done as the user of course, and there are some
        tricks to make it possible to kill a running linktest and its ssh when
        experiment swapin is canceled (or from the command line) by forcing
        allocation of a tty. I will probably revisit this at some point, but I
        did not want to spend a bunch of time on linktest.
      
      * The upgrade path detailed in doc/UPDATING is necessarily complicated and
        bound to cause consternation at remote sites doing an upgrade.
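      The start/kill protocol for the ops-side scheduler can be sketched in
      Python as below (the real proxy is Perl). In one process: fork, record
      the child's own pid as root under /var/run/emulab/evsched, drop to the
      user, then exec; exec keeps the pid and the scheduler never forks, so a
      later root kill hits the right process. Function names, parameters and
      the uid/gid handling are assumptions for illustration.

```python
import os
import signal

PIDDIR = "/var/run/emulab/evsched"  # directory named in the commit message


def pidfile_path(pid: str, eid: str, piddir: str = PIDDIR) -> str:
    """Pid file for one experiment, in a directory only root can write."""
    return os.path.join(piddir, f"{pid}_{eid}.pid")


def start_scheduler(pid: str, eid: str, uid: int, gid: int,
                    argv: list[str], piddir: str = PIDDIR) -> int:
    """Hypothetical; must be called as root. Returns the scheduler's pid."""
    child = os.fork()
    if child != 0:
        return child  # parent: report the pid back to the ssh caller
    try:
        # Still root: record our own pid. exec keeps the pid, and the
        # scheduler never forks, so this stays the pid to kill later.
        with open(pidfile_path(pid, eid, piddir), "w") as f:
            f.write(f"{os.getpid()}\n")
        os.setgroups([gid])
        os.setgid(gid)
        os.setuid(uid)  # flip to the user; this cannot be undone
        os.execvp(argv[0], argv)
    finally:
        os._exit(127)  # only reached if something above failed


def stop_scheduler(pid: str, eid: str, piddir: str = PIDDIR) -> int:
    """Run as root, so it works no matter who swaps the experiment out."""
    path = pidfile_path(pid, eid, piddir)
    with open(path) as f:
        sched = int(f.read().strip())
    os.kill(sched, signal.SIGTERM)
    os.unlink(path)
    return sched
```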
  11. 17 Nov, 2003 1 commit
    • Leigh B. Stoller authored · 2025e0bd
      Merge the two state machines (batchstate and state) into a single
      state machine (state). All of the stuff that was previously handled by
      using batchstate is now embedded into the one state machine. Of
      course, these mostly overlapped, so it's not that much of a change,
      except that we also redid the machine, adding more states (for
      example, modify phases are now explicit). To get a picture of the
      actual state machine, on boss:
      
      		stategraph -o newstates EXPTSTATE
      		gv newstates.ps
      
      Things to note:
      
      * The "batchstate" slot of the experiments table is now used solely to
        provide a lock for the batch daemon. A secondary change will be to
        rename the slot to something more appropriate, but that can
        happen anytime after this new stuff is installed.
      
      * I have left expt_locked for now, but another later change will be to remove
        expt_locked, and change it to active_busy or some such new state name in
        the state machine. I have removed most uses of expt_locked, except those
        that were necessary until there is a new state to replace it.
      
      * These new changes are an implementation of the new state machine,
        but I have not done anything fancy. Most of the code is the same as
        it was before.
      
      * I suspect that there are races with the batch daemon now, but they
        are going to be rare, and the end result is probably that a
        cancelation is delayed a little bit.
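      A single state machine with explicit modify phases can be expressed as
      a transition table plus a checked transition. The state names below
      are an illustrative subset, not the real EXPTSTATE list, and rendering
      Graphviz DOT text is only one way to get a picture like the one
      stategraph draws; the commit does not say how stategraph works.

```python
# Hypothetical subset of states; names are illustrative, not Emulab's list.
TRANSITIONS: dict[str, set[str]] = {
    "swapped": {"activating", "terminating"},
    "activating": {"active", "swapped"},
    "active": {"modify_parse", "swapping"},
    "modify_parse": {"modify_reswap", "active"},
    "modify_reswap": {"active", "swapped"},
    "swapping": {"swapped"},
    "terminating": {"ended"},
    "ended": set(),
}


class Experiment:
    """One state slot instead of separate state and batchstate slots."""

    def __init__(self, state: str = "swapped") -> None:
        self.state = state

    def transition(self, new: str) -> None:
        if new not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new}")
        self.state = new


def to_dot(transitions: dict[str, set[str]], name: str = "EXPTSTATE") -> str:
    """Emit Graphviz DOT text describing every allowed transition."""
    lines = [f"digraph {name} {{"]
    for src in sorted(transitions):
        for dst in sorted(transitions[src]):
            lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)
```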
  12. 05 Nov, 2003 1 commit
    • Leigh B. Stoller authored · 091a0b62
      Frontend and parser portion of two event system changes:
      * Generate a shared secret key for the event system. This key is
        stored into the DB, and passed to the node via tmcd. It is also
        stashed into a file in the experiment directory (can be accessed
        only by the project/group members). The key is used to attach a
        HMAC (hashed message authentication) to each event, which is checked
        by the receivers to ensure that the event is not bogus. More details
        on this later when I commit the event library/client changes.
      
      * Added "virt_programs" table to store info about each program object
        defined by the user. The intent is to no longer send the command
        string in the event, but to fix it in the DB, and transfer it via
        tmcd. This removes our "remote execution facility" which was always
        a bad idea (we have ssh for that, and that is a lot more secure than
        the event system!).
      
        Note that for the time being we need to continue to send the command in
        the event because of old images, but the new images will now ignore
        that part of the event.
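      The per-event HMAC check can be sketched with Python's standard hmac
      module: the sender attaches a tag computed with the shared experiment
      key, and a receiver recomputes it and compares in constant time. The
      choice of SHA-256, the 32-byte key and the byte encoding of an event
      are assumptions; the commit only says an HMAC is attached and checked.

```python
import hashlib
import hmac
import secrets


def make_key() -> bytes:
    """Shared secret for one experiment (length is an assumption)."""
    return secrets.token_bytes(32)


def sign_event(key: bytes, event: bytes) -> bytes:
    """Tag the sender attaches to each event."""
    return hmac.new(key, event, hashlib.sha256).digest()


def verify_event(key: bytes, event: bytes, tag: bytes) -> bool:
    """Receiver side: reject events whose tag does not match."""
    return hmac.compare_digest(sign_event(key, event), tag)
```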
  13. 01 May, 2003 1 commit
    • Leigh B. Stoller authored · 5116cd33
      Add the long-desired halt/swap event directives. You can now put this
      in your NS file:
      
      	$ns at 2000.0 "$ns halt"
      or
      	$ns at 2000.0 "$ns swapout"
      
      The first causes the experiment to terminate, the latter causes it to
      swap out. I know some wiseass is going to ask for a swapin event!
      You can also send these events from tevc:
      
      	tevc -e testbed/stopme now ns halt
      or
      	tevc -e testbed/stopme now ns swapout
      
      Does it need to be said that this is insecure? That we could get swap
      wars going on as people try to get nodes for their experiments by
      swapping out someone else? Well, if that happens we will apply the big
      hammer and squash their nuts.
      
      Details: I added a SIMULATOR "agent", and HALT/SWAPOUT event types in
      the usual places. In the event scheduler, SIMULATOR events are treated
      specially (not actually sent anywhere), but handled internally. Very
      convenient, because the scheduler runs as the person who swapped the
      experiment in, and so I just run either swapexp or endexp, right from
      the scheduler. At some point we need to give the permission issue some
      thought.
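      The scheduler-side special case can be sketched as a small dispatch
      function: SIMULATOR events are handled in-process by running endexp or
      swapexp, everything else is sent on. The Event fields, the run/send
      callbacks and the argument list passed to the scripts are hypothetical;
      the commit names only the two scripts.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    objname: str    # e.g. "ns" or "myprog"
    objtype: str    # e.g. "SIMULATOR" or "PROGRAM"
    eventtype: str  # e.g. "HALT", "SWAPOUT", "START"


# Script per SIMULATOR event type; the argument layout used below is a guess.
SIMULATOR_ACTIONS = {"HALT": "endexp", "SWAPOUT": "swapexp"}


def dispatch(ev: Event, pid: str, eid: str,
             run: Callable[[list[str]], None],
             send: Callable[[Event], None]) -> None:
    """Handle SIMULATOR events in-process; publish everything else."""
    if ev.objtype == "SIMULATOR":
        # Never sent to the event server. The scheduler runs as the user who
        # swapped the experiment in, so the script runs with that user's rights.
        run([SIMULATOR_ACTIONS[ev.eventtype], pid, eid])
    else:
        send(ev)
```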
  14. 03 Mar, 2003 1 commit
  15. 13 Sep, 2002 1 commit
  16. 07 Jul, 2002 1 commit
  17. 12 May, 2002 1 commit
  18. 09 May, 2002 1 commit
  19. 26 Apr, 2002 1 commit
  20. 22 Mar, 2002 1 commit
  21. 05 Mar, 2002 1 commit
    • Leigh B. Stoller authored · 0318cc22
      A wide-ranging set of event system changes:
      assign_wrapper.in: Hack in a change that ensures a delay node is
      created for any link on which an event is posted (up,down,modify),
      no matter what its initial parameters are. That is, if a link is created
      with no delay, but there is an event that adds a delay later, then we
      must drop in a delay node. Same for up/down on a link. We do this in
      the delay node. I am reasonably confident that this change is fine for
      duplex links, but I am less sure of the effect on lans!
      
      eventsys_control.in: Checkpoint latest changes. Add "replay" option,
      which right now just stops and starts the event scheduler so that it
      reloads the entire event list. Add check for existing experiment, and
      that the experiment is either active or swapping (do not want to start
      a scheduler for a swapped out experiment!). Add check to see if there
      are any events, and skip startup if there are no events in the DB.
      Lastly, get very serious about preventing more than one scheduler from
      being started, either by accident or intentionally. My protocol is to
      lock the table, grab and set the pid to -pid, test the pid for a
      positive value, and if positive, send the scheduler a kill(TERM) so
      that it can clean up, clear the pid to zero in the DB, and exit. This
      approach ensures that we do not try to send a kill to a pid that is no
      longer active or owned by the user (this last part is not really
      necessary because of how pids are reused, but it was easy to add so why
      not).
      
      exports_setup.in: Trivial change to make it easier to turn this on
      temporarily in devel trees.
      named_setup.in: Ditto.
      
      node_reboot.in: Add call to TBdbfork() in child because of apparent DB
      connection problems across forks. In the child, set the eventstatus
      for the node to REBOOT if successful (note: this event status stuff is
      temporary, and will be recast in the next set of revisions).
      
      GNUmakefile:  Add new controlling program, eventsys_control.
      power.in:     Ditto previous comment about REBOOT.
      os_setup.in:  Non event system cleanups.
      tbend.in:     Add DB cleanup of the new virt_trafgens and eventlist tables.
      tbprerun.in:  Ditto.
      tbreport.in:  Print out the event list in a pretty print format.
      tbswapin.in:  Add call to start the event system. Also a bug fix; move
                    the named script up above the os_setup so that the named
                    tables have been updated by the time the first node
                    reboots. I noticed that nodes were failing on gethostbyname().
      tbswapout.in: Add call to stop the event system.
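      The eventsys_control locking protocol can be sketched in Python with a
      threading.Lock standing in for the DB table lock and a plain object
      standing in for the row that holds the scheduler pid. All names here
      are hypothetical. One deliberate tweak: the sketch stores -abs(pid),
      so a second concurrent stop cannot flip a negative pid back to
      positive.

```python
import os
import signal
import threading
from typing import Callable


class SchedulerRow:
    """Stand-in for the DB row holding the scheduler pid; lock ~ LOCK TABLES."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.pid = 0  # 0: none, >0: running, <0: stop in progress


def start_scheduler(row: SchedulerRow, spawn: Callable[[], int]) -> bool:
    """Start only if no scheduler is running or being stopped."""
    with row.lock:
        if row.pid != 0:
            return False
        row.pid = spawn()
        return True


def stop_scheduler(row: SchedulerRow,
                   kill: Callable[[int, int], None] = os.kill) -> bool:
    """Negate the pid under the lock; only the caller who saw it positive kills."""
    with row.lock:
        pid = row.pid
        row.pid = -abs(pid)  # stays negative even if two stops race
    if pid > 0:
        kill(pid, signal.SIGTERM)  # the scheduler cleans up on TERM
        return True
    return False


def scheduler_exited(row: SchedulerRow) -> None:
    """What the scheduler does as it exits: clear the pid to zero."""
    with row.lock:
        row.pid = 0
```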
  22. 27 Feb, 2002 1 commit