1. 14 Aug, 2006 4 commits
    • commit.log · 09d78ac6
      Kevin Atkinson authored
    • Checkpoint my dynamic event stuff, crude as it is. The idea for this first · 9d021a07
      Leigh B. Stoller authored
      draft is that the user will, at the end of an experiment run, log into one
      of his nodes and perform some analysis which is intended to be repeated at
      the end of the next run, and in future instantiations of the template.
      A new table called experiment_template_events holds the dynamic events for
      the template. Right now I am supporting just program events, but it will be
      easy to support arbitrary events later. As an absurd example:
      	node6> /usr/local/bin/template_analyze ~/data_analyze arg arg ...
      The user is currently responsible for making sure the output goes into a
      file in the archive. I plan to make the template_analyze wrapper handle
      that automatically later, but for now what you really want is to invoke a
      script that encapsulates that, redirecting output to $ARCHIVE (this
      variable is installed in the environment by template_analyze).
      The wrapper script will save the current time, and then run the program.
      If the program terminates with a zero exit status, it will ssh over to ops
      and invoke an xmlrpc routine to tell boss to add a program event to both
      the eventlist for the current instance, and to the template_eventlist for
      future instances. The time of the event is the relative start time that was
      saved above (remember, each experiment run replays the event stream from
      time zero).
      For the future, we want to allow this to be done on ops as well, but
      that will take more infrastructure, to run "program agents" on ops.
      It would be nice to install the ssl xmlrpc client side on our images so
      that we do not have to ssh to ops to invoke the client.
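The wrapper behavior described above (save the relative start time, run the program, report an event only on success) can be sketched as follows; this is an illustration, not the actual wrapper, and the event-recording step is injected as a callable since the real version makes an SSL XML-RPC call by sshing to ops:

```python
import subprocess
import sys
import time

def run_and_record(program_args, run_start, add_event):
    """Run the analysis program; if it exits zero, record a program event
    at the offset (relative to time zero of the run) at which the program
    started, so future runs replay it at the same relative time."""
    event_time = time.time() - run_start   # relative start time, saved first
    status = subprocess.call(program_args)
    if status == 0:
        # The real wrapper would ssh to ops and invoke an XML-RPC routine
        # on boss here; a plain callable keeps the sketch self-contained.
        add_event(event_time, program_args)
    return status
```

Note that the time is captured before the program runs, matching the description: the saved relative start time is what goes into the event lists, regardless of how long the analysis takes.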
    • Mike Hibler authored
    • Change for templates. A new experiment run will cause the program · 0607b3b4
      Leigh B. Stoller authored
      agent to exit. rc.progagent now loops, restarting the program agent,
      but first getting new copies of the agent list and the environment
      from tmcd.
      Note that this conflicts slightly with the pa-wrapper used on plab
      nodes, which also loops. I think we can just get rid of pa-wrapper
      now, along with a slight change to rc.progagent. I'm gonna let Kirk
      comment on this.
      Need new images ...
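The restart loop described above can be sketched like this; `fetch_config` and `run_agent` stand in for the real tmcd fetch and agent invocation (rc.progagent itself is not Python, and the real loop runs forever — `max_runs` exists only to bound the sketch):

```python
def progagent_loop(fetch_config, run_agent, max_runs=None):
    """Restart the program agent each time it exits (a new experiment
    run makes it exit), refetching the agent list and the environment
    from tmcd before every restart."""
    runs = 0
    while max_runs is None or runs < max_runs:
        agents, env = fetch_config()   # fresh copies from tmcd each pass
        run_agent(agents, env)         # blocks until the agent exits
        runs += 1
```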
  2. 11 Aug, 2006 21 commits
  3. 10 Aug, 2006 13 commits
    • Add TBDBDisconnect to go with TBDBConnect. · d6dd8938
      Mike Hibler authored
    • Major re-do of the initial condition gathering. · e0420109
      Dan Gebhardt authored
      Available data elements in initial condition structure:
      - Exponential average for bandwidth and latency,
      - Number of samples used
      - Number of error-val samples
      - Number of sequential error-val from newest measurement, backwards
      - timestamp of most recent measurement
      - source node
      - destination node
      Testing needed.
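The data elements listed above might be captured in a record like the following; the field names, the smoothing factor `alpha`, and the update logic are assumptions for illustration, not the actual gathering code:

```python
from dataclasses import dataclass

@dataclass
class InitialCondition:
    src_node: str                 # source node
    dst_node: str                 # destination node
    bw_ewma: float = 0.0          # exponential average, bandwidth
    lat_ewma: float = 0.0         # exponential average, latency
    n_samples: int = 0            # number of samples used
    n_errors: int = 0             # number of error-val samples
    n_seq_errors: int = 0         # sequential error-vals, newest backwards
    last_ts: float = 0.0          # timestamp of most recent measurement

    def add_sample(self, bw, lat, ts, alpha=0.3, is_error=False):
        if is_error:
            self.n_errors += 1
            self.n_seq_errors += 1       # run of errors grows
        else:
            self.n_seq_errors = 0        # a good sample resets the run
            if self.n_samples == 0:
                self.bw_ewma, self.lat_ewma = bw, lat
            else:
                # standard exponential moving average update
                self.bw_ewma = alpha * bw + (1 - alpha) * self.bw_ewma
                self.lat_ewma = alpha * lat + (1 - alpha) * self.lat_ewma
            self.n_samples += 1
        self.last_ts = ts
```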
    • 7901885c
      Kirk Webb authored
      The other half of the changes that cause the plab event proxy to now try
      to get the routable IP of the node from tmcd rather than relying on
      the success of a hostname lookup.  It will still fall back to trying a
      hostname lookup if it can't get the IP from tmcd.
    • e087f217
      Kirk Webb authored
      Send along the IP address of the plab node in the return string from
      the 'plabconfig' command.  We can't trust that the node will have
      a resolvable hostname (or even working DNS), so slap down the IP
      we have on record in the DB into a file.  This will be used by the
      event proxy, which needs to know the node's routable IP in order to
      subscribe to elvind on ops properly.
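The fallback logic these two commits describe (prefer the DB-recorded IP handed out by tmcd, resolve the hostname only as a last resort) amounts to something like this sketch; the function name and arguments are illustrative:

```python
import socket

def node_routable_ip(tmcd_ip, hostname):
    """Prefer the routable IP recorded in the DB (delivered via tmcd's
    plabconfig response); fall back to a hostname lookup only if tmcd
    gave us nothing, since plab nodes may lack resolvable names."""
    if tmcd_ip:
        return tmcd_ip
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None   # no IP on record and DNS failed too
```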
    • Minor fixes. · 5766e95e
      Leigh B. Stoller authored
    • Robert Ricci authored
    • Jonathon Duerig authored
    • Add the timestamp at which the connect() occurs for Jon. Uses a new · 10fa9140
      Robert Ricci authored
      function, fprintTime(), which will be used to standardize the time.
      Also added const to some declarations to keep the compiler happy.
    • Use the C99 standard · c4e21e5a
      Robert Ricci authored
    • Okay, now we can view graphs from the historical data (template record). · 0c1b1a23
      Leigh B. Stoller authored
      A couple of things to note:
      * When requesting a graph, we have to have a checkout of the archive
        (the DB dump file) so that we can create a temporary DB with the data.
        This is done on demand, and the DB is left in place since it's a
        fairly time-consuming operation to do the checkout and the dbload.
        I do not delete the DBs though; we will need to age them out as needed.
      * Even so, when returning to a page we end up getting the graphs
        again, and that still takes more time than I like to wait. Perhaps
        add a refresh button so that the user has to force a redraw. Might
        need to add a time/date stamp to the graph.
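The on-demand, leave-in-place behavior of the first bullet can be sketched like this; the expensive checkout-and-dbload step is injected as `load_dump`, and the marker file is a stand-in for however the real code detects an already-created DB:

```python
import os

def graph_db(checkout_dir, dbname, load_dump, marker=".loaded"):
    """Create the per-template temporary DB on first request and leave
    it in place afterwards, since checkout + dbload is expensive; later
    requests reuse the cached DB.  (Aging out old DBs is not handled,
    matching the commit message's open TODO.)"""
    stamp = os.path.join(checkout_dir, marker)
    if not os.path.exists(stamp):
        load_dump(dbname)            # expensive: archive checkout + dbload
        open(stamp, "w").close()     # remember the DB is now in place
    return dbname
```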
    • First crack at surviving down planetlab nodes. If the master barrier sync · 5f413b47
      Mike Hibler authored
      node sits in the stub or monitor barrier sync for more than the SYNCTIMO
      timeout value in common-env.sh, it will send a HUP to syncd which will
      knock all the other nodes out of their barrier sync.  If that happens,
      all nodes will print a warning message and continue.
      All nodes wait for both a stub sync and a monitor sync, so if one plab node
      is down, they will timeout on both barrier syncs.  Race conditions?  Sure.
      If for example everyone times out on the stub barrier due to a slow node,
      and then that node reaches the barrier, it will hang there while everyone
      else waits on the monitor barrier.  When the latter times out, it will
      kick the slow node out of the stub sync and it will then proceed to hang
      in the monitor sync until the experiment is stopped.  Got that?
      As an aside, it would be nice if the initializer of a barrier could specify
      a timeout value, and return a special error code to everyone if it timed out,
      but that would require an incompatible change to the sync protocol.
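As it happens, Python's `threading.Barrier` implements exactly what the aside wishes for: a barrier with a per-wait timeout that, on expiry, breaks the barrier and hands every waiter an error instead of leaving stragglers hung. A small illustration (this is not the Emulab syncd protocol, just the desired semantics):

```python
import threading

def sync_or_warn(barrier, name, timeout):
    """Wait at the barrier; if anyone times out, the barrier breaks and
    every waiter gets BrokenBarrierError, so all nodes can print a
    warning and continue rather than hang forever."""
    try:
        barrier.wait(timeout=timeout)
        return "%s: synced" % name
    except threading.BrokenBarrierError:
        return "%s: warning, barrier timed out, continuing" % name
```

With these semantics the race described above disappears: a slow node arriving at an already-broken barrier errors out immediately instead of hanging until the experiment is stopped.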
    • Minor tweaks: · 4c5005da
      Mike Hibler authored
      * add getopt processing
      * adjust delay to be one way before calling tevc
    • Next checkpoint of graphing code. On a currently active template · 79ae0bfe
      Leigh B. Stoller authored
      instance there are graphs on the instance show page and on the
      individual run show pages. On the run pages, the graphs select just
      the packets between start and stop of the run. I also added drop down
      menus to select particular source and destination vnodes.
  4. 09 Aug, 2006 2 commits