  1. Sep 13, 2006
  2. Sep 12, 2006
    • Robert Ricci's avatar
      Fix a bug I introduced with a careless copy and paste - I was copying · a9e89bd0
      Robert Ricci authored
      an int in twice.
      
      Also fix another bug (masked by the previous) I introduced into
      census()
      a9e89bd0
    • Kirk Webb's avatar
      · 52dcfd48
      Kirk Webb authored
      Added secondary logging for node setup/teardown success/failure.  Also log
      node pool membership changes in this log.
      52dcfd48
    • Leigh B. Stoller's avatar
      This started out as a simple little hack to add a StopRun "ns" event, but · cbdc4178
      Leigh B. Stoller authored
      it got more complicated as it progressed.
      
      The bulk of the change was changing template_exprun so that it can take a
      pid/eid as an alternative to eid/guid. This is a big convenience since it's
      easy to find the template from a running experiment, and it makes it
      possible to invoke from the event scheduler, which has never heard of a
      template before (and it's not something I wanted to teach it about).  It's
      also easier on users.
      
      Anyway, back to the stoprun event. You can now do this:
      
      	$ns at 100 "$ns stoprun"
      or
      	tevc -e pid/eid now ns stoprun
      
      You can add the -w option to wait for the completion event that is sent,
      but this brings me to the glaring problems with this whole thing.
      
      * First, the scheduler has to fire off the stoprun in the background,
        since if it waits, we get deadlock. Why? Cause the implementation of
        stoprun uses the event system (SNAPSHOT event, other things), and if
        the scheduler is sitting and waiting, nothing happens.
      
        Okay, the solution to this was to generate a COMPLETION event from
        template_exprun once the stop operation is complete. This brings me
        to the second problem ...
      
      * Worse, is that the "ns" events that are sent to implement stoprun (like
        snapshot) send their own completion events, and that confuses anyone
        waiting on the original stoprun event (it returns early).
      
        So what to do about this? There is a "token" field in the completion
        event structure, which I presume is to allow you to match things up.  But
        there is no way to set this token using tevc (and then wait for it), and
        besides, the event scheduler makes them up anyway and sticks them into
        the event. So, the seeds of a fix are already germinating in my mind, but
        I wanted to get this commit in so that Mike would have fun reading this
        commit log.
      cbdc4178
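The token idea raised in the stoprun commit can be sketched roughly as follows. This is a minimal illustration in Python, not the Emulab event library: the `Completion` record, the `wait_for_completion` helper, and the token values are all hypothetical, and the sketch assumes only that every completion event carries a token the waiter knows in advance.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Completion:
    """Hypothetical completion event: the agent that finished, plus a token."""
    agent: str
    token: int


def wait_for_completion(events: Iterable[Completion], token: int) -> Optional[Completion]:
    """Return the first completion whose token matches, ignoring all others."""
    for ev in events:
        if ev.token == token:
            return ev
    return None


# Completions from nested events (e.g. snapshot) arrive before the stoprun one.
stream = [Completion("ns", 7), Completion("ns", 8), Completion("ns", 42)]
matched = wait_for_completion(stream, token=42)
```

A waiter that filters on the token is no longer woken early by the nested completions, which is the failure described in the second bullet of that commit.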
    • Robert Ricci's avatar
      Add a ton of debugging output, showing the byte locations that each · 84cf1d12
      Robert Ricci authored
      'field' is written to and read from. This was done to aid the
      debugging of reading and writing replay files.
      
      However, this output is ridiculously verbose, so it's commented out.
      84cf1d12
    • Robert Ricci's avatar
      Try to make sure we get core dumps, by upping the coredump size · 4d11d3ec
      Robert Ricci authored
      rlimit.
      
      Also, check for error in packet size calculation vs. how much data
      is actually saved.
      4d11d3ec
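For readers unfamiliar with the coredump size rlimit, here is a small sketch of raising it. The commit presumably does this from C or C++ with `setrlimit(2)`; the Python below is only an illustration using the standard `resource` module, and the `raise_core_limit` name is hypothetical.

```python
import resource


def raise_core_limit():
    """Raise the soft core-dump size limit to the hard limit; return the new pair."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit as far as the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


soft, hard = raise_core_limit()
```

If the hard limit itself is zero, the soft limit cannot be raised this way and no core file will be written.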
    • Robert Ricci's avatar
      Serious bugfix - PacketInfo::census() was undercounting the number · 5b3b2838
      Robert Ricci authored
      of bytes required to save the packet. This was causing us to create
      a buffer too small to hold the packet, causing memory corruption bugs
      and causing us to write invalid replay files.
      
      The way that the packet size calculation is separated from the saving
      of the packet is a serious problem, and needs to be re-designed!
      5b3b2838
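One way to keep a size calculation from drifting away from the save routine is to drive both from a single field table. The sketch below is an assumption about how such a redesign could look, not the project's code: the `FIELDS` layout, the `Packet` class, and the Python `census`/`save` functions are hypothetical stand-ins.

```python
import struct

# Hypothetical fixed-size header layout: (struct format, attribute name).
FIELDS = [("<I", "seq"), ("<I", "ack"), ("<H", "length")]
PAYLOAD_LEN_FMT = "<I"  # length prefix written before the payload


class Packet:
    def __init__(self, seq, ack, length, payload):
        self.seq, self.ack, self.length, self.payload = seq, ack, length, payload


def save(packet) -> bytes:
    """Serialize the packet by walking FIELDS, then the length-prefixed payload."""
    out = bytearray()
    for fmt, name in FIELDS:
        out += struct.pack(fmt, getattr(packet, name))
    out += struct.pack(PAYLOAD_LEN_FMT, len(packet.payload))
    out += packet.payload
    return bytes(out)


def census(packet) -> int:
    """Count the bytes save() will need, using the same FIELDS table."""
    fixed = sum(struct.calcsize(fmt) for fmt, _ in FIELDS)
    return fixed + struct.calcsize(PAYLOAD_LEN_FMT) + len(packet.payload)


pkt = Packet(seq=1, ack=2, length=5, payload=b"hello")
```

Adding a field to `FIELDS` changes both functions at once, and comparing `census(pkt)` with `len(save(pkt))` gives a cheap consistency check.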
    • Leigh B. Stoller's avatar
      Checkpoint little web page to spew the event stream out. The bulk of · 4820df1b
      Leigh B. Stoller authored
      this change was actually refactoring Tim's spewlog code to be more
      general so that it can be used elsewhere. I still need to go back and
      change Tim's original code to use the stuff.
      4820df1b
    • Jonathon Duerig's avatar
      Quick fix to LOG_EVERYTHING. · 9c9b43b4
      Jonathon Duerig authored
      9c9b43b4
    • Jonathon Duerig's avatar
      Finished adding the REPLAY option for logging. Added an explanation of how to... · c82c98d8
      Jonathon Duerig authored
      Finished adding the REPLAY option for logging. Added an explanation of how to add new logging options to the comments at the top.
      c82c98d8
  3. Sep 11, 2006
  4. Sep 10, 2006
    • Leigh B. Stoller's avatar
      The bulk of this commit adds the ability to run the program agent on ops · e8bb6bca
      Leigh B. Stoller authored
      so that users can schedule program events to run there. For example:
      
      	set myprog [new Program $ns]
      	$myprog set node "ops"
      	$myprog set command "/usr/bin/env >& /tmp/foo"
      
      	$ns at 10 "$myprog start"
      or
      	tevc -e pid/eid now myprog start
      
      Since the program agent cannot talk to tmcd from ops, there are new
      routines to create the config files that the program agent uses, in
      the experiment tbdata directory.
      
      I also rewrote the eventsys.proxy script that starts the event
      scheduler on ops; I rolled the startup of the program agent into this
      script, via a new -a option which is passed over from boss when an ops
      program agent is detected in the virt topology. This keeps the number
      of new processes on ops to a small number.
      
      Also part of the above rewrite is that we now catch when the event
      scheduler (or the program agent) exits abnormally, sending email to
      tbops and the swapper of the experiment. We have been seeing abnormal
      exits of the scheduler and it would be good to detect them and see if we can
      figure out what is going wrong.
      
      Other small bug fixes in experiment run.
      e8bb6bca
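Catching an abnormal exit of a supervised process can be sketched as below. This is an illustration only: the real eventsys.proxy script is not shown in the log, so the `run_and_report` helper and the `notify` callback (standing in for sending email) are hypothetical.

```python
import subprocess
import sys


def run_and_report(argv, notify):
    """Run argv to completion; call notify(message) if it exits abnormally."""
    proc = subprocess.run(argv)
    if proc.returncode != 0:
        # On POSIX a negative return code means the child died from a signal.
        if proc.returncode < 0:
            how = f"signal {-proc.returncode}"
        else:
            how = f"status {proc.returncode}"
        notify(f"{argv[0]} exited abnormally with {how}")
    return proc.returncode


messages = []
# A child that exits with status 3 stands in for a crashing scheduler.
code = run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"],
                      messages.append)
```

Here the notification is collected in a list; a real wrapper would presumably mail it to the interested parties instead.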
    • Jonathon Duerig's avatar
      Added a first rough draft of the least squares path saturation sensor. There... · 9c6f20f0
      Jonathon Duerig authored
      Added a first rough draft of the least squares path saturation sensor. There are a lot of rough edges detailed earlier in a message to Rob. This is totally untested code.
      9c6f20f0
  5. Sep 08, 2006
    • Jonathon Duerig's avatar
      Added rudimentary error checking for sensors. Each sensor has an ackValid and... · a2e29d0a
      Jonathon Duerig authored
      Added rudimentary error checking for sensors. Each sensor has an ackValid and a sendValid boolean value which says whether the data from a recent ack or send is valid. These should be checked before any access to data in a sensor.
      a2e29d0a
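The "check the flag before touching the data" rule for sensors might look like the sketch below. Only the `ackValid` and `sendValid` names come from the commit message; the `DelaySensor` class, its fields, and the validity condition are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelaySensor:
    """Hypothetical sensor carrying the two validity flags from the commit."""
    ackValid: bool = False
    sendValid: bool = False
    last_delay_ms: float = 0.0

    def on_ack(self, delay_ms: float) -> None:
        # Assumed rule: a negative delay means the ack could not be matched.
        self.ackValid = delay_ms >= 0
        if self.ackValid:
            self.last_delay_ms = delay_ms


def read_delay(sensor: DelaySensor) -> Optional[float]:
    """Return the delay only if the most recent ack produced valid data."""
    return sensor.last_delay_ms if sensor.ackValid else None


sensor = DelaySensor()
```

Callers go through `read_delay` rather than reading `last_delay_ms` directly, so stale data from an invalid ack is never used.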
    • Leigh B. Stoller's avatar
      Two small changes: · 77d2e17c
      Leigh B. Stoller authored
      * Handle cancelation of instantiation.
      
      * Call out to template_exprun instead of inlining most of what it does.
      77d2e17c
    • Kirk Webb's avatar
      · 3a3c95fb
      Kirk Webb authored
      Parallelize the setup of plab vnodes alongside the loading of local
      physical nodes.  We fork vnode_setup to operate on the plab vnodes just
      before firing off local reload/reboot/reconfig operations.  The status
      of the plab vnode setup is checked just before firing off vnode_setup
      for any local vnodes.  The ISUP wait for plab vnodes continues to fall
      within the same stage as waiting for local vnodes.  New arguments have been
      added to vnode_setup to tell it to only operate on specific vnode types:
      '-j' for local jail nodes, and '-p' for plab nodes.  If neither is
      specified, the default is to operate on all types.
      3a3c95fb
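The ordering described in the vnode_setup commit (start the plab setup in the background, do the local work, then check the plab status before moving on) can be sketched as follows. The commit forks a separate vnode_setup process; this sketch swaps in a Python thread pool for brevity, and every function name and node name in it is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def setup_plab_vnodes(vnodes):
    """Stand-in for the slow remote setup; reports a status per vnode."""
    return {v: "ok" for v in vnodes}


def reload_local_nodes(nodes):
    """Stand-in for the local reload/reboot/reconfig operations."""
    return {n: "ok" for n in nodes}


def swap_in(plab_vnodes, local_nodes):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the plab setup first so it overlaps with the local work.
        plab_future = pool.submit(setup_plab_vnodes, plab_vnodes)
        local_status = reload_local_nodes(local_nodes)
        # Check the plab result before any further setup would begin.
        plab_status = plab_future.result()
    failed = [v for v, s in plab_status.items() if s != "ok"]
    if failed:
        raise RuntimeError(f"plab vnode setup failed for: {failed}")
    return {**local_status, **plab_status}


result = swap_in(["plab1", "plab2"], ["pc1"])
```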
  6. Sep 07, 2006
    • Leigh B. Stoller's avatar
      Minor bugfix. · befb3434
      Leigh B. Stoller authored
      befb3434
    • Dan Gebhardt's avatar
      minor changes to fix bug with the managerID · 88149f3f
      Dan Gebhardt authored
      88149f3f
    • Mike Hibler's avatar
      Started out trying to make latency-due-to-low-bandwidth calculation more · 548c15bb
      Mike Hibler authored
      accurate.  Not sure I improved it dramatically, but I sure did move the
      code around a lot!
      548c15bb
    • Dan Gebhardt's avatar
      some minor changes · e194c3fa
      Dan Gebhardt authored
      e194c3fa
    • Mike Hibler's avatar
      lint · 2c5d32bd
      Mike Hibler authored
      2c5d32bd
    • Mike Hibler's avatar
      Another instance of the last typo · 6e421b37
      Mike Hibler authored
      6e421b37
    • Leigh B. Stoller's avatar
      Some changes to how log files are handled; this took way too long to · c01f7b3e
      Leigh B. Stoller authored
      do!
      
      The original operation was to save up every log file forever in the
      work directory, and copy that out to both the user directory and the
      info directory (long term archive). When I cleaned /proj on ops
      yesterday of all this old cruft, I recovered 17GB of disk space. Yow!
      
      So, the new operation is:
      
      * Only files that end in .log are copied to the user directory. No
        longer copying out .top, .ptop, and a couple of other logs; 99% of
        users never look at these things. We still have them available to us
        though, on boss.
      
      * At the beginning of each swap operation, clean out the work
        directory of all the old log files. These are named a variety of
        ways, so I use some pattern matches to do this.
      
      * Jigger the names a little so that we do not name things in the form
        "$$.log", to avoid copying out differently named files to the user
        directory each time; instead link the .log file to the real output
        file so that it gets overwritten each time, while still getting the
        per-swap files for long term storage.
      c01f7b3e
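The stable-name idea from the log file commit can be sketched as below. The log does not say whether a hard link or a symlink is used, nor which patterns are cleaned, so the symlink, the `*.log` patterns, and the file names here are all assumptions; `publish_log` and `clean_old_logs` are hypothetical helpers.

```python
import glob
import os


def publish_log(workdir, stable_name, real_output):
    """Point workdir/stable_name at the per-swap output file, replacing any old link."""
    link = os.path.join(workdir, stable_name)
    if os.path.lexists(link):
        os.remove(link)
    # Relative target, so the link stays valid if the directory is moved.
    os.symlink(real_output, link)
    return link


def clean_old_logs(workdir, patterns=("*.log", "*.log.*")):
    """Remove files matching the (assumed) log patterns; return the names removed."""
    victims = set()
    for pattern in patterns:
        victims.update(glob.glob(os.path.join(workdir, pattern)))
    for path in victims:
        os.remove(path)
    return sorted(os.path.basename(p) for p in victims)
```

Because the user-visible name never changes, each swap overwrites the same file in the user directory, while the per-swap output files remain available for long-term storage.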
  7. Sep 06, 2006