- Sep 13, 2006
-
-
Kirk Webb authored
Minor bug fix.
-
Leigh B. Stoller authored
the Show Experiment menu to see if anyone uses it.
-
Leigh B. Stoller authored
-
Robert Ricci authored
structure. SACK handling should now be fixed.
-
Leigh B. Stoller authored
to make stoprun waiting work correctly. When tevc is invoked with the -w (wait for completion) option, tevc generates a token to put into the notification. The event scheduler will not generate a new token if there is already one in the notification, but instead pass it on. For the specific case of stoprun, the simulator agent has to pass that token along to boss and template_exprun, which generates the completion event (for reasons discussed in the prior commit message).
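A minimal sketch of the token handling described in this message. The `Notification` class, the function names, and the integer token format are hypothetical illustrations, not the actual tevc or event scheduler code.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for an event notification; the real structure differs.
@dataclass
class Notification:
    event: str
    token: Optional[int] = None


_next_token = itertools.count(1)


def scheduler_forward(note: Notification) -> Notification:
    """Keep a token the sender already supplied; otherwise make one up."""
    if note.token is None:
        note.token = next(_next_token)
    return note


def completion_matches(waiting_token: int, completion: Notification) -> bool:
    """A waiter accepts only the completion that carries its own token."""
    return completion.token == waiting_token
```

With this shape, a waiter that supplied its own token ignores completion events sent by the intermediate events (such as snapshot), because those carry scheduler-generated tokens.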
-
- Sep 12, 2006
-
-
Robert Ricci authored
an int in twice. Also fix another bug (masked by the previous) I introduced into census()
-
Kirk Webb authored
Added secondary logging for node setup/teardown success/failure. Also log node pool membership changes in this log.
-
Leigh B. Stoller authored
it got more complicated as it progressed. The bulk of the change was changing template_exprun so that it can take a pid/eid as an alternative to eid/guid. This is a big convenience since it's easy to find the template from a running experiment, and it makes it possible to invoke from the event scheduler, which has never heard of a template before (and it's not something I wanted to teach it about). It's also easier on users.

Anyway, back to the stoprun event. You can now do this:

    $ns at 100 "$ns stoprun"

or

    tevc -e pid/eid now ns stoprun

You can add the -w option to wait for the completion event that is sent, but this brings me to the glaring problems with this whole thing.

* First, the scheduler has to fire off the stoprun in the background, since if it waits, we get deadlock. Why? Cause the implementation of stoprun uses the event system (SNAPSHOT event, other things), and if the scheduler is sitting and waiting, nothing happens. Okay, the solution to this was to generate a COMPLETION event from template_exprun once the stop operation is complete. This brings me to the second problem ...

* Worse, is that the "ns" events that are sent to implement stoprun (like snapshot) send their own completion events, and that confuses anyone waiting on the original stoprun event (it returns early). So what to do about this? There is a "token" field in the completion event structure, which I presume is to allow you to match things up. But there is no way to set this token using tevc (and then wait for it), and besides, the event scheduler makes them up anyway and sticks them into the event.

So, the seeds of a fix are already germinating in my mind, but I wanted to get this commit in so that Mike would have fun reading this commit log.
-
Robert Ricci authored
'field' is written to and read from. This was done to aid the debugging of reading and writing replay files. However, this output is ridiculously verbose, so it's commented out.
-
Robert Ricci authored
rlimit. Also, check for error in packet size calculation vs. how much data is actually saved.
-
Robert Ricci authored
of bytes required to save the packet. This was causing us to create a buffer too small to hold the packet, causing memory corruption bugs and causing us to write invalid replay files. The way that the packet size calculation is separated from the saving of the packet is a serious problem, and needs to be re-designed!
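One way to avoid the size/save mismatch described here is to derive the size from the same routine that serializes the packet. The sketch below is an illustration only: the record layout (4-byte sequence number, 2-byte payload length, payload) and all function names are assumptions, not the replay file format.

```python
import struct

# Hypothetical record header: 4-byte sequence number, 2-byte payload length.
_HEADER = struct.Struct("!IH")


def encode_packet(seq: int, payload: bytes) -> bytes:
    """Serialize one record; the only place that knows the layout."""
    return _HEADER.pack(seq, len(payload)) + payload


def packet_size(seq: int, payload: bytes) -> int:
    """Derive the size from the encoder so the two cannot disagree."""
    return len(encode_packet(seq, payload))


def save_packet(buf: bytearray, seq: int, payload: bytes) -> int:
    """Append the record to buf and return the number of bytes saved."""
    record = encode_packet(seq, payload)
    buf.extend(record)
    return len(record)
```

Because `packet_size` calls `encode_packet`, a layout change cannot leave the size calculation behind.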
-
Leigh B. Stoller authored
this change was actually refactoring Tim's spewlog code to be more general so that it can be used elsewhere. I still need to go back and change Tim's original code to use the stuff.
-
Jonathon Duerig authored
-
Jonathon Duerig authored
Finished adding the REPLAY option for logging. Added an explanation of how to add new logging options to the comments at the top.
-
- Sep 11, 2006
-
-
Robert Ricci authored
-
Robert Ricci authored
invalid type, it was assuming it was an ack. Now, it will error out. This was masking errors in replay, which I am still trying to track down.
-
Kirk Webb authored
plab logging enhancements. Timing information for various RPCs is now logged to /usr/testbed/log/plabtiming.log. This info will be useful for extracting trends for the various plab nodes, and in calculating reliability and timing metrics. These could be used, e.g., to pick nodes that tend to come up more quickly. This update also squelches much of the python backtrace noise when plab nodes fail to set up correctly (can be turned on with debug flag). Instead, failures are summarized on a single line. Oh, and pay no attention to the aspect behind the curtain! Yes, you may groan and moan if you wish - I'm using aspects to help do the logging. I find this to be a really slick way of wrapping several functions!
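The idea of wrapping several functions with timing logic can be approximated with a plain decorator. This is a sketch under assumptions, not the aspects mechanism the commit uses; the logger name `plabtiming` and the `create_node` function are hypothetical.

```python
import functools
import logging
import time

timing_log = logging.getLogger("plabtiming")  # hypothetical logger name


def timed(func):
    """Log how long func took and whether it succeeded, on a single line."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        ok = False
        try:
            result = func(*args, **kwargs)
            ok = True
            return result
        finally:
            elapsed = time.monotonic() - start
            timing_log.info("%s %s %.3fs", func.__name__,
                            "ok" if ok else "FAILED", elapsed)
    return wrapper


@timed
def create_node(name):  # hypothetical stand-in for an RPC call
    return f"created {name}"
```

Exceptions still propagate to the caller; the wrapper only records the timing and outcome.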
-
Robert Ricci authored
getting called with a length of 0. In this case, write() returning a 0 does not indicate an error.
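A minimal sketch of a write loop that treats a zero-length request as a successful no-op, as this message describes. The helper name `write_all` is hypothetical and this is not the code from the commit.

```python
import os


def write_all(fd: int, data: bytes) -> int:
    """Write all of data to fd; a zero-length request succeeds trivially."""
    if len(data) == 0:
        return 0  # nothing to write, so a 0 result is not an error
    written = 0
    while written < len(data):
        n = os.write(fd, data[written:])
        if n == 0:
            raise OSError("write() made no progress on a non-empty buffer")
        written += n
    return written
```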
-
Robert Ricci authored
documented by XXXes in the code. The most important one is that it will probably fail when wraparound occurs. It also still makes the assumption that the receiver will only ACK whole packets, not partial packets, but this seems to work in practice. Note: I have been able to test it in the presence of a SACK due to problems with replay.
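For the wraparound concern, one common technique is to compare sequence numbers modulo 2^32 rather than as plain integers. The sketch below illustrates that general technique; it is not the fix applied in this code, and the function names are hypothetical.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32 bits wide


def seq_lt(a: int, b: int) -> bool:
    """True if a precedes b in modulo-2^32 sequence space."""
    return ((a - b) % SEQ_MOD) >= (SEQ_MOD >> 1)


def bytes_acked(last_ack: int, new_ack: int) -> int:
    """Bytes newly acknowledged, correct across a wrap of the counter."""
    return (new_ack - last_ack) % SEQ_MOD
```

For example, with `last_ack = 0xFFFFFFF0` and `new_ack = 0x10`, a plain subtraction goes negative, while the modular form still yields the 32 bytes that were acknowledged.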
-
Robert Ricci authored
just set the state to ESTABLISHED.
-
Robert Ricci authored
-
Dan Gebhardt authored
for a site.
-
Dan Gebhardt authored
ACK packets from ops.
-
- Sep 10, 2006
-
-
Leigh B. Stoller authored
so that users can schedule program events to run there. For example:

    set myprog [new Program $ns]
    $myprog set node "ops"
    $myprog set command "/usr/bin/env >& /tmp/foo"
    $ns at 10 "$myprog start"

or

    tevc -e pid/eid now myprog start

Since the program agent cannot talk to tmcd from ops, there are new routines to create the config files that the program agent uses, in the experiment tbdata directory.

I also rewrote the eventsys.proxy script that starts the event scheduler on ops; I rolled the startup of the program agent into this script, via a new -a option which is passed over from boss when an ops program agent is detected in the virt topology. This keeps the number of new processes on ops to a small number.

Also part of the above rewrite is that we now catch when the event scheduler (or the program agent) exits abnormally, sending email to tbops and the swapper of the experiment. We have been seeing abnormal exits of the scheduler and it would be good to detect them and see if we can figure out what is going wrong.

Other small bug fixes in experiment run.
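The abnormal-exit detection can be pictured roughly as follows. This is a sketch only: the real eventsys.proxy script is not written this way, and `run_and_report` and its `notify` callback (standing in for sending email) are hypothetical names.

```python
import subprocess
import sys


def run_and_report(argv, notify):
    """Run a child to completion; call notify() if it exits abnormally."""
    proc = subprocess.run(argv)
    if proc.returncode != 0:
        # A negative return code means the child was killed by a signal.
        notify(f"{argv[0]} exited abnormally with status {proc.returncode}")
    return proc.returncode
```

A caller would pass the command line of the scheduler or program agent and a function that mails the message to the interested parties.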
-
Jonathon Duerig authored
Added a first rough draft of the least squares path saturation sensor. There are a lot of rough edges detailed earlier in a message to Rob. This is totally untested code.
-
- Sep 08, 2006
-
-
Jonathon Duerig authored
Added rudimentary error checking for sensors. Each sensor has an ackValid and a sendValid boolean value which says whether the data from a recent ack or send is valid. These should be checked before any access to data in a sensor.
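The validity flags might be used along these lines. The sketch is in Python for illustration; the `DelaySensor` class, its fields other than the two flags, and `read_delay` are hypothetical, with `ack_valid` and `send_valid` mirroring the ackValid and sendValid booleans named above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelaySensor:  # hypothetical sensor, for illustration only
    ack_valid: bool = False
    send_valid: bool = False
    last_delay_ms: float = 0.0

    def local_ack(self, delay_ms: Optional[float]) -> None:
        """Record a measurement from an ack, or mark the data invalid."""
        if delay_ms is None or delay_ms < 0:
            self.ack_valid = False
            return
        self.last_delay_ms = delay_ms
        self.ack_valid = True


def read_delay(sensor: DelaySensor) -> Optional[float]:
    """Check the validity flag before touching the sensor's data."""
    return sensor.last_delay_ms if sensor.ack_valid else None
```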
-
Leigh B. Stoller authored
* Handle cancelation of instantiation.
* Call out to template_exprun instead of inlining most of what it does.
-
Kirk Webb authored
Parallelize the setup of plab vnodes alongside the loading of local physical nodes. We fork vnode_setup to operate on the plab vnodes just before firing off local reload/reboot/reconfig operations. The status of the plab vnode setup is checked just before firing off vnode_setup for any local vnodes. The ISUP wait for plab vnodes continues to fall within the same stage as waiting for local vnodes. New arguments have been added to vnode_setup to tell it to only operate on specific vnode types: '-j' for local jail nodes, and '-p' for plab nodes. If neither is specified, the default is to operate on all types.
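The type-selection rule for the new arguments can be sketched as below. Only the `-j` and `-p` flags and the "neither means all" default come from the message; the Python form, the function name, and the assumption that jail and plab are the only two types are illustrative.

```python
import argparse

# Assumption for this sketch: these are the only vnode types.
ALL_TYPES = frozenset({"jail", "plab"})


def select_types(argv):
    """Return the set of vnode types to operate on for the given flags."""
    parser = argparse.ArgumentParser(prog="vnode_setup_sketch")
    parser.add_argument("-j", dest="jail", action="store_true",
                        help="operate on local jail nodes")
    parser.add_argument("-p", dest="plab", action="store_true",
                        help="operate on plab nodes")
    opts = parser.parse_args(argv)
    if not opts.jail and not opts.plab:
        return set(ALL_TYPES)  # neither flag given: operate on all types
    selected = set()
    if opts.jail:
        selected.add("jail")
    if opts.plab:
        selected.add("plab")
    return selected
```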
-
- Sep 07, 2006
-
-
Leigh B. Stoller authored
-
Dan Gebhardt authored
-
Mike Hibler authored
accurate. Not sure I improved it dramatically, but I sure did move the code around a lot!
-
Dan Gebhardt authored
-
Mike Hibler authored
-
Mike Hibler authored
-
Leigh B. Stoller authored
do! The original operation was to save up every log file forever in the work directory, and copy that out to both the user directory and the info directory (long term archive). When I cleaned /proj on ops yesterday of all this old cruft, I recovered 17GB of disk space. Yow! So, the new operation is:

* Only files that end in .log are copied to the user directory. No longer copying out .top, .ptop, and a couple of other logs; 99% of users never look at these things. We still have them available to us though, on boss.

* At the beginning of each swap operation, clean out the work directory of all the old log files. These are named a variety of ways, so I use some pattern matches to do this.

* Jigger the names a little so that we do not name things in the form "$$.log", to avoid copying out different named files to the user directory each time; instead link the .log file to the real output file so that it gets overwritten each time, while still getting the per-swap files for long term storage.
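The linking scheme in the last item can be sketched as follows. The file naming (`<base>.<swap_id>` for the per-swap file), the use of a symbolic link rather than a hard link, and the function name are all assumptions made for illustration.

```python
import os


def publish_log(workdir: str, base: str, swap_id: str) -> str:
    """Point <base>.log at the per-swap output file, replacing any old link."""
    real = os.path.join(workdir, f"{base}.{swap_id}")  # hypothetical naming
    link = os.path.join(workdir, f"{base}.log")
    if os.path.lexists(link):
        os.remove(link)  # drop the link left by the previous swap
    # Relative target, so the link stays valid if the directory is copied.
    os.symlink(os.path.basename(real), link)
    return link
```

Each swap leaves its own per-swap file behind for long term storage, while the stable `.log` name always refers to the most recent one.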
-
- Sep 06, 2006
-
-
Leigh B. Stoller authored
reset. I've done this with an event group cause otherwise I was going to get sucked into the event system and spit out the other end. You can reset the delays in your experiment either from the ns file:

    $ns at 100 "$ns reset-lans"

or from the command line:

    tevc -e foo/bar now all_lans reset

and yes, "all_lans" is a magic token. It would be nice to support per-link or lan reset, but that is going to require reorganizing the delay start up scripts on the delay nodes, since right now a single delay agent operates for multiple links and lans.
-
Robert Ricci authored
-
Mike Hibler authored
Make a version of the example which shows an unroutable control network.
-
Robert Ricci authored
-
Robert Ricci authored
Standardized way in which domain names for other emulabs are given
Re-formatted poorly formatted entries
Re-order to put higher-impact emulabs first
-