Commits · 779944838ec89c431d9dd45eaa447245579a8dbe · Emmanuel Cecchet / emulab-devel

Sep 13, 2006

· 77994483
Kirk Webb authored 18 years ago
```
Minor bug fix.
```
77994483
Some cleanup to my silly little event viewer widget, and stick it on · 52f174fa
Leigh B. Stoller authored 18 years ago
```
the Show Experiment menu to see if anyone uses it.
```
52f174fa
Minor bugfix to previous revision. · bfae3cdb
Leigh B. Stoller authored 18 years ago

bfae3cdb
Bugfix - I misunderstood how TCP options were parsed into the Options · f38a9a88
Robert Ricci authored 18 years ago
```
structure. SACK handling should now be fixed.
```
f38a9a88

Changes to allow waiting for a specific completion event, which is needed · 5c564bc8

Leigh B. Stoller authored 18 years ago

to make stoprun waiting work correctly.

When tevc is invoked with the -w (wait for completion) option, tevc
generates a token to put into the notification. The event scheduler will
not generate a new token if there is already one in the notification, but
instead pass it on.

For the specific case of stoprun, the simulator agent has to pass that
token along to boss and template_exprun, which generates the completion
event (for reasons discussed in prior commit message).
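
A minimal sketch of the token pass-through described above, assuming notifications and events can be modelled as plain dictionaries. The names `attach_token` and `is_awaited_completion` are hypothetical and are not taken from the tevc or event scheduler sources:

```python
import itertools

# Hypothetical token source; the real scheduler's scheme is not shown in this log.
_token_counter = itertools.count(1)


def attach_token(notification):
    """Return a copy of the notification that carries a completion token.

    A token already present (for example one chosen by a waiting client)
    is passed through unchanged; otherwise a fresh one is generated.
    """
    tagged = dict(notification)
    if tagged.get("token") is None:
        tagged["token"] = next(_token_counter)
    return tagged


def is_awaited_completion(event, awaited_token):
    """True only for the COMPLETION event that carries the awaited token."""
    return event.get("type") == "COMPLETION" and event.get("token") == awaited_token
```

With this shape, a waiter ignores completion events from nested operations such as a snapshot, because they carry a different token.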

5c564bc8

Sep 12, 2006

Fix a bug I introduced with a careless copy and paste - I was copying · a9e89bd0
Robert Ricci authored 18 years ago
```
an int in twice.

Also fix another bug (masked by the previous) I introduced into
census()
```
a9e89bd0

· 52dcfd48

Kirk Webb authored 18 years ago

Added secondary logging for node setup/teardown success/failure.  Also log
node pool membership changes in this log.

52dcfd48

This started out as a simple little hack to add a StopRun "ns" event, but · cbdc4178

Leigh B. Stoller authored 18 years ago

it got more complicated as it progressed.

The bulk of the change was changing template_exprun so that it can take a
pid/eid as an alternative to eid/guid. This is a big convenience since it's
easy to find the template from a running experiment, and it makes it
possible to invoke from the event scheduler, which has never heard of a
template before (and it's not something I wanted to teach it about).  It's
also easier on users.

Anyway, back to the stoprun event. You can now do this:

	$ns at 100 "$ns stoprun"
or
	tevc -e pid/eid now ns stoprun

You can add the -w option to wait for the completion event that is sent,
but this brings me to the glaring problems with this whole thing.

* First, the scheduler has to fire off the stoprun in the background,
  since if it waits, we get deadlock. Why? Cause the implementation of
  stoprun uses the event system (SNAPSHOT event, other things), and if
  the scheduler is sitting and waiting, nothing happens.

  Okay, the solution to this was to generate a COMPLETION event from
  template_exprun once the stop operation is complete. This brings me
  to the second problem ...

* Worse, is that the "ns" events that are sent to implement stoprun (like
  snapshot) send their own completion events, and that confuses anyone
  waiting on the original stoprun event (it returns early).

  So what to do about this? There is a "token" field in the completion
  event structure, which I presume is to allow you to match things up.  But
  there is no way to set this token using tevc (and then wait for it), and
  besides, the event scheduler makes them up anyway and sticks them into
  the event. So, the seeds of a fix are already germinating in my mind, but
  I wanted to get this commit in so that Mike would have fun reading this
  commit log.

cbdc4178

Add a ton of debugging output, showing the byte locations that each · 84cf1d12

Robert Ricci authored 18 years ago

'field' is written to and read from. This was done to aid the
debugging of reading and writing replay files.

However, this output is ridiculously verbose, so it's commented out.

84cf1d12

Try to make sure we get core dumps, by upping the coredump size · 4d11d3ec
Robert Ricci authored 18 years ago
```
rlimit.

Also, check for error in packet size calculation vs. how much data
is actually saved.
```
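
As an illustration only, raising the core-dump size limit on a Unix system might look like the following. This sketch uses Python's `resource` module rather than the C `setrlimit()` call the commit most likely touched, and the function name `raise_core_limit` is hypothetical:

```python
import resource


def raise_core_limit():
    """Raise the soft core-dump size limit to the hard limit; return the new pair."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may always raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```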
4d11d3ec

Serious bugfix - PacketInfo::census() was undercounting the number · 5b3b2838

Robert Ricci authored 18 years ago

of bytes required to save the packet. This was causing us to create
a buffer too small to hold the packet, causing memory corruption bugs
and causing us to write invalid replay files.

The way that the packet size calculation is separated from the saving
of the packet is a serious problem, and needs to be re-designed!
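
One possible way to keep the size calculation and the save code from drifting apart is to derive both from a single layout description. The sketch below is not the design used in this code base; the header layout and the names `packet_size` and `save_packet` are assumptions made for illustration:

```python
import struct

# Hypothetical header layout: 32-bit sequence number plus two 16-bit ports.
HEADER = struct.Struct("!IHH")


def packet_size(payload):
    """Bytes needed to save a packet, computed from the same layout used to save it."""
    return HEADER.size + len(payload)


def save_packet(seq, src_port, dst_port, payload):
    """Serialize a packet into a buffer sized by packet_size()."""
    size = packet_size(payload)
    buf = bytearray(size)
    HEADER.pack_into(buf, 0, seq, src_port, dst_port)
    buf[HEADER.size:] = payload
    # Guard against the calculated size and the saved size disagreeing.
    if len(buf) != size:
        raise ValueError("saved %d bytes, expected %d" % (len(buf), size))
    return bytes(buf)
```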

5b3b2838

Checkpoint little web page to spew the event stream out. The bulk of · 4820df1b

Leigh B. Stoller authored 18 years ago

this change was actually refactoring Tim's spewlog code to be more
general so that it can be used elsewhere. I still need to go back and
change Tim's original code to use the stuff.

4820df1b

Quick fix to LOG_EVERYTHING. · 9c9b43b4
Jonathon Duerig authored 18 years ago

9c9b43b4
Finished adding the REPLAY option for logging. Added an explanation of how to... · c82c98d8
Jonathon Duerig authored 18 years ago
```
Finished adding the REPLAY option for logging. Added an explanation of how to add new logging options to the comments at the top.
```
c82c98d8

Sep 11, 2006

Add a REPLAY log option for some debugging I'm doing. · b811b788
Robert Ricci authored 18 years ago

b811b788

Improve type handling in capturePacket - before, when it got an · fd955b65

Robert Ricci authored 18 years ago

invalid type, it was assuming it was an ack. Now, it will
error out.

This was masking errors in replay, which I am still trying to track
down.

fd955b65

· aa446875

Kirk Webb authored 18 years ago

plab logging enhancements.

timing information for various RPCs is now logged to
/usr/testbed/log/plabtiming.log.  This info will be useful for extracting
trends for the various plab nodes, and in calculating reliability and
timing metrics.  These could be used, for e.g., to pick nodes that tend to
come up more quickly.

This update also squelches much of the python backtrace noise when plab nodes
fail to setup correctly (can be turned on with debug flag).  Instead, failures
are summarized on a single line.

Oh, and pay no attention to the aspect behind the curtain!  Yes, you may
groan and moan if you wish - I'm using aspects to help do the logging.  I
find this to be a really slick way of wrapping several functions!
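
For readers unfamiliar with the idea, wrapping functions to log timing and a one-line failure summary can be sketched with a plain Python decorator. This is not the aspect library used in the commit; `timed` and its log format are hypothetical:

```python
import functools
import logging
import time


def timed(logger):
    """Decorator factory: log the duration and outcome of each call on one line."""
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            start = time.monotonic()
            succeeded = False
            try:
                result = func(*args, **kwargs)
                succeeded = True
                return result
            finally:
                # Runs on both success and failure; exceptions still propagate.
                logger.info("%s %s %.3fs", func.__name__,
                            "ok" if succeeded else "FAILED",
                            time.monotonic() - start)
        return inner
    return wrap
```

The same wrapper can be applied to several RPC functions without touching their bodies, which is the convenience the message alludes to.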

aa446875

Fix a bug in replayWrite() - for some reason, it is occasionally · 09e2549f
Robert Ricci authored 18 years ago
```
getting called with a length of 0. In this case, write() returning a
0 does not indicate an error.
```
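
A sketch of a write loop that treats a zero-length request as a no-op rather than an error, written in Python with `os.write` instead of the C `write()` used by `replayWrite()`. The helper name `write_all` is hypothetical:

```python
import os


def write_all(fd, data):
    """Write all of data to fd; a zero-length request succeeds without writing."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        written = os.write(fd, view[total:])
        if written == 0:
            # Zero bytes written while data remains means no progress was made.
            raise OSError("write() made no progress")
        total += written
    return total
```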
09e2549f

Preliminary SACK support. Still does not handle several cases, all · be0a70cf

Robert Ricci authored 18 years ago

documented by XXXes in the code. The most important one is that it
will probably fail when wraparound occurs. It also still makes the
assumption that the receiver will only ACK whole packets, not partial
packets, but this seems to work in practice.

Note: I have not been able to test it in the presence of a SACK due to
problems with replay.
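
The wraparound case is commonly handled by comparing 32-bit sequence numbers modulo 2**32. The sketch below shows that standard technique; it is not taken from this code, and `seq_lt` is a hypothetical name:

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32 bits wide


def seq_lt(a, b):
    """True if sequence number a precedes b, allowing for wraparound."""
    return a != b and ((b - a) % SEQ_MOD) < (SEQ_MOD >> 1)
```

A comparison like this keeps SACK block boundaries ordered correctly even when a block straddles the point where the sequence space wraps to zero.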

be0a70cf

If we seem to have gotten a SYNACK out of order, treat it as okay and · ca1f006b
Robert Ricci authored 18 years ago
```
just set the state to ESTABLISHED.
```
ca1f006b
Add a > operator · b61bcf63
Robert Ricci authored 18 years ago

b61bcf63
fix: stop any tests on a node when removing it as a representative node · baa96f86
Dan Gebhardt authored 18 years ago
```
for a site.
```
baa96f86
Increased the "kill" threshold when detecting if a node cannot receive data · 58bf31f8
Dan Gebhardt authored 18 years ago
```
ACK packets from ops.
```
58bf31f8

Sep 10, 2006

The bulk of this commit adds the ability to run the program agent on ops · e8bb6bca

Leigh B. Stoller authored 18 years ago

so that users can schedule program events to run there. For example:

	set myprog [new Program $ns]
	$myprog set node "ops"
	$myprog set command "/usr/bin/env >& /tmp/foo"

	$ns at 10 "$myprog start"
or
	tevc -e pid/eid now myprog start

Since the program agent cannot talk to tmcd from ops, there are new
routines to create the config files that the program agent uses, in
the experiment tbdata directory.

I also rewrote the eventsys.proxy script that starts the event
scheduler on ops; I rolled the startup of the program agent into this
script, via a new -a option which is passed over from boss when an ops
program agent is detected in the virt topology. This keeps the number
of new processes on ops to a small number.

Also part of the above rewrite is that we now catch when the event
scheduler (or the program agent) exits abnormally, sending email to
tbops and the swapper of the experiment. We have been seeing abnormal
exits of the scheduler and it would be good to detect them and see if
we can figure out what is going wrong.

Other small bug fixes in experiment run.

e8bb6bca

Added a first rough draft of the least squares path saturation sensor. There... · 9c6f20f0

Jonathon Duerig authored 18 years ago

Added a first rough draft of the least squares path saturation sensor. There are a lot of rough edges detailed earlier in a message to Rob. This is totally untested code.

9c6f20f0

Sep 08, 2006

Added rudimentary error checking for sensors. Each sensor has an ackValid and... · a2e29d0a

Jonathon Duerig authored 18 years ago

Added rudimentary error checking for sensors. Each sensor has an ackValid and a sendValid boolean value which says whether the data from a recent ack or send is valid. These should be checked before any access to data in a sensor.
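
A rough illustration of the check-before-access pattern described here, assuming a sensor that caches one value from the most recent ack. The class `AckSensor` and its `rtt_ms` field are hypothetical; only the idea of an `ackValid` flag comes from the message:

```python
class AckSensor:
    """Hypothetical sensor that caches a value derived from the latest ack."""

    def __init__(self):
        self.ack_valid = False  # plays the role of ackValid
        self._rtt_ms = None

    def on_ack(self, rtt_ms):
        """Record a new sample, marking the data invalid if the sample is unusable."""
        if rtt_ms is None or rtt_ms < 0:
            self.ack_valid = False
            return
        self._rtt_ms = rtt_ms
        self.ack_valid = True

    def rtt_ms(self):
        """Return the cached value, refusing to hand out invalid data."""
        if not self.ack_valid:
            raise RuntimeError("no valid ack data")
        return self._rtt_ms
```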

a2e29d0a

Two small changes: · 77d2e17c

Leigh B. Stoller authored 18 years ago

* Handle cancelation of instantiation.

* Call out to template_exprun instead of inlining most of what it does.

77d2e17c

· 3a3c95fb

Kirk Webb authored 18 years ago

Parallelize the setup of plab vnodes alongside the loading of local
physical nodes.  We fork vnode_setup to operate on the plab vnodes just
before firing off local reload/reboot/reconfig operations.  The status
of the plab vnode setup is checked just before firing off vnode_setup
for any local vnodes.  The ISUP wait for plab vnodes continues to fall
within the same stage as waiting for local vnodes.  New arguments have been
added to vnode_setup to tell it to only operate on specific vnode types.
'-j' for local jail nodes, and '-p' for plab nodes.  If neither is
specified, the default is to operate on all types.

3a3c95fb

Sep 07, 2006

Minor bugfix. · befb3434
Leigh B. Stoller authored 18 years ago

befb3434
minor changes to fix bug with the managerID · 88149f3f
Dan Gebhardt authored 18 years ago

88149f3f
Started out trying to make latency-due-to-low-bandwidth calculation more · 548c15bb
Mike Hibler authored 18 years ago
```
accurate.  Not sure I improved it dramatically, but I sure did move the
code around a lot!
```
548c15bb
some minor changes · e194c3fa
Dan Gebhardt authored 18 years ago

e194c3fa
lint · 2c5d32bd
Mike Hibler authored 18 years ago

2c5d32bd
Another instance of the last typo · 6e421b37
Mike Hibler authored 18 years ago

6e421b37

Some changes to how log files are handled; this took way too long to · c01f7b3e

Leigh B. Stoller authored 18 years ago

do!

The original operation was to save up every log file forever in the
work directory, and copy that out to both the user directory and the
info directory (long term archive). When I cleaned /proj on ops
yesterday of all this old cruft, I recovered 17GB of disk space. Yow!

So, the new operation is:

* Only files that end in .log are copied to the user directory. No
  longer copying out .top, .ptop, and a couple of other logs; 99% of
  users never look at these things. We still have them available to us
  though, on boss.

* At the beginning of each swap operation, clean out the work
  directory of all the old log files. These are named a variety of
  ways, so I use some pattern matches to do this.

* Jigger the names a little so that we do not name things in the form
  "$$.log", to avoid copying out different named files to the user
  directory each time; instead link the .log file to the real output
  file so that it gets overwritten each time, while still getting the
  per-swap files for long term storage.
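
The last bullet can be sketched as follows, assuming a symbolic link is used; the message does not say which kind of link the real code creates. The function name `point_stable_log` and the paths are hypothetical:

```python
import os


def point_stable_log(real_log, stable_log):
    """Make stable_log a symlink to real_log, replacing any earlier link."""
    tmp = stable_log + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(real_log, tmp)
    # Renaming over the old link swaps it in a single step.
    os.replace(tmp, stable_log)
```

Each swap writes its own uniquely named file, while the fixed name always resolves to the most recent one, so only one name ever appears in the user directory.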

c01f7b3e

Sep 06, 2006

Okay, this is a nasty little hack ... Add support for a global delays · d4881005

Leigh B. Stoller authored 18 years ago

reset. I've done this with an event group cause otherwise I was going
to get sucked into the event system and spit out the other end. You can
reset the delays in your experiment either from the ns file:

	$ns at 100 "$ns reset-lans"

or from the command line:

	tevc -e foo/bar now all_lans reset

and yes, "all_lans" is a magic token.

It would be nice to support per-link or lan reset, but that is going
to require reorganizing the delay start up scripts on the delay nodes,
since right now a single delay agent operates for multiple links and lans.

d4881005

Add a check for TCP window scaling, which is not yet supported · 59ba782a
Robert Ricci authored 18 years ago

59ba782a

Tweak to example. · 69fcce51

Mike Hibler authored 18 years ago

Make a version of the example which shows an unroutable control network.

69fcce51

Mark 'other emulabs' as new · 5e32ec28
Robert Ricci authored 18 years ago

5e32ec28

Added and updated several other emulabs. Marked them as 'new' · b6f902a5

Robert Ricci authored 18 years ago

Standardized way in which domain names for other emulab are given

Re-formatted poorly formatted entries

Re-order to put higher-impact emulabs first

b6f902a5