- 18 Aug, 2006 7 commits
-
Jonathon Duerig authored
-
Jonathon Duerig authored
Fixed debugging statement. It needs to be unsigned rather than signed because these are sequence numbers.
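For illustration, here is a minimal Python sketch of why the signedness matters. It is not the code from this commit. TCP sequence numbers are unsigned 32-bit values, so any value above 2^31 - 1 prints as a negative number when it is treated as signed. The helper names as_signed32/as_unsigned32 and the sample value are hypothetical.

```python
import struct


def as_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed (two's complement) int."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def as_unsigned32(value: int) -> int:
    """Reinterpret the low 32 bits of value as an unsigned int."""
    return value & 0xFFFFFFFF


# A hypothetical sequence number above 2**31 - 1.
seq = 3_000_000_000
print("signed:  ", as_signed32(seq))    # negative, misleading in a debug log
print("unsigned:", as_unsigned32(seq))  # the actual sequence number
```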
-
Robert Ricci authored
-
Kirk Webb authored
Duh - balance pool priorities so that one doesn't starve the rest.
-
Mike Hibler authored
It shows you how the delay pipes are configured for an experiment. Only tested on pelab clouds, however...
-
Mike Hibler authored
-
Kirk Webb authored
Left in debugging values by accident...
-
- 17 Aug, 2006 5 commits
-
Kirk Webb authored
New plab vnode monitor framework, now with proactive node checking action! The old monitor has been completely replaced. The new one uses modular pools to test and track plab nodes. There are currently two pool modules: good and bad. The good pool tests nodes that are not known to have issues, to proactively find problems and push nodes into the "bad" pool when necessary. The bad pool acts similarly to the old plabmonitor; it does an end-to-end test on nodes, and if and when they finally come up, moves them to the good pool.
Both pools have a testing backoff mechanism that works as follows:
* The node is tested right away upon entering either pool.
* Node fails to set up:
  * goodpool: node is sent to the bad pool (hwdown).
  * badpool: node is scheduled to be retested according to an additive backoff function, maxing out at 1 hour.
* Node setup succeeds:
  * goodpool: node is scheduled to be retested according to an additive backoff function, maxing out at 1 hour.
  * badpool: node is moved to the good pool.
The backoff thing may be bogus, we'll see. It seems like a reasonable thing to do though - no need to hammer a node with tests if it consistently succeeds or fails. Nodes that flop back and forth will get the most testing punishment. A future enhancement will be to watch for flopping and force nodes that exhibit this behavior to pass several consecutive tests before being eligible to return to the good pool.
The monitor only allows a configurable window's worth of outstanding tests to run at once. When tests finish, more node tests are allowed to start right away.
Some refactoring needs to be done. Currently the good and bad pools share quite a bit of duplicated code. I don't know if I dare venture into inheritance with Perl, but that would be a good way to approach this.
Some other pool module ideas:
* dynamic setup pools: When experiments w/ plab vnodes are swapped in, use the plab monitor to manage setting up the vnodes by dynamically creating pools on a per-experiment basis. This has the advantage that the monitor can keep a global cap on the number of outstanding setup operations. These pools might also try later on to bring up vnodes that failed to set up during swapin, along with other vnode monitoring tasks.
* "all nodes" pools: Similar to the dynamic pools just mentioned, but with the mission to extend experiments to all plab nodes possible (as nodes come and go). Useful for services.
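Here is a minimal Python sketch of the good/bad pool rules and the test window described above. It assumes a 5-minute backoff step and a window of 10 outstanding tests. The commit gives only the 1-hour cap, so the step, the window size, and all names here are hypothetical. The real monitor is written in Perl.

```python
import heapq

# Only the 1-hour cap comes from the commit; the other values are hypothetical.
BACKOFF_STEP = 5 * 60   # additive increment, in seconds
BACKOFF_MAX = 60 * 60   # retest interval never exceeds 1 hour
WINDOW = 10             # max tests outstanding at once


class Node:
    def __init__(self, name):
        self.name = name
        self.pool = "good"
        self.delay = 0  # seconds until the next test; 0 means test right away


def next_delay(delay):
    """Additive backoff: add a fixed step, capped at BACKOFF_MAX."""
    return min(delay + BACKOFF_STEP, BACKOFF_MAX)


def handle_result(node, succeeded):
    """Apply the pool transition rules and return the delay before the next test."""
    if node.pool == "good":
        if succeeded:
            node.delay = next_delay(node.delay)   # stay good, test less often
        else:
            node.pool, node.delay = "bad", 0      # demote (hwdown), test right away
    else:
        if succeeded:
            node.pool, node.delay = "good", 0     # promote, test right away
        else:
            node.delay = next_delay(node.delay)   # stay bad, back off
    return node.delay


def due_tests(schedule, now, outstanding):
    """Pop names from a heap of (when, name) whose time has come, without
    letting the number of outstanding tests exceed WINDOW."""
    started = []
    while (schedule and schedule[0][0] <= now
           and outstanding + len(started) < WINDOW):
        started.append(heapq.heappop(schedule)[1])
    return started
```

The demote and promote paths reset the delay to zero, so a node is tested right away on entering either pool. This reset is also what makes flopping nodes the most heavily tested.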
-
Jonathon Duerig authored
Rationalized Rob's previous checkin with mine to remove the additional dependencies I had made on the now-defunct IpHeader structure.
-
Robert Ricci authored
Add a lot of additional debugging output. Fix incorrect TCP payload size calculation: it assumed Ethernet and ignored IP and TCP option headers. Replace the weird nonstandard IpHeader structure with 'struct ip' from netinet/ip.h.
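The corrected payload arithmetic can be sketched in Python with the standard struct module. This is an illustration only, not the `struct ip` based code in the commit. The IP header length (IHL) and the TCP data offset both include options, and neither depends on the link layer. The function name is hypothetical, and only IPv4 is handled.

```python
import struct


def tcp_payload_len(ip_packet: bytes) -> int:
    """TCP payload length of an IPv4 packet that starts at the IP header.

    payload = IP total length - IP header length (with options)
                              - TCP header length (with options)
    """
    version_ihl = ip_packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ip_hlen = (version_ihl & 0x0F) * 4                  # IHL is in 32-bit words
    total_len = struct.unpack("!H", ip_packet[2:4])[0]  # network byte order
    if ip_packet[9] != 6:                               # protocol 6 = TCP
        raise ValueError("not a TCP segment")
    tcp_hlen = (ip_packet[ip_hlen + 12] >> 4) * 4       # data offset, 32-bit words
    return total_len - ip_hlen - tcp_hlen
```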
-
Jonathon Duerig authored
Added sensor replay. It seems to be working perfectly. A replay is automatically saved after every run in plab-n/local/logs/stub.replay. You can run a replay with a command similar to:
  sudo ./magent --replay-load=/proj/tbres/exp/pelab-generated/logs/plab-1/local/logs/stub.replay
-
Mike Hibler authored
to get a UID to use.
-
- 16 Aug, 2006 2 commits
-
Mike Hibler authored
-
Kevin Atkinson authored
tbreport errors & context.
- Modified fatal() in swapexp, batchexp, and tbprerun, and die_noretry() in os_setup, to pass a hash parameter to tblog functions.
- Added tbreport error & context information for select errors in swapexp, tbswap, assign_wrapper2, snmpit_lib, snmpit, batchexp, assign_wrapper, os_setup, parse-ns, & tbprerun.
- Added assign error parser in assign_wrapper2.
- Added parse.tcl error parser in parse-ns.
- Added severity constants for tbreport in libtblog_simple.
- Added tbreport() function & context table mapping for reporting discrete error types to libtblog.
-
- 15 Aug, 2006 7 commits
-
Robert Ricci authored
is so that when we use a monitor that doesn't care about, say, writes, we can avoid wasting the CPU required to parse them. The four supported reports are:
  connect - connect() and close() notifications
  sockopt - information about socket options
  io - read()s and write()s
  all - Duh. The default if none are given.
You can specify reports by setting LIBNETMON_REPORTS or using the new -r option to netmond. If you want more than one report type, separate them with commas.
Added a new control message, CM_REPORTS, to pass these back and forth between netmond and libnetmon.
Added a long-overdue LIBNETMON_NETMOND option, which just uses the standard socket paths to talk to netmond - this way, you don't have to set them by hand when debugging, which is a huge PITA.
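One way the report selection could be parsed is sketched below in Python. The report names and the "all" default come from the message above. The bitmask values and function names are hypothetical and do not reflect libnetmon's actual implementation.

```python
import os

# Report names are from the commit message; the bit values are hypothetical.
REPORTS = {"connect": 1 << 0, "sockopt": 1 << 1, "io": 1 << 2}
ALL_REPORTS = REPORTS["connect"] | REPORTS["sockopt"] | REPORTS["io"]


def parse_reports(spec):
    """Turn a comma-separated report list (e.g. "connect,io") into a bitmask.

    An unset or empty spec means "all", the default described in the commit.
    """
    if not spec:
        return ALL_REPORTS
    mask = 0
    for name in spec.split(","):
        name = name.strip()
        if name == "all":
            mask |= ALL_REPORTS
        elif name in REPORTS:
            mask |= REPORTS[name]
        else:
            raise ValueError(f"unknown report type: {name!r}")
    return mask


def wants(mask, report):
    """True if the monitor asked for this report type."""
    return bool(mask & REPORTS[report])


mask = parse_reports(os.environ.get("LIBNETMON_REPORTS"))
```

With this scheme, a monitor that only wants connection events would set LIBNETMON_REPORTS=connect. The library could then skip parsing I/O calls whenever `wants(mask, "io")` is false.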
-
Mike Hibler authored
-
Robert Ricci authored
-
Robert Ricci authored
get put into the $Ports{} hash by snmpit_lib. This is just a placeholder: I'll have to put code in later to print out a more interesting representation for trunks.
-
Robert Ricci authored
newer perl versions.
-
Robert Ricci authored
-
Mike Hibler authored
in program-agent as well. Print filename in error message from event lib.
-
- 14 Aug, 2006 7 commits
-
Russ Fish authored
It only happens with old experiments with no archive yet. There's a missing code path for getting the ns file in CopyInArchive(). In that case, get the old experiment's nsfile from the db.
-
Leigh B. Stoller authored
-
Kevin Atkinson authored
Prep for Mike Kasick report code. Updated database schema and installed hooks for his code. Cleaned up how errors were handled in tblog(...). Allow SENDMAIL to be called before the path is untainted in '-T' scripts. Other small changes.
-
Kevin Atkinson authored
-
Leigh B. Stoller authored
draft is that the user will, at the end of an experiment run, log into one of his nodes and perform some analysis that is intended to be repeated at the end of the next run and in future instantiations of the template. A new table called experiment_template_events holds the dynamic events for the template. Right now I am supporting just program events, but it will be easy to support arbitrary events later. As an absurd example:
  node6> /usr/local/bin/template_analyze ~/data_analyze arg arg ...
The user is currently responsible for making sure the output goes into a file in the archive. I plan to make the template_analyze wrapper handle that automatically later, but for now what you really want is to invoke a script that encapsulates that, redirecting output to $ARCHIVE (this variable is installed in the environment by template_analyze). The wrapper script will save the current time and then run the program. If the program terminates with a zero exit status, it will ssh over to ops and invoke an xmlrpc routine to tell boss to add a program event both to the eventlist for the current instance and to the template_eventlist for future instances. The time of the event is the relative start time that was saved above (remember, each experiment run replays the event stream from time zero). In the future, we want to allow this to be done on ops as well, but that will take more infrastructure, to run "program agents" on ops. It would be nice to install the SSL xmlrpc client side on our images so that we do not have to ssh to ops to invoke the client.
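The wrapper's timing logic can be sketched in Python under two assumptions. First, `run_start` (when the current run's event stream began) is passed in. Second, `record_event` is a hypothetical callback standing in for the ssh-to-ops xmlrpc call. This is not the actual template_analyze script.

```python
import subprocess
import time


def run_and_record(argv, run_start, record_event):
    """Run argv and, on a zero exit status, record a program event.

    The event time is captured *before* the program runs and is relative to
    run_start, since each run replays the event stream from time zero.
    record_event(event_time, argv) is a hypothetical stand-in for the
    xmlrpc call that adds the event to the eventlist and template_eventlist.
    """
    event_time = time.time() - run_start
    result = subprocess.run(argv)
    if result.returncode == 0:
        record_event(event_time, argv)
    return result.returncode
```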
-
Mike Hibler authored
-
Leigh B. Stoller authored
agent to exit. rc.progagent now loops, restarting the program agent, but first getting new copies of the agent list and the environment from tmcd. Note that this conflicts slightly with the pa-wrapper used on plab nodes, which also loops. I think we can just get rid of pa-wrapper now, along with a slight change to rc.progagent. I'm gonna let Kirk comment on this. Need new images ...
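The restart loop described above can be sketched in Python. rc.progagent itself is a startup script, and `fetch_config` and `run_agent` are hypothetical stand-ins for the tmcd fetch and the program agent. The `max_starts` parameter exists only so the loop can terminate in a test.

```python
def supervise(fetch_config, run_agent, max_starts=None):
    """Restart the program agent forever, refreshing its config before each start.

    fetch_config() -> (agent_list, environment), standing in for tmcd.
    run_agent(agent_list, environment) blocks until the agent exits.
    max_starts=None loops forever; a number bounds the loop (for testing).
    """
    starts = 0
    while max_starts is None or starts < max_starts:
        agents, env = fetch_config()  # fresh copies every time, never cached
        run_agent(agents, env)        # returns when the agent exits
        starts += 1
    return starts
```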
-
- 11 Aug, 2006 12 commits
-
Mike Hibler authored
an issue with the program agent and signals (monitor is not getting killed during stop-experiment).
-
Mike Hibler authored
to slow plab nodes. First cut at a selective creation.
-
Mike Hibler authored
non-NULL latency/BW from those was too simplistic. With latency measurements much more frequent than BW measurements, we often never got a valid BW because we didn't go back far enough. So now just do two queries for the most recent non-NULL value of each. This could probably be done in a single query by joining the table with itself...
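The two-query approach is sketched below with Python's sqlite3 module so that it is self-contained. The table and column names (measurements, src, dst, ts, latency, bw) are hypothetical. Each query filters `IS NOT NULL` on its own column, so a run of recent latency-only rows can no longer hide the last bandwidth value.

```python
import sqlite3

VALID_COLUMNS = ("latency", "bw")


def latest_non_null(conn, column, src, dst):
    """Most recent non-NULL value of `column` for the src->dst pair, or None."""
    if column not in VALID_COLUMNS:  # column is spliced into SQL, so whitelist it
        raise ValueError(f"unexpected column: {column}")
    row = conn.execute(
        f"SELECT {column} FROM measurements"
        f" WHERE src = ? AND dst = ? AND {column} IS NOT NULL"
        " ORDER BY ts DESC LIMIT 1",
        (src, dst),
    ).fetchone()
    return row[0] if row else None


def latest_path_values(conn, src, dst):
    """Two independent queries, one per metric."""
    return (latest_non_null(conn, "latency", src, dst),
            latest_non_null(conn, "bw", src, dst))
```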
-
Mike Hibler authored
-
Mike Hibler authored
Turn off buffering for stdout.
-
Dan Gebhardt authored
-
Mike Hibler authored
-
Mike Hibler authored
it out to all images.
-
Mike Hibler authored
-
Jonathon Duerig authored
-
Kirk Webb authored
Fix up another place where the hostname lookup can fail, and thus cause the proxy (or anything else) to exit. Both evproxyplab and the event lib now first try to lookup the hostname to get the IP, and then fall back to grabbing the IP from /var/emulab/myip.
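The lookup-then-fallback order can be sketched in Python. The /var/emulab/myip path comes from the message above. The function name and the injectable `resolve` parameter, which is there so the fallback can be tested, are hypothetical. The real evproxyplab and event lib code is not shown here.

```python
import socket

MYIP_FILE = "/var/emulab/myip"  # path from the commit message


def my_ip(hostname=None, myip_file=MYIP_FILE, resolve=socket.gethostbyname):
    """Resolve this host's IP, falling back to the address stored in myip_file.

    A failed lookup (socket.gaierror/herror, both OSError subclasses) no
    longer aborts the caller; only a missing fallback file does.
    """
    try:
        return resolve(hostname or socket.gethostname())
    except OSError:
        with open(myip_file) as f:
            return f.read().strip()
```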
-
Mike Hibler authored
-