tbsetup/libtestbed.pm.in · 9049532a0703a096b9358d25b724a63e391ce982 · emulab / emulab-devel

· 5b52831c
Kirk Webb authored Oct 23, 2003
Well, here it is:  The checkin implementing robust recovery/retry and
asynchronous safe termination in plab allocation/deallocation/setup.

Here are some of the more prominent changes/additions:

* Bounded plab agent communication
  Scripts should never hang waiting for plab xmlrpc commands to complete;
  they have their own internal timeouts.  Node.create() in libplab is an
  exception, but is always run under a timeout constraint in vnode_setup
  and can be changed easily if the need arises.

* Wrote functions in libplab to do the retry/recovery/timeout of remote
  command execution.

* Wrapped critical sections with a signal watcher.

* Added code to handle various error conditions properly.

* Added a libtestbed function, TBForkCmd, which runs a given program in
  a child process, and can optionally catch incoming SIGTERMs and terminate
  the child (then exit itself).

* Fixed up vnode_setup to batch the 'plabnode free' operation, along with
  a few other cleanups.  This should alleviate Jay's concern about how
  long it used to take to tear down a plab expt.

* Whacked plabmonitord into better shape; fixed a couple of bugs, taught it
  how to daemonize, and implemented a priority list for testing broken plab
  nodes.  This list causes new (as yet unseen) nodes to be tried before ones
  that have already been tested.
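
For illustration only, the behavior described for TBForkCmd above (run a
program in a child process, optionally catch SIGTERM, terminate the child,
then exit) can be sketched roughly as follows.  This is not the libtestbed
code, which is Perl; the name fork_cmd, the catch_sigterm parameter, and
the parent's exit status of 1 are all assumptions made for the sketch.

```python
import signal
import subprocess
import sys


def fork_cmd(argv, catch_sigterm=False):
    """Run argv in a child process and return its exit status.

    Hypothetical sketch: if catch_sigterm is true, a SIGTERM sent to the
    parent is forwarded to the child, and the parent exits once the child
    has been reaped.
    """
    proc = subprocess.Popen(argv)
    terminated = False

    def on_sigterm(signum, frame):
        # Only forward the signal here; the main flow below does the
        # waiting, which avoids re-entering Popen.wait() from a handler.
        nonlocal terminated
        terminated = True
        proc.terminate()

    previous = None
    if catch_sigterm:
        previous = signal.signal(signal.SIGTERM, on_sigterm)
    try:
        status = proc.wait()
    finally:
        if catch_sigterm:
            # Restore whatever handler was installed before the call.
            restore = previous if previous is not None else signal.SIG_DFL
            signal.signal(signal.SIGTERM, restore)

    if terminated:
        # Assumed exit status; the commit message does not specify one.
        sys.exit(1)
    return status
```

A caller that wants a long-running command to die along with the parent
would use something like fork_cmd(["sleep", "60"], catch_sigterm=True);
without the flag, the function simply returns the child's exit status.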