- 18 Dec, 2009 1 commit

Leigh B. Stoller authored
What I did was create node table entries for the three SPP nodes. These are designated as local, shared nodes, reserved to a holding experiment. This allowed me to use all of the existing shared node pool support, albeit with a couple of tweaks in libvtop that I will not bother to mention since they are hideous (another thing I need to fix).

The virtual nodes that are created on the SPP nodes are figments; they will never be set up, booted, or torn down. They exist simply as placeholders in the DB, in order to hold the reserved bandwidth on the network interfaces. In other words, you can create as many of these imaginary SPP nodes (in different slices if you like) as there are interfaces on the SPP node, or you can create a single imaginary SPP node with all of the interfaces. You get the idea; it's the reserved bandwidth that drives the allocation.

There are also some minor SPP-specific changes in vnode_setup.in to avoid trying to generalize things. I will return to this later as needed. See this wiki page for info and sample rspecs: https://www.protogeni.net/trac/protogeni/wiki/SPPNodes

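The bandwidth-driven allocation described above (placeholder vnodes that exist only to hold reserved interface bandwidth) can be sketched roughly as follows. This is an illustrative model, not the actual libvtop code; the class and function names are invented:

```python
class Interface:
    """A physical interface on an SPP node with a fixed capacity (Kbps)."""
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.reserved = 0

    def reserve(self, bandwidth):
        """Reserve bandwidth if enough capacity remains; return success."""
        if self.reserved + bandwidth > self.capacity:
            return False
        self.reserved += bandwidth
        return True


def allocate_placeholder(interfaces, bandwidth):
    """Create a placeholder vnode by reserving bandwidth on the first
    interface that can still hold it.  Returns the interface name, or
    None once the node's interfaces are exhausted -- which is exactly
    why you can create only as many imaginary nodes as the reserved
    bandwidth allows."""
    for iface in interfaces:
        if iface.reserve(bandwidth):
            return iface.name
    return None
```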
- 24 Sep, 2009 1 commit

Leigh B. Stoller authored
want to do.

- 19 Aug, 2009 1 commit

Jonathon Duerig authored
Up the timeout for waiting on a child node to 3000 seconds to accommodate the estimated time for OpenVZ with 100 containers.

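The pattern behind this change, bounding the wait for a child setup process, looks roughly like this in Python. The original code is Perl; only the 3000-second constant comes from the commit, and the helper name is invented:

```python
import os
import signal
import time

CHILD_WAIT_TIMEOUT = 3000  # seconds; raised to cover OpenVZ w/ 100 containers


def wait_for_child(pid, timeout=CHILD_WAIT_TIMEOUT, poll=0.05):
    """Wait for child `pid` to exit, killing it if the timeout expires.
    Returns the child's exit code, or None if it had to be killed."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return os.WEXITSTATUS(status)
        time.sleep(poll)
    # Timed out: kill and reap the child so it does not become a zombie.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    return None
```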
- 17 Aug, 2009 1 commit

Jonathon Duerig authored

- 11 Jun, 2009 1 commit

Leigh B. Stoller authored
nodes during setup/teardown of experiments.

- 08 Dec, 2008 1 commit

Leigh B. Stoller authored

- 02 Dec, 2008 1 commit

Leigh B. Stoller authored

- 08 Sep, 2008 2 commits

Leigh B. Stoller authored

Leigh B. Stoller authored
geni code ready for primetime.

- 03 Sep, 2008 1 commit

Leigh B. Stoller authored
via the API instead of ssh. Did some cleanup (more conversion to objects) while I was in there.

- 06 Feb, 2008 1 commit

David Johnson authored
now, this is keyed off nodetype. Lots of hardcoded constants and config stuff moved to attributes in the db. You can now set per-PLC and per-slice attributes, so you can (for instance) use different auth info whenever you want. Experiments can use preexisting slices if somebody sets up the db before swapin. Also, we no longer have to rely on slices.xml to sync up nodes/sites with PLC; we can use xmlrpc instead. Lots of code cleanup, improved some abstractions, etc.

- 29 Aug, 2007 2 commits

David Johnson authored

David Johnson authored
setting up. First chunk: we don't want to skip straight to reboot if the node is plab. Second chunk: we need to run both plabnode alloc and remote vnodesetup on plab nodes. Third chunk: we have to ssh to the vnode, not the pnode, on plab nodes, to make sure we make it to the correct sliver. Not sure how this last part affects jails, so I special-cased it for plab.

- 16 Aug, 2007 1 commit

Leigh B. Stoller authored
plabslice, but still sorta behave like one. Also put back some code that must have been removed at some point, to initiate vnode setup on remote nodes (since we do not reboot widearea nodes; they are always allocated and up).

- 18 Dec, 2006 1 commit

David Johnson authored
happening was that when Kevin swapmod'd to get rid of failed nodes, he just took the bad ones out. This forced a change in the vname<->vnode mapping, and the failed node got put in a state (RES_INIT_CLEAN) that vnode_setup couldn't handle for plab nodes. Basically, the problem is that vnode_setup was assuming that RES_INIT_CLEAN meant that the plab vnode needed to be allocated -- but it was already allocated in the previous swap.

- 11 Oct, 2006 1 commit

Kirk Webb authored
Change the way vnode_setup handles plab nodes a bit to avoid a couple of buggy situations.

* Don't try vnodesetup -h on plab nodes. This can hang, or even fail. Since nothing useful is conveyed by this step, just skip it, set the node's state to SHUTDOWN, and ask pl_conf on the node to remove the vserver.

* Set plab nodes' alloc state to TBDB_ALLOCSTATE_RES_INIT_DIRTY after instantiation. This avoids a bug where Emulab cluster nodes fail to come up, and so os_setup never waits on the plab vnodes (now that they are started in parallel with physical node setup). Previously their alloc state made them look clean, and so the vservers would not be reaped during teardown.

- 08 Sep, 2006 1 commit

Kirk Webb authored
Parallelize the setup of plab vnodes alongside the loading of local physical nodes. We fork vnode_setup to operate on the plab vnodes just before firing off local reload/reboot/reconfig operations. The status of the plab vnode setup is checked just before firing off vnode_setup for any local vnodes. The ISUP wait for plab vnodes continues to fall within the same stage as waiting for local vnodes.

New arguments have been added to vnode_setup to tell it to operate only on specific vnode types: '-j' for local jail nodes, and '-p' for plab nodes. If neither is specified, the default is to operate on all types.

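The type-filtering behavior of the new '-j'/'-p' flags can be sketched like so. This is a hypothetical helper, not the actual Perl from vnode_setup:

```python
def select_vnodes(vnodes, jail_only=False, plab_only=False):
    """Mimic vnode_setup's flags: -j selects local jail nodes, -p selects
    plab nodes, and giving neither flag selects all types (the default)."""
    if not (jail_only or plab_only):
        return list(vnodes)
    wanted = set()
    if jail_only:
        wanted.add("jail")
    if plab_only:
        wanted.add("plab")
    return [v for v in vnodes if v["type"] in wanted]
```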
- 13 Jun, 2006 2 commits

- 22 Jan, 2006 1 commit

Kirk Webb authored
libplab.py.in: Exit right away when signalled while trying to perform a remote command.
vnode_setup.in: More info when a timeout occurs, and reduce the execution spacing a little.

- 21 Dec, 2005 1 commit

Kirk Webb authored
Add a bit of additional output info, and fix a little bug in rc.inplab

- 20 May, 2004 1 commit

Leigh B. Stoller authored

- 19 Mar, 2004 1 commit

Leigh B. Stoller authored
the allocstate (which was set earlier in tbswap). Note that we do not reconfig vnodes, even local jails. We just reboot them instead since it's so fast to reboot. Might need to revisit that at some point.

- 17 Mar, 2004 2 commits

Kirk Webb authored
* Created "-w" vnode_setup option that specifies how long to wait (per-vnode) for setup to complete before giving up.
* Added sitevars for plab batch parallelism size and vnode setup timeout.
* Modified os_setup to use the above sitevars when invoking vnode_setup for an experiment containing plab vnodes.

Kirk Webb authored
* Changed the way options are parsed in the python scripts so that modules can easily add and use their own options independent of top-level scripts.
* Added --noIS and --pollNodes module options.
* Added batch option to vnode_setup (degree of parallelization) - defaults to 10.
* Major updates to plabmonitord - batches testing, currently to 40.

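Batching with a default parallelization degree of 10 amounts to chunking the work list. A minimal sketch, with an invented helper name:

```python
def batches(items, size=10):
    """Split items into consecutive batches of at most `size` entries,
    matching vnode_setup's default degree of parallelization (10)."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Each batch would then be dispatched in parallel before the next batch begins, which bounds the number of concurrent remote operations.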
- 11 Mar, 2004 1 commit

Jay Lepreau authored

- 26 Feb, 2004 1 commit

Kirk Webb authored
remove debugging flag from libdb.py

- 03 Jan, 2004 1 commit

Kirk Webb authored
* Use IP address rather than finicky hostname when communicating with PLC.
* Make Node._create() aware of the "already assigned" condition.
* Bump vnode_setup timeout back to two minutes (for now).

- 30 Dec, 2003 1 commit

Kirk Webb authored
vnode_setup for the timeout on waiting for child processes. I've set it to 10 minutes since all ancillary setup programs have their own time bounds (I think - the plab ones do anyway).

The function of plabmonitord has changed slightly. Instead of setting up and tearing down vnodes, its job is to just set up the emulab management sliver on plab nodes in hwdown. Once the vserver comes up and reports isalive, it moves the node out of hwdown. Currently, it first tries to tear down the vserver before reinstantiating it. In the future, we could get fancier and try interacting with the service sliver directly before simply tearing it down.

All new plab nodes now start life in hwdown, and must be summoned forth into production by plabmonitord.

This commit does NOT include support for the node-local httpd. That will come soon.

- 12 Dec, 2003 1 commit

Robert Ricci authored
value under some circumstances, which would result in not actually timing out children that were taking too long.

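The bug class described here (a bad computed value silently disabling the child timeout) usually comes from letting the remaining-time calculation go negative or garbage. A defensive sketch, with invented names; the original is Perl:

```python
import time


def remaining_timeout(start_time, limit, now=None):
    """Seconds left before a child started at `start_time` exceeds `limit`.
    Clamped at zero so that a stale or bogus start time forces an immediate
    timeout instead of a negative value that disables the check entirely."""
    if now is None:
        now = time.time()
    return max(0.0, start_time + limit - now)
```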
- 17 Nov, 2003 1 commit

Leigh B. Stoller authored
state machine (state). All of the stuff that was previously handled by using batchstate is now embedded into the one state machine. Of course, these mostly overlapped, so it's not that much of a change, except that we also redid the machine, adding more states (for example, modify phases are now explicit). To get a picture of the actual state machine, on boss:

	stategraph -o newstates EXPTSTATE
	gv newstates.ps

Things to note:

* The "batchstate" slot of the experiments table is now used solely to provide a lock for the batch daemon. A secondary change will be to rename the slot to something more appropriate, but that can happen anytime after this new stuff is installed.

* I have left expt_locked for now, but another later change will be to remove expt_locked and change it to active_busy or some such new state name in the state machine. I have removed most uses of expt_locked, except those that were necessary until there is a new state to replace it.

* These new changes are an implementation of the new state machine, but I have not done anything fancy. Most of the code is the same as it was before.

* I suspect that there are races with the batch daemon now, but they are going to be rare, and the end result is probably that a cancellation is delayed a little bit.

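An explicit experiment state machine of this kind is typically represented as a transition table. A toy sketch; the state names below are invented for illustration and are not the real EXPTSTATE graph:

```python
# Hypothetical subset of experiment states and their legal transitions.
TRANSITIONS = {
    "swapped":      {"activating"},
    "activating":   {"active", "swapped"},      # setup may fail back
    "active":       {"modify_phase", "swapping"},
    "modify_phase": {"active"},                 # modify phases are explicit
    "swapping":     {"swapped"},
}


def transition(state, new_state):
    """Move to new_state only if that edge exists in the machine."""
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```

The point of making every phase an explicit state is exactly what the commit describes: anything previously tracked on the side (batchstate) becomes an edge in one machine that tools like stategraph can draw.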
- 23 Oct, 2003 1 commit

Kirk Webb authored
Well, here it is: the checkin implementing robust recovery/retry and asynchronous safe termination in plab allocation/deallocation/setup. Here are some of the more prominent changes/additions:

* Bounded plab agent communication. Scripts should never hang waiting for plab xmlrpc commands to complete; they have their own internal timeouts. Node.create() in libplab is an exception, but is always run under a timeout constraint in vnode_setup and can be changed easily if the need arises.

* Wrote functions in libplab to do the retry/recovery/timeout of remote command execution.

* Wrapped critical sections with a signal watcher.

* Added code to handle various error conditions properly.

* Added a libtestbed function, TBForkCmd, which runs a given program in a child process, and can optionally catch incoming SIGTERMs and terminate the child (then exit itself).

* Fixed up vnode_setup to batch the 'plabnode free' operation, along with a few other cleanups. This should alleviate Jay's concern about how long it used to take to tear down a plab expt.

* Whacked plabmonitord into better shape; fixed a couple of bugs, taught it how to daemonize, and implemented a priority list for testing broken plab nodes. This list causes new (as yet unseen) nodes to be tried first over ones that have been tested already.

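A TBForkCmd-style wrapper (run a program in a child process, and on SIGTERM terminate that child before exiting) might look like this in Python. This is a sketch of the idea only; the real TBForkCmd lives in the Perl libtestbed library:

```python
import signal
import subprocess
import sys


def tb_fork_cmd(argv, catch_sigterm=True):
    """Run argv in a child process and return its exit code.  If
    catch_sigterm is set, a SIGTERM delivered to us is forwarded to
    the child, which is reaped before we exit ourselves."""
    child = subprocess.Popen(argv)
    old_handler = None
    if catch_sigterm:
        def on_term(signum, frame):
            child.terminate()   # forward the signal to the child
            child.wait()        # reap it so no zombie is left behind
            sys.exit(1)
        old_handler = signal.signal(signal.SIGTERM, on_term)
    try:
        return child.wait()
    finally:
        if catch_sigterm:
            signal.signal(signal.SIGTERM, old_handler)
```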
- 30 Sep, 2003 1 commit

Leigh B. Stoller authored
awaiting word from Kirk.

- 24 Sep, 2003 1 commit

Robert Ricci authored
actually finding the youngest. Luckily, it was not causing timeouts that were too short, only timeouts that were too long.

- 23 Sep, 2003 3 commits

Kirk Webb authored
causing problems. Will investigate tomorrow.

Kirk Webb authored
finds that the pid returned from wait() doesn't match the one returned from fork() earlier - this shouldn't happen, but it is. I am checking for errors - perhaps I'm missing something though. This affects plabnode free in vnode_setup, since vnode_setup doesn't fork when it runs this.

Kirk Webb authored
Updated vnode_setup to fork+exec plabnode (alloc|free) rather than invoking it with system(). Now when the parent receives a SIGTERM from its parent (the top-level vnode_setup), it will kill off its plabnode child process before exiting itself. Invocation of plabnode is now done via the plabnode() function. Needs some commenting. Tested thoroughly.

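Replacing system() with fork+exec so the parent knows the child's pid and can kill it on SIGTERM looks roughly like this. A Python sketch of the technique; the real code is Perl, and the function name mirrors the commit's plabnode() helper only loosely:

```python
import os
import signal
import sys


def run_child(args):
    """fork+exec a command and return its exit code.  If the parent
    receives SIGTERM while waiting, it kills the child before exiting.
    With system() the parent has no direct handle on the grandchild,
    which is why fork+exec is used here instead."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(args[0], args)    # child: becomes the command
        finally:
            os._exit(127)               # exec failed

    def on_term(signum, frame):         # parent: propagate SIGTERM
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)              # reap before exiting ourselves
        sys.exit(1)

    old_handler = signal.signal(signal.SIGTERM, on_term)
    try:
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status)
    finally:
        signal.signal(signal.SIGTERM, old_handler)
```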
- 22 Sep, 2003 1 commit

Leigh B. Stoller authored
fail in "plabnode alloc" or in the remote vnodesetup call. In the former case, we do not want to "plabnode free" it later. In the latter, we want to plabnode free it right away, and make sure we do not try to remote vnode teardown or plabfree it later. In either case, os_setup needs to check so that it does not bother waiting for the node, since that is wasted time. I use an alternate dead state for this, but the real solution is to move much of the vnode-specific code from os_setup to vnode_setup. Note that this stuff is mostly untested, since I need nodes to fail! The normal path works fine though.

- 18 Sep, 2003 1 commit

Leigh B. Stoller authored
in general, and plab nodes in particular.

- 17 Sep, 2003 1 commit

Robert Ricci authored
