- 02 Oct, 2003 4 commits
-
-
Leigh B. Stoller authored
someday, but not today!
-
Leigh B. Stoller authored
fixed fix before declaring the fix was fixed. Note that I haven't tested this fix either.
-
Robert Ricci authored
-
Leigh B. Stoller authored
-
- 01 Oct, 2003 2 commits
-
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
- 30 Sep, 2003 5 commits
-
-
Leigh B. Stoller authored
the batch system sees them as always done. There is no reason to do this from the node itself, since it would be really hard to have either a jail or delay node without other nodes in the topology!
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
awaiting word from Kirk.
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
plus a lock field. The lock field was a simple "experiment locked, go away" slot that is easy to use when you do not care about the actual state that an experiment is in, just that it is in "transition" and should not be messed with. The other two state variables are "state" and "batchstate". The former (state) is the original variable that Chris added, and was used by the tb* scripts to make sure that the experiment was in the state each particular script wanted it to be in. But over time (and with the addition of so much wrapper goo around them), "state" has leaked out all over the place to determine what operations on an experiment are allowed, and if/when it should be displayed in various web pages. There is a set of transition states in addition to the usual "active", "swapped", etc., like "swapping", that make testing state a pain in the butt. I added the other state variable ("batchstate") when I did the batch system, obviously! It was intended as a wrapper state to control access to the batch queue, and to prevent batch experiments from being messed with except when it was really okay (for example, it's okay to terminate a swapped out batch experiment, but not a swapped in batch experiment since that would confuse the batch daemon). There are fewer of these states, plus one additional state for "modifying" experiments. So what I have done is change the system to use "batchstate" for all experiments to control entry into the swap system, from the web interface, from the command line, and from the batch daemon. The other state variable still exists, and will be brutally pushed back under the surface until it's just a vague memory, used only by the original tb* scripts. This will happen over time, and the "batchstate" variable will be renamed once I am convinced that this was the right thing to do and that my changes actually work as intended.
Only people who have bothered to read this far will know that I also added the ability to cancel an experiment swapin in progress. For that I am using the "canceled" flag (ah, this one was named properly from the start!), and I test that at various times in assign_wrapper and tbswap. A minor downside right now is that a canceled swapin looks too much like a failed swapin, and so tbops gets email about it. I'll fix that at some point (sometime after the boss complains). I also cleaned up various bits of code, replacing direct calls to exec with calls to the recently improved SUEXEC interface. This removes some cruft from each script that calls an external script. Cleaned up modifyexp.php3 quite a bit, reformatting and indenting. Also fixed it to not run the parser directly! This was very wrong; it should call nscheck instead. Changed to use "nobody" group instead of group flux (made the same change in nscheck). There is a script in the sql directory called newstates.pl. It needs to be run to initialize the batchstate slot of the experiments table for all existing experiments.
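The gating and cancellation described in this message can be illustrated with a minimal Python sketch. It is not the testbed's code: the `Experiment` class, the function names, and the use of "swapped", "swapping" and "active" as batchstate values are assumptions made for illustration; the log only says that batchstate controls entry into the swap system and that the "canceled" flag is tested at various points during swapin.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    # Hypothetical in-memory stand-in for a row of the experiments table.
    batchstate: str = "swapped"
    canceled: bool = False


class SwapCanceled(Exception):
    """Raised when a swapin is abandoned because the canceled flag is set."""


def begin_swapin(exp):
    """Gate entry into the swap system on batchstate alone."""
    if exp.batchstate != "swapped":
        raise RuntimeError(f"cannot swap in from state {exp.batchstate!r}")
    exp.batchstate = "swapping"


def check_canceled(exp):
    """Poll the canceled flag; meant to be called between long-running steps."""
    if exp.canceled:
        # Roll back to the stable state and clear the flag.
        exp.batchstate = "swapped"
        exp.canceled = False
        raise SwapCanceled("swapin canceled")


def swapin(exp, steps):
    """Run each step, checking for cancellation before every one."""
    begin_swapin(exp)
    for step in steps:
        check_canceled(exp)
        step(exp)
    exp.batchstate = "active"
```

Raising a distinct `SwapCanceled` exception is one way a caller could tell a canceled swapin apart from a failed one, which is the distinction the message says is currently missing.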
-
- 29 Sep, 2003 1 commit
-
-
Leigh B. Stoller authored
one from my devel tree).
-
- 26 Sep, 2003 3 commits
-
-
Robert Ricci authored
desire for 'hosts-<type>', where <type> is the type of its child node. This helps assign, because it can now limit the number of places to try assigning the host, and it means that we can give the hosts this feature, so that they don't get used for other purposes. For example, we can give the IXP-hosting nodes the feature 'hosts-ixp-bv' with weight 1, and they will never get used for anything but IXP hosting. This means that the node_type_features (or just node_features) table must now have hosts-<type> entries to work correctly.
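The effect of a weight-1 'hosts-<type>' feature can be sketched as a simple two-way filter. This is a deliberate simplification and not assign's actual scoring: it assumes that any desire or feature with weight 1 or more must be matched on the other side, and the function name `can_host` is hypothetical.

```python
from typing import Mapping


def can_host(desires: Mapping[str, float], features: Mapping[str, float]) -> bool:
    """Return True if a virtual node with `desires` may be placed on a
    physical node with `features`, treating weight >= 1 as mandatory."""
    hard_desires = {name for name, w in desires.items() if w >= 1.0}
    hard_features = {name for name, w in features.items() if w >= 1.0}
    # Every mandatory desire must be offered, and every mandatory
    # feature must be asked for.
    return hard_desires <= set(features) and hard_features <= set(desires)
```

Under that assumption, a node carrying `{"hosts-ixp-bv": 1.0}` is only eligible for virtual nodes that desire `hosts-ixp-bv`, and such virtual nodes are only tried on nodes that carry the feature.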
-
Robert Ricci authored
put: $node add-desire desire weight. This will end up in the virt_node_desires table. assign_wrapper now puts the desires from this table into the top file.
-
Leigh B. Stoller authored
from my devel tree).
-
- 25 Sep, 2003 4 commits
-
-
Leigh B. Stoller authored
-
Robert Ricci authored
does the daemon.
-
Kirk Webb authored
is going out.
-
Leigh B. Stoller authored
-
- 24 Sep, 2003 4 commits
-
-
Leigh B. Stoller authored
trying to bring them back from the dead periodically by trying to instantiate a vserver/vnode on them, and then tearing it down. If we can do that, then the node is usable, and it gets moved back into the normal holding experiment so that ptopgen will add it to ptop files. This daemon is not turned on yet; waiting for other little bits and pieces to be done. There is an equivalent change in os_setup that moves physnodes into hwdown when a setup on a vnode fails. Lbs
-
Robert Ricci authored
assign can attempt to spread an experimenter's nodes across sites.
-
Leigh B. Stoller authored
tmcd (which is bad, since tying up the tmcd threads blocks all nodes in the testbed). The old functionality is left in tmcd for now. On the server side, a new web page (www/spewrpmtar.php3) receives a request for a file, along with the nodeid (pcXXX) making the request, and the secret key that is generated for each new experiment and transferred to the node via tmcd. If the key matches, the operation is handed off to tbsetup/spewrpmtar.in which verifies that the file is in the list of rpm/tar files for that node, and then spits it out to stdout. The web page uses fpassthru() to send the file out to the client. The client is using wget, and is required to use https (the web page checks). At present, the external script is run as the creator of the experiment, and gid of the experiment. Perhaps this is not a good idea. In any event, the file must be in the list of rpm/tarfiles, either owned by the experiment creator or with a group of the experiment, and the file must reside in either /proj or /groups. I use the realpath() function to make sure there are no symlink tricks pointing to outside those filesystems. I use the standard NFS read goo to prevent transient mount problems that we all know and love.
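The key match and the realpath() confinement check described above can be sketched in Python. This is an illustration under stated assumptions, not the contents of spewrpmtar: the function names `key_matches` and `is_servable` are hypothetical, the ownership/group check is omitted, and a constant-time comparison is used for the key although the log does not say how the real page compares it.

```python
import hmac
import os

# The two filesystems named in the commit message.
ALLOWED_ROOTS = ("/proj", "/groups")


def key_matches(presented: str, expected: str) -> bool:
    """Compare the presented secret key with the experiment's key."""
    return hmac.compare_digest(presented.encode(), expected.encode())


def is_servable(requested, allowed_files, roots=ALLOWED_ROOTS):
    """Return True if `requested` is in the node's rpm/tar list and its
    fully resolved path lives under one of the allowed roots."""
    real = os.path.realpath(requested)  # resolves symlinks and ".."
    in_root = False
    for root in roots:
        real_root = os.path.realpath(root)
        if real.startswith(real_root + os.sep):
            in_root = True
            break
    listed = real in {os.path.realpath(f) for f in allowed_files}
    return in_root and listed
```

Resolving the path before comparing it is what defeats a symlink inside /proj that points elsewhere: the check is made against where the file really is, not against the name the client asked for.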
-
Robert Ricci authored
actually finding the youngest. Luckily, it was not causing timeouts that were too short, only timeouts that were too long.
-
- 23 Sep, 2003 8 commits
-
-
Kirk Webb authored
causing problems. Will investigate tomorrow.
-
Kirk Webb authored
finds that the pid returned from wait() doesn't match the one returned from fork() earlier - this shouldn't happen, but it does. I am checking for errors - perhaps I'm missing something though. This affects plabnode free in vnode_setup since vnode_setup doesn't fork when it runs this.
-
Kirk Webb authored
Updated vnode_setup to fork+exec plabnode (alloc|free) rather than invoking it with system(). Now when the parent receives a SIGTERM from its parent (the top-level vnode_setup), it will kill off its plabnode child process before exiting itself. Invocation of plabnode is now done via the plabnode() function. Needs some commenting. Tested thoroughly.
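The fork+exec pattern with SIGTERM forwarding can be sketched in Python as follows. This is a generic illustration rather than the plabnode() function itself: `run_child` is a hypothetical name, and the sketch returns the raw wait status to the caller instead of exiting.

```python
import os
import signal


def run_child(argv):
    """Fork, exec `argv` in the child, and wait for it. While waiting,
    a SIGTERM delivered to the parent is passed on to the child.
    Returns the raw wait status from waitpid()."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image; exit 127 if exec fails.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)

    def on_term(signum, frame):
        # Parent: kill the child we started instead of orphaning it.
        os.kill(pid, signal.SIGTERM)

    # Note: there is a small window between fork() and this call in
    # which a SIGTERM would still get the previous handling.
    previous = signal.signal(signal.SIGTERM, on_term)
    try:
        _, status = os.waitpid(pid, 0)
    finally:
        signal.signal(signal.SIGTERM, previous)
    return status
```

A caller can inspect the result with `os.WIFSIGNALED(status)` and exit itself once the child has been reaped. With system() the parent has no pid for the child, so it cannot signal it this way.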
-
Kirk Webb authored
A couple of quick bug fixes - extract/format traceback properly for email message - libplab.py is already disabling line buffering for plabnode, so the code here to disable it has been removed. We were led to believe there was a buffering problem from the plabnode scripts that were never actually getting killed off.
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
Kirk Webb authored
unbuffered.
-
Kirk Webb authored
imported lib fixes output ordering.
-
- 22 Sep, 2003 5 commits
-
-
Kirk Webb authored
The last vnode created when a new plab node is entered into the DB is now allocated to the special "plab-monitor" experiment where it will be used to check the physnode's integrity.
-
Robert Ricci authored
2) Print the nodes' site, rather than our pname, in the 'hostnames' section, with a summary of how many unique sites there were at the top.
-
Mike Hibler authored
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
fail in "plabnode alloc" or in the remote vnodesetup call. In the former case, we do not want to "plabnode free" it later. In the latter, we want to plabnode free it right away, and make sure we do not try to remote vnode teardown or plabfree it later. In either case, os_setup needs to check so that it does not bother waiting for the node since it is wasted time. I use an alternate dead state for this, but the real solution is to move much of the vnode specific code from os_setup to vnode_setup. Note that this stuff is mostly untested since I need nodes to fail! The normal path works fine though.
-
- 19 Sep, 2003 4 commits
-
-
Robert Ricci authored
-
Robert Ricci authored
with the 'down:1' feature, so that they won't normally get allocated.
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-