- 19 Oct, 2005 1 commit
-
-
Leigh B. Stoller authored
-
- 19 Jul, 2005 1 commit
-
-
Leigh B. Stoller authored
swapped out.
-
- 13 Jul, 2005 1 commit
-
-
Leigh B. Stoller authored
created, swapped in or modified when overquota. Ditto for creating images.
-
- 31 May, 2005 1 commit
-
-
Leigh B. Stoller authored
I fixed a couple of minor problems, but mostly this worked fine. Note that I have tested this with the installed perl, *NOT* perl 5.8. I am just making sure this stuff gets committed before too much more bitrot sets in.
-
- 20 May, 2005 1 commit
-
-
Leigh B. Stoller authored
record for each swapin (previously, it was just at swapmod). The reason for this is that as the testbed gets more fragmented in terms of hardware, it is less and less likely that consecutive swapins of the same experiment will use the same number of physical resources. We end up with some duplication of data inside the table, but no big deal. I suspect we will revisit this per experiment state as the workbench stuff proceeds.
-
- 12 May, 2005 2 commits
-
-
Leigh B. Stoller authored
or elabinelab experiment, but continue to allow only admins to do it if the experiment is active. Just while I continue to debug.
-
Leigh B. Stoller authored
Firewalled experiments (see tbsetup/elabinelab.in for the other stuff).
* To support firewalled experiments, needed to add a new virt_firewalls table to split up the existing firewalls table, which included both virtual and physical stuff. There are the usual frontend changes and a few other things scattered around, including tmcd.c.
* The firewall code in tbswap got some beefing up to support adding and deleting nodes from its special control net vlan. Note that I have not made any progress on containment of deleted nodes, just as we do not do anything now for teardown (unless it's paniced, in which case the experiment cannot be modified anyway).
* ptopgen and assign_wrapper got some interesting modifications: Unlike regular swapmod, we cannot just tear down all the vlans since that would interrupt everything inside the inner elab. Instead, leave the vlans as is. The problem is that when assign runs, it can just as easily pick different interfaces on the same nodes, which would be a royal pain in the ass to deal with! So, ptopgen got a new option (-u) that assign_wrapper uses to tell ptopgen that it should prune out unused interfaces from nodes that are already allocated to the experiment. This is, at best, a pathetically gross hack, but it makes sure that all the interfaces stay the same across swapmods.
* The unrelated revision of elabinelab has a bunch of new code for adding and deleting nodes from the inner elab. Mostly it deals with dhcpd (inner and outer, waiting for nodes to reboot, etc). It also deals with updating the vlans table in the DB, pruning out any nodes (ports) that are deleted but for which there are still interfaces in existing vlans. Said ports are then moved back to the default vlan with calls to snmpit. Also under another revision a couple of weeks ago are the web interface changes to support the newnode MFS inside an inner Emulab.
* swapexp and endexp got some more checks for firewalled and paniced experiments, which were missing.
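The interface pruning that the -u option asks of ptopgen can be pictured with a small sketch. This is only an illustration in Python, not the ptopgen code: the function name, the dictionary layout, and the node and interface names are all assumptions made for the example.

```python
from typing import Dict, Set


def prune_unused_interfaces(
    ptop: Dict[str, Set[str]],
    in_use: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return a copy of the physical topology where nodes already allocated
    to the experiment expose only the interfaces they currently use.

    ptop   maps node name -> all interfaces on that node (hypothetical layout)
    in_use maps node name -> interfaces already wired into the experiment
    """
    pruned: Dict[str, Set[str]] = {}
    for node, ifaces in ptop.items():
        if node in in_use:
            # Already allocated: hide every interface that is not in use,
            # so the mapper cannot pick a different one on this node.
            pruned[node] = ifaces & in_use[node]
        else:
            # Free node: all interfaces remain available.
            pruned[node] = set(ifaces)
    return pruned


ptop = {"pc1": {"eth0", "eth1", "eth2"}, "pc2": {"eth0", "eth1"}}
in_use = {"pc1": {"eth1"}}
pruned = prune_unused_interfaces(ptop, in_use)
```

With the unused interfaces hidden, a later mapping run has no choice but to reuse the same interfaces on nodes that are already part of the experiment.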
-
- 03 May, 2005 1 commit
-
-
Leigh B. Stoller authored
patch to avoid vlan corruption.
-
- 27 Apr, 2005 1 commit
-
-
Leigh B. Stoller authored
-
- 12 Jan, 2005 1 commit
-
-
Leigh B. Stoller authored
table that will prevent an experiment from being swapped/modified. The toggle is on the showexp page, and the toggle is *not* admin over-ridable; you must turn the toggle off (and of course, you must be an admin to do that).
-
- 16 Dec, 2004 1 commit
-
-
Leigh B. Stoller authored
* tbsetup/panic.in: New backend script to implement the panic button feature. When used, it will sever the connection to the firewall node by using snmpit to disable the port. Sets the panic bit (and date) in the experiments table, and changes the state of the experiment from "active" to "paniced" to ensure that the experiment cannot be messed with (swapped out or modified). Sends email to tbops when the panic button is pressed. Used with the -r option, reverses the above: state is set back to active, the panic bit is cleared, and the port is re-enabled with snmpit.
* tbsetup/tbswap.in: During swapout, a firewalled experiment that has been paniced will get a cleaning: the nodes are powered off, then the osids for all the nodes are reset (with os_select) so that they will boot the MFS, and then the nodes are powered on. Then the control network is turned back on, and then I wait for the nodes to reboot (this is simply cause we do not record in the DB that a node is turned off, and if I do not wait, the reload daemon will end up hitting the power button again if they do not reboot in time). We can fix this later. I am not planning to apply this to general firewalled experiments yet as the power cycling is going to be hard on the nodes, so would rather that we at least have a 1/2 baked plan before we do that.
* www/showexp.php3: If experiment is firewalled, show the Panic Button, linked to the panic button web script. If the experiment has already had the panic button pressed, show a big warning message and explain that user must talk to tbops to swap the experiment out. Also fiddle with menu options so that the terminate link is gone, and the swap link is visible only in admin mode. In other words, only an admin person can swap an experiment once it is paniced. And of course, an admin person can run the backend panic script above with the -r option, but that's not something to be done lightly.
* db/libdb.pm.in: Add "paniced" as an experiment state (EXPTSTATE_PANICED). Add utility functions: TBExptSetPanicBit(), TBExptGetPanicBit(), and TBExptClearPanicBit().
* tbsetup/swapexp.in: Minor state fiddling so that an experiment can be swapped while in paniced state, but only when in admin mode. Also clear the panic bit when experiment is swapped out.
* www/dbdefs.php3.in: Add "paniced" as an experiment state. Add a utility function TBExptFirewall() to see if experiment is firewalled.
* www/panicbutton.php3: New web script to invoke the backend panic script mentioned above, after the usual confirm song and dance.
* www/panicbutton.gif: New gif of a red panic button that I stole off the net. If anyone sees/has a better one, feel free to replace this one.
* utils/node_statewait.in: Add -s option so that I can pass in the state I want to wait for (used from tbswap above to wait for nodes to reach ISUP after power on).
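The panic and reverse (-r) transitions described above can be summarized as a tiny state model. The sketch below is a Python illustration only: the Experiment class, its field names, and the can_swap() helper are hypothetical and stand in for the experiments table and the checks in the real scripts.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Experiment:
    # Hypothetical stand-in for a row in the experiments table.
    state: str = "active"
    panic_bit: bool = False
    panic_date: Optional[datetime] = None
    firewall_port_enabled: bool = True


def panic(expt: Experiment, reverse: bool = False) -> None:
    """Press the panic button, or undo it when reverse is True."""
    if reverse:
        if expt.state != "paniced":
            raise ValueError("experiment is not paniced")
        expt.state = "active"
        expt.panic_bit = False
        expt.firewall_port_enabled = True   # port re-enabled
    else:
        if expt.state != "active":
            raise ValueError("experiment is not active")
        expt.firewall_port_enabled = False  # port disabled
        expt.panic_bit = True
        expt.panic_date = datetime.now()
        expt.state = "paniced"


def can_swap(expt: Experiment, is_admin: bool) -> bool:
    """A paniced experiment may be swapped only in admin mode."""
    if expt.state == "paniced":
        return is_admin
    return expt.state == "active"


expt = Experiment()
panic(expt)
```

Calling panic(expt, reverse=True) afterwards returns the model to the active state with the port enabled again.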
-
- 15 Nov, 2004 1 commit
-
-
Leigh B. Stoller authored
* snmpit: When ElabInELab is true, use the routines in the new snmpit_remote.pm library for setting up and tearing down vlans for an experiment. At present, only these two operations are proxied out to the outer emulab.
* snmpit_remote.pm: A new little library that uses the XMLRPC server on the outer emulab to set up and destroy vlans for an inner experiment. This code is used from snmpit (see above).
* snmpit_lib.pm: A couple of minor changes for the server side of the proxy operation.
* snmpit.proxy.in: A new perl module that is invoked from the RPC server. This proxy sets up and tears down vlans for an inner elab. The basic model is that the container experiment will have lots of vlans for various individual experiments running on the inner emulab.
* swapexp: A couple of minor elabinelab hacks.
* tbswap: For elabinelab experiments, reconfig/restart dhcpd when tearing down the experiment, and call out to new elabinelab script when setting up an elabinelab experiment. There is no provision for swapmod at this time.
* elabinelab: A new script to create the inner emulab. Does all kinds of gross DB stuff then more gross stuff on the inner ops and boss.
-
- 30 Aug, 2004 1 commit
-
-
Leigh B. Stoller authored
path to the experiments logs directory (exp/$eid/logs/linktest.log).
-
- 29 Jul, 2004 1 commit
-
-
Leigh B. Stoller authored
* The first involves swapmod. When a swapmod on an active experiment fails, tbswap will reswap the experiment back to the original configuration. The problem is that it is reswapping it with the *new* virtual state of the experiment in the DB. It is not until later when control returns to swapexp that the virtual state is restored. This is plainly wrong, and in fact was causing the event scheduler grief cause it was starting up, reading the virtual topo, which was different, wrong, and about to be blown away. I reorganized the modify section of swapexp so that virtual state is restored only when it's a swapmod on a swapped experiment. On an active experiment, I moved that code down into tbswap, which now does all of the virtual and physical state restore before it does the reswap back to the original experiment. Just for kicks, it's also done if tbswap decides to swap the experiment cause of a fatal error.
Cleanups: I changed $NoRecover to $CanRecover. My feeble brain cannot deal with !$NoRecover. I know, two knots make a wright for most people.
Renderer: I was annoyed by the fact that we rerun the renderer on a failed swapmod. The original reason is that the renderer runs in the background and so vis_nodes cannot be saved with the rest of the virtual state tables cause the renderer might still be running when the user fires off the swapmod. Well, the hell with that. We lock the vis_nodes table anyway in the renderer during update, so we are certain to get a consistent snapshot. We store the renderer pid in the experiments table, so if the renderer was running, just fire off another one; mostly this is not going to happen. In addition, tbprerun no longer starts a new renderer when doing the swapmod; I start the new renderer later after swapmod succeeds. I might end up tweaking this a bit depending on what people notice as being different.
* Termination changes to batchexp and swapexp: I've rearranged the termination code using an END block so that any uncontrolled exit from either batchexp or swapexp will go through the cleanup code, and hopefully insert a stats record, as well as not leave the experiment in some in-between state. I've set the max DB retry count to zero in both cases, which means infinite retry. I've also added SIGTERM handlers to both so that again, we can kill a hung batch/swap and have it clean up things more or less. Note that END blocks are not executed when a signal causes the program to die; you have to catch the signal and then die() so that the END block is executed. Eventually, we need to clean up the various libraries so that we do not use DBQueryFatal(), but rather use DBQueryWarn(), and look for failure. Ditto for event system interface.
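The point about END blocks and signals has a close analog in Python, where atexit handlers are likewise skipped when an unhandled signal kills the process. The sketch below shows that analog, not the Perl code in batchexp or swapexp; cleanup(), on_sigterm(), and cleanup_log are hypothetical names for illustration.

```python
import atexit
import signal
import sys

cleanup_log = []


def cleanup():
    # Stand-in for the real cleanup work (stats record, state fixup).
    cleanup_log.append("cleanup ran")


# Plays the role of the END block: runs on any normal interpreter exit.
atexit.register(cleanup)


def on_sigterm(signum, frame):
    # Turn the signal into an ordinary exit. sys.exit() raises SystemExit,
    # so the interpreter shuts down normally and the atexit handler runs.
    # Without this handler, SIGTERM would kill the process outright and
    # cleanup() would never be called.
    sys.exit(1)


signal.signal(signal.SIGTERM, on_sigterm)
```

The design choice is the same as in the commit: the handler does no cleanup itself, it only converts the signal into a controlled exit so that one cleanup path covers every way out.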
-
- 26 Jul, 2004 1 commit
-
-
Leigh B. Stoller authored
experiment is swapped or 2) the experiment is completely terminated. In these cases, let's put explicit swapout/destroy events into testbed_stats so that the record is not confused by experiments that appear to start when they are still running. This really throws off the summary stats web page!
-
- 15 Jul, 2004 1 commit
-
-
Leigh B. Stoller authored
get sent properly; need to call TBdbfork(), and add a couple more event sends in libdb.
-
- 29 Jun, 2004 1 commit
-
-
Leigh B. Stoller authored
so that the process ID is tracked in the DB and so that the user can stop a linktest in progress from the web interface, even if it's started directly from experiment swapin.
-
- 17 May, 2004 1 commit
-
-
Leigh B. Stoller authored
system as well as the summary page.
-
- 13 May, 2004 2 commits
-
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
log, and to look at the summary page when the problem is not enough free nodes!
-
- 29 Apr, 2004 1 commit
-
-
Leigh B. Stoller authored
currently available to only people with stud=1 status in the DB.
* www/tbauth.php3: Add a STUDLY() function to check that bit.
* www/linktest.php3: New page to run linktest on the fly. The level defaults to the current level in the experiments table, but you can override that via the form on the page.
* www/showexp.php3: Add link to aforementioned page. STUDLY() only.
* www/beginexp_form.php3: Add an option (selection) to set the linktest level for create/swapin. Defaults to 0 (no linktest). STUDLY() only.
* www/editexp.php3: Add an option to edit the default linktest level for an experiment. STUDLY() only.
* tbsetup/batchexp.in and tbsetup/swapexp.in: Add code to optionally run the linktest, sending email if it fails (exits with non-zero status). Failure does not affect the swapin.
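The rule in the last item, that a failed linktest is reported by email but does not change the result of the swapin, can be sketched as follows. This is a Python illustration with hypothetical names (finish_swapin, run_linktest, send_email); the real logic lives in the Perl scripts batchexp and swapexp.

```python
from typing import Callable, List


def finish_swapin(
    linktest_level: int,
    run_linktest: Callable[[int], int],
    send_email: Callable[[str], None],
) -> int:
    """Return the swapin exit status; the linktest result never alters it."""
    swapin_status = 0  # assume the swapin itself already succeeded
    if linktest_level > 0:  # level 0 means no linktest
        status = run_linktest(linktest_level)
        if status != 0:
            send_email(f"linktest failed with status {status}")
    return swapin_status


sent: List[str] = []
# Simulate a linktest that fails with exit status 2 at level 3.
result = finish_swapin(3, lambda level: 2, sent.append)
```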
-
- 07 Apr, 2004 1 commit
-
-
Leigh B. Stoller authored
down when invoked from the RPC server.
-
- 15 Mar, 2004 1 commit
-
-
Leigh B. Stoller authored
these scripts!
-
- 12 Feb, 2004 1 commit
-
-
Leigh B. Stoller authored
no reason for the separation for a long time, and it made maintenance more difficult cause of duplication between batchexp and startexp (batch was the sole user of startexp). Cleaner solution.
* Check argument processing for batchexp, swapexp, endexp to make sure the taint checks are correct. All three of these scripts will now be available from ops. I especially watch the filename processing, which was pretty loose before and could allow someone to grab a file on boss by trying to use it as an NS file (scripts all run as the user of course). The web interface generates filenames that are hard to guess, so rather than wrapping these scripts when invoked from ops, just allow the usual paths (/proj, /groups, /users) but also the /tmp/$uid-XXXXXX.nsfile pattern, which should be hard enough to guess that users will not be able to get anything they are not supposed to.
* Add -w (waitmode) options to all three scripts. In waitmode, the backend detaches, but the parent remains waiting for the child to finish so it can exit with the appropriate status (for scripting). The user can interrupt (^C), but it has no effect on the backend; it just kills the parent side that is waiting (backend is in a new session ID). Log output still goes to the file (available from web page) and is emailed.
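The filename rule described above (the usual /proj, /groups and /users trees, plus the /tmp/$uid-XXXXXX.nsfile pattern) might look like the following check. It is a Python sketch under stated assumptions, not the taint-check code in the scripts: the function name is hypothetical, XXXXXX is assumed to be six alphanumeric characters, and the rejection of ".." components is an extra precaution added for the example.

```python
import re

ALLOWED_PREFIXES = ("/proj/", "/groups/", "/users/")


def nsfile_path_allowed(path: str, uid: str) -> bool:
    """Decide whether path is acceptable as an NS file for user uid."""
    # Assumption: refuse parent-directory components so an allowed
    # prefix cannot be used to climb out of its tree.
    if ".." in path.split("/"):
        return False
    if path.startswith(ALLOWED_PREFIXES):
        return True
    # Assumption: XXXXXX stands for six letters or digits.
    tmp_pattern = re.compile(
        r"/tmp/" + re.escape(uid) + r"-[A-Za-z0-9]{6}\.nsfile"
    )
    return tmp_pattern.fullmatch(path) is not None
```

For example, nsfile_path_allowed("/tmp/alice-a1B2c3.nsfile", "alice") accepts the file, while the same path checked for a different uid is refused.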
-
- 05 Feb, 2004 1 commit
-
-
Leigh B. Stoller authored
accessible.
-
- 08 Jan, 2004 1 commit
-
-
Shashi Guruprasad authored
added -eventsys_restart option to tbswap. These options are allowed only with swapexp -s modify and correspondingly for tbswap with update. Tested with a 1 node experiment and things seem to work fine.
-
- 18 Nov, 2003 4 commits
-
-
Leigh B. Stoller authored
* Make the NS file an optional argument to swapexp modify; when not given the prerun phase is skipped. Instead, go directly to tbswap (run assign, etc).
* Add NSESWAP event so that Shashi can fire off the above modify using tevc from an experimental node:
    tevc -e pid/eid now ns nseswap
* Change event scheduler to react to above event, and fire off:
    nseswap pid eid
  as the user. The script should do its thing, and *exec* swapexp with the proper args as quickly as possible (so that the event scheduler is not hung up for too long). The script is invoked as the user, since the event scheduler is running as the user.
-
Leigh B. Stoller authored
of virt_tables so that it is saved and restored like the rest of the virtual state.
-
Leigh B. Stoller authored
it's going to get replaced at some point by a busy state. The swap scripts properly set the next state before unlocking the experiments table, which possibly leaves some small races as experiments transition through states (which happens with the table unlocked, cause I used to have this really handy variable called expt_locked, which no one really likes anymore). We either have to use more table locking, fix up expt_locked, or punt and say it won't happen more than once in a few thousand operations!
-
Leigh B. Stoller authored
instead of testbed-ops. Either way, Mike gets to see it.
-
- 17 Nov, 2003 1 commit
-
-
Leigh B. Stoller authored
state machine (state). All of the stuff that was previously handled by using batchstate is now embedded into the one state machine. Of course, these mostly overlapped, so it's not that much of a change, except that we also redid the machine, adding more states (for example, modify phases are now explicit). To get a picture of the actual state machine, on boss:
    stategraph -o newstates EXPTSTATE
    gv newstates.ps
Things to note:
* The "batchstate" slot of the experiments table is now used solely to provide a lock for batch daemon. A secondary change will be to change the slot name to something more appropriate, but it can happen anytime after this new stuff is installed.
* I have left expt_locked for now, but another later change will be to remove expt_locked, and change it to active_busy or some such new state name in the state machine. I have removed most uses of expt_locked, except those that were necessary until there is a new state to replace it.
* These new changes are an implementation of the new state machine, but I have not done anything fancy. Most of the code is the same as it was before.
* I suspect that there are races with the batch daemon now, but they are going to be rare, and the end result is probably that a cancellation is delayed a little bit.
-
- 29 Oct, 2003 1 commit
-
-
Leigh B. Stoller authored
-
- 16 Oct, 2003 1 commit
-
-
Leigh B. Stoller authored
swapped out (non-recoverable) by tbswap. swapexp was leaving the experiment in the running state instead of paused. We need to check this after tbswap since we do not get reasonable error codes back. Also some cleanup with respect to how aborted modifies are handled. I think I understand what Chad did ... A general comment; we need to be better about returning meaningful error codes!
-
- 30 Sep, 2003 1 commit
-
-
Leigh B. Stoller authored
plus a lock field. The lock field was a simple "experiment locked, go away" slot that is easy to use when you do not care about the actual state that an experiment is in, just that it is in "transition" and should not be messed with.
The other two state variables are "state" and "batchstate". The former (state) is the original variable that Chris added, and was used by the tb* scripts to make sure that the experiment was in the state each particular script wanted them to be in. But over time (and with the addition of so much wrapper goo around them), "state" has leaked out all over the place to determine what operations on an experiment are allowed, and if/when it should be displayed in various web pages. There is a set of transition states in addition to the usual "active", "swapped", etc, like "swapping", that make testing state a pain in the butt.
I added the other state variable ("batchstate") when I did the batch system, obviously! It was intended as a wrapper state to control access to the batch queue, and to prevent batch experiments from being messed with except when it was really okay (for example, it's okay to terminate a swapped out batch experiment, but not a swapped in batch experiment since that would confuse the batch daemon). There are fewer of these states, plus one additional state for "modifying" experiments.
So what I have done is change the system to use "batchstate" for all experiments to control entry into the swap system, from the web interface, from the command line, and from the batch daemon. The other state variable still exists, and will be brutally pushed back under the surface until it's just a vague memory, used only by the original tb* scripts. This will happen over time, and the "batchstate" variable will be renamed once I am convinced that this was the right thing to do and that my changes actually work as intended.
Only people who have bothered to read this far will know that I also added the ability to cancel experiment swapin in progress. For that I am using the "canceled" flag (ah, this one was named properly from the start!), and I test that at various times in assign_wrapper and tbswap. A minor downside right now is that a canceled swapin looks too much like a failed swapin, and so tbops gets email about it. I'll fix that at some point (sometime after the boss complains).
I also cleaned up various bits of code, replacing direct calls to exec with calls to the recently improved SUEXEC interface. This removes some cruft from each script that calls an external script.
Cleaned up modifyexp.php3 quite a bit, reformatting and indenting. Also fixed to not run the parser directly! This was very wrong; should call nscheck instead. Changed to use "nobody" group instead of group flux (made the same change in nscheck).
There is a script in the sql directory called newstates.pl. It needs to be run to initialize the batchstate slot of the experiments table for all existing experiments.
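Testing a "canceled" flag at various points during a long operation is a cooperative cancellation pattern, which a short sketch can make concrete. The Python below is illustrative only: run_swapin(), SwapCanceled, and the step names are hypothetical and do not reflect the actual structure of assign_wrapper or tbswap.

```python
from typing import Callable, List, Tuple


class SwapCanceled(Exception):
    """Raised when the canceled flag is seen between steps."""


def run_swapin(
    steps: List[Tuple[str, Callable[[], None]]],
    is_canceled: Callable[[], bool],
) -> List[str]:
    """Run each step in order, checking the flag before every step."""
    completed = []
    for name, action in steps:
        if is_canceled():
            # Report which step was about to run, so a cancel can be
            # told apart from a genuine failure.
            raise SwapCanceled(name)
        action()
        completed.append(name)
    return completed


flag = {"canceled": False}
done: List[str] = []
steps = [
    ("assign", lambda: done.append("assign")),
    # Simulate the user pressing cancel while this step runs.
    ("setup_vlans", lambda: flag.update(canceled=True)),
    ("boot_nodes", lambda: done.append("boot_nodes")),
]
try:
    run_swapin(steps, lambda: flag["canceled"])
    outcome = "finished"
except SwapCanceled as exc:
    outcome = f"canceled before {exc}"
```

Using a distinct exception for cancellation is one way to avoid the downside mentioned above, where a canceled swapin is reported as if it had failed.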
-
- 07 Aug, 2003 1 commit
-
-
Leigh B. Stoller authored
-
- 06 Aug, 2003 1 commit
-
-
Leigh B. Stoller authored
created in /tmp and left behind. I've moved them to the expwork directory instead, and added a routine in the library to clear them out. Clear out the nsfile (stored in /tmp) used in modify. The web page was creating a temp file, but never removing it. swapexp now copies the nsfile in so that the web page can remove the temporary after the script exits. The temp is placed in the expwork directory as well, but left behind for debugging. When swapmod fails, send along the nsfile in the email message.
-
- 30 Jul, 2003 1 commit
-
-
Leigh B. Stoller authored
not have to wait 3 minutes for it to finish before he can watch his experiment swapin fail for some other reason. I adopted the same pid mechanism as in eventsys_control.in, which uses a slot in the experiments table. Running "prerender" puts the render into the background and stores the pid. Running "prerender -r" kills a running prerender and removes the existing info from the DB. Fixed the problem with swapmod not restoring the old vis; swapmod now kills any running prerender, and restarts one if the swapmod fails (the prerun of the new NS file starts up another prerender in the background). Add setpriority() call in prerender to nice it and children to 15.
-
- 29 Jul, 2003 1 commit
-
-
Leigh B. Stoller authored
showexp page that it's a batch experiment, by the menu options. Same deal in the swapexp output, plus some other minor cleanup. The only bug I found while trying to figure out the batchmode problem reported this morning by the FileMover people is that the cancelflag is not cleared after swapping a running batch experiment out, so even after reinjecting it into the queue, it will not run. Still, that does seem to be what the FileMover people reported.
-
- 27 Jul, 2003 1 commit
-
-
Leigh B. Stoller authored
final time, so that we can see how long things take. As per Jay's request.
-
- 17 Jul, 2003 1 commit
-
-
Mac Newbold authored
Fix up email message text, hide swappable bit, and ignore being in transition on a forced swap. When I say force, I mean it, dang it!
-