- 23 Jan, 2006 1 commit
-
-
Leigh B. Stoller authored
is putting it (into the archive).
-
- 28 Dec, 2005 1 commit
-
-
Leigh B. Stoller authored
* The rest of the backend support for simplistic experiment duplication, either from an existing (current) experiment, or from a specific archive revision of a current or terminated experiment.
* Rework the swapmod code so that the archive is committed at the exact end of the swapout phase. This required adding (and moving) some code from swapexp to tbswap, since that is where the actual swapout/swapin happens during a swapmod.
* Add a special directory called "archive" to the experiment directory, which is a place where users can store stuff they want saved away. This will eventually be a user defined set of directories, but this was good for getting the basic mechanism in place. Note that when the contents of this directory are copied out for placement into the archive, it is an exact copy made with rsync.
* No longer "clean" the contents of the temporary store between commits of the archive. This was creating a lot of headaches, and was also causing the revision history to get messed up. The downside of this is that we have to be more careful to explicitly delete files that the user no longer uses. I have not solved all these issues yet, so in the meantime files will get left in the archive even if the user no longer references them.
-
- 19 Dec, 2005 1 commit
-
-
Kevin Atkinson authored
Updates to Error Logging API code. You should start seeing much better error messages coming from my system. Errors coming from parse.proxy and assign (the two most frequent sources of errors) should now be concise and to the point. Errors coming from libosload/libreboot (the next most frequent source of errors) should now also be much better, but not perfect. Getting perfect errors will likely require a rework of how errors are handled in libosload/libreboot; just adding tberror/tbwarn/tbnotice calls is not enough. I can do this at a later date if necessary.
A few minor database changes. Some changes to the API. A few bug fixes. Lots of tberror/tbwarn/tbnotice added to scripts.
Since assign is a C program, and at this time my API is perl only, I wrote a second wrapper around assign, assign_wrapper2. When assign fails, errors are now parsed in assign_wrapper2, sent to stderr and logged. This means that RunAssign() just returns when assign fails rather than echoing some of assign.log output and then quitting. The output to the activity log remains unchanged.
Since "parse.proxy" is run from ops I couldn't use my API in it, even though it is a perl program. Instead I parse the errors coming from it in parse-ns.
-
- 15 Dec, 2005 2 commits
-
-
Timothy Stack authored
files are being added to the archive at the correct places, but it seems to work.
-
Leigh B. Stoller authored
studly users in the testbed project on the mainsite.
-
- 12 Dec, 2005 1 commit
-
-
Leigh B. Stoller authored
causing the program agent to run as last swapper. This was a little tricky cause of a poor decision to share the usage of the last_swap_uid in the stats gathering code, which wants to set the last swapper late so that the previous swapper gets charged appropriately. Rather than mess with the stats code too much, I moved things around a bit, setting the swapper earlier and adding code in libdb to capture the original swapper at the beginning of a swapmod for accounting, and then adding code in swapexp to reset the swapper if a swapmod fails. Should fix the stats code at some point to have its own idea of swapper.
-
- 04 Nov, 2005 1 commit
-
-
Kevin Atkinson authored
Added error logging API. See tbsetup/libtblog.pm.in and tbsetup/libtblog.sql.
-
- 19 Oct, 2005 1 commit
-
-
Leigh B. Stoller authored
-
- 19 Jul, 2005 1 commit
-
-
Leigh B. Stoller authored
swapped out.
-
- 13 Jul, 2005 1 commit
-
-
Leigh B. Stoller authored
created, swapped in or modified when overquota. Ditto for creating images.
-
- 31 May, 2005 1 commit
-
-
Leigh B. Stoller authored
I fixed a couple of minor problems, but mostly this worked fine. Note that I have tested this with the installed perl, *NOT* perl 5.8. I am just making sure this stuff gets committed before too much more bitrot sets in.
-
- 20 May, 2005 1 commit
-
-
Leigh B. Stoller authored
record for each swapin (previously, it was just at swapmod). The reason for this is that as the testbed gets more fragmented in terms of hardware, it is less and less likely that consecutive swapins of the same experiment will use the same number of physical resources. We end up with some duplication of data inside the table, but no big deal. I suspect we will revisit this per experiment state as the workbench stuff proceeds.
-
- 12 May, 2005 2 commits
-
-
Leigh B. Stoller authored
or elabinelab experiment, but continue to allow only admins to do it if the experiment is active. Just while I continue to debug.
-
Leigh B. Stoller authored
Firewalled experiments (see tbsetup/elabinelab.in for the other stuff).
* To support firewalled experiments, needed to add a new virt_firewalls table to split the existing firewalls table up, which included both virtual and physical stuff. There are the usual frontend changes and a few other things scattered around, including tmcd.c.
* The firewall code in tbswap got some beefing up to support adding and deleting nodes from its special control net vlan. Note that I have not made any progress on containment of deleted nodes, just as we do not do anything now for teardown (unless it's paniced, in which case the experiment cannot be modified anyway).
* ptopgen and assign_wrapper got some interesting modifications: Unlike regular swapmod, we cannot just tear down all the vlans since that would interrupt everything inside the inner elab. Instead, leave the vlans as is. The problem is that when assign runs, it can just as easily pick different interfaces on the same nodes, which would be a royal pain in the ass to deal with! So, ptopgen got a new option (-u) that assign_wrapper uses to tell ptopgen that it should prune out unused interfaces from nodes that are already allocated to the experiment. This is, at best, a pathetically gross hack, but it makes sure that all the interfaces stay the same across swapmods.
* The unrelated revision of elabinelab has a bunch of new code for adding and deleting nodes from the inner elab. Mostly it deals with dhcpd (inner and outer, waiting for nodes to reboot, etc). It also deals with updating the vlans table in the DB, pruning out any nodes (ports) that are deleted but for which there are still interfaces in existing vlans. Said ports are then moved back to the default vlan with calls to snmpit. Also under another revision a couple of weeks ago are the web interface changes to support the newnode MFS inside an inner Emulab.
* swapexp and endexp got some more checks for firewalled and paniced experiments, which were missing.
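The interface pruning described for the ptopgen -u option can be sketched roughly as follows. This is only an illustration in Python, not the actual ptopgen code; the function name `prune_unused_interfaces` and the dictionary-based data model are hypothetical and chosen just to show the idea of hiding unused interfaces on nodes the experiment already holds.

```python
from typing import Dict, Set


def prune_unused_interfaces(
    node_ifaces: Dict[str, Set[str]],
    allocated: Set[str],
    in_use: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return the interfaces to advertise for each node.

    Hypothetical data model (not taken from ptopgen):
      node_ifaces: every interface present on each node
      allocated:   nodes already allocated to this experiment
      in_use:      interfaces the experiment currently uses, per node
    """
    pruned = {}
    for node, ifaces in node_ifaces.items():
        if node in allocated:
            # Already-allocated node: advertise only the interfaces in use,
            # so the mapper cannot pick a different one on the same node.
            pruned[node] = ifaces & in_use.get(node, set())
        else:
            # Free node: all interfaces remain available.
            pruned[node] = set(ifaces)
    return pruned
```

With only the in-use interfaces visible on allocated nodes, a remapping has no choice but to reuse them, which is the property the commit message relies on to keep interfaces stable across swapmods.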
-
- 03 May, 2005 1 commit
-
-
Leigh B. Stoller authored
patch to avoid vlan corruption.
-
- 27 Apr, 2005 1 commit
-
-
Leigh B. Stoller authored
-
- 12 Jan, 2005 1 commit
-
-
Leigh B. Stoller authored
table that will prevent an experiment from being swapped/modified. The toggle is on the showexp page, and the toggle is *not* admin over-ridable; you must turn the toggle off (and of course, you must be an admin to do that).
-
- 16 Dec, 2004 1 commit
-
-
Leigh B. Stoller authored
* tbsetup/panic.in: New backend script to implement the panic button feature. When used, it will sever the connection to the firewall node by using snmpit to disable the port. Sets the panic bit (and date) in the experiments table, and changes the state of the experiment from "active" to "paniced" to ensure that the experiment cannot be messed with (swapped out or modified). Sends email to tbops when the panic button is pressed. Used with -r option, reverses the above. State is set back to active, the panic bit is cleared, and the port is re-enabled with snmpit.
* tbsetup/tbswap.in: During swapout, a firewalled experiment that has been paniced will get a cleaning: the nodes are powered off, then the osids for all the nodes are reset (with os_select) so that they will boot the MFS, and then the nodes are powered on. Then the control network is turned back on, and then I wait for the nodes to reboot (this is simply cause we do not record in the DB th...
-
- 15 Nov, 2004 1 commit
-
-
Leigh B. Stoller authored
* snmpit: When ElabInELab is true, use the routines in the new snmpit_remote.pm library for setting up and tearing down vlans for an experiment. At present, only these two operations are proxied out to the outer emulab.
* snmpit_remote.pm: A new little library that uses the XMLRPC server on the outer emulab to setup and destroy vlans for an inner experiment. This code is used from snmpit (see above).
* snmpit_lib.pm: A couple of minor changes for the server side of the proxy operation.
* snmpit.proxy.in: A new perl module that is invoked from the RPC server. This proxy sets up and tears down vlans for an inner elab. The basic model is that the container experiment will have lots of vlans for various individual experiments running on the inner emulab.
* swapexp: A couple of minor elabinelab hacks.
* tbswap: For elabinelab experiments, reconfig/restart dhcpd when tearing down the experiment, and call out to new elabinelab script when setting up an elabinelab experiment. There is no provision for swapmod at this time.
* elabinelab: A new script to create the inner emulab. Does all kinds of gross DB stuff then more gross stuff on the inner ops and boss.
-
- 30 Aug, 2004 1 commit
-
-
Leigh B. Stoller authored
path to the experiments logs directory (exp/$eid/logs/linktest.log).
-
- 29 Jul, 2004 1 commit
-
-
Leigh B. Stoller authored
* The first involves swapmod. When a swapmod on an active experiment fails, tbswap will reswap the experiment back to the original configuration. The problem is that it is reswapping it with the *new* virtual state of the experiment in the DB. It is not until later when control returns to swapexp that the virtual state is restored. This is plainly wrong, and in fact was causing the event scheduler grief cause it was starting up, reading the virtual topo, which was different, wrong, and about to be blown away. I reorganized the modify section of swapexp so that virtual state is restored only when it's a swapmod on a swapped experiment. On an active experiment, I moved that code down into tbswap, which now does all of the virtual and physical state restore before it does the reswap back to the original experiment. Just for kicks, it's also done if tbswap decides to swap the experiment cause of a fatal error.
  Cleanups: I changed $NoRecover to $CanRecover. My feeble brain cannot deal with !$NoRecover. I know, two knots make a wright for most people.
  Renderer: I was annoyed by the fact that we rerun the renderer on a failed swapmod. The original reason is that the renderer runs in the background and so vis_nodes cannot be saved with the rest of the virtual state tables cause the renderer might still be running when the user fires off the swapmod. Well, the hell with that. We lock the vis_nodes table anyway in the renderer during update, so we are certain to get a consistent snapshot. We store the renderer pid in the experiments table, so if the renderer was running, just fire off another one; mostly this is not going to happen. In addition, tbprerun no longer starts a new renderer when doing the swapmod; I start the new renderer later after swapmod succeeds. I might end up tweaking this a bit depending on what people notice as being different.
* Termination changes to batchexp and swapexp: I've rearranged the termination code using an END block so that any uncontrolled exit from either batchexp or swapexp will go through the cleanup code, and hopefully insert a stats record, as well as not leave the experiment in some in-between state. I've set the max DB retry count to zero in both cases, which means infinite retry. I've also added SIGTERM handlers to both so that again, we can kill a hung batch/swap and have it clean up things more or less. Note that END blocks are not run when a signal causes the program to die; you have to catch the signal and then die() so that the END block is executed. Eventually, we need to clean up the various libraries so that we do not use DBQueryFatal(), but rather use DBQueryWarn(), and look for failure. Ditto for event system interface.
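The catch-the-signal-then-exit pattern described above is not specific to Perl. A minimal Python analogue, assuming `atexit` as the stand-in for an END block, might look like the sketch below; the names `cleanup`, `on_sigterm`, `install_handlers` and `cleanup_log` are hypothetical and do not come from batchexp or swapexp.

```python
import atexit
import signal
import sys

# Hypothetical stand-in for "insert a stats record, fix up experiment state".
cleanup_log = []


def cleanup():
    # Runs on any normal interpreter exit, much like a Perl END block.
    cleanup_log.append("cleanup ran")


def on_sigterm(signum, frame):
    # Turn the signal into an ordinary exit so exit handlers still run.
    # With the default SIGTERM disposition the process would be killed
    # immediately and cleanup() would never be called.
    sys.exit(128 + signum)


def install_handlers():
    # Register the exit-time cleanup and route SIGTERM through on_sigterm.
    atexit.register(cleanup)
    signal.signal(signal.SIGTERM, on_sigterm)
```

Calling `install_handlers()` early in the script means a `kill` of a hung run still passes through `cleanup()` before the process exits.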
-
- 26 Jul, 2004 1 commit
-
-
Leigh B. Stoller authored
experiment is swapped or 2) the experiment is completely terminated. In these cases, let's put explicit swapout/destroy events into testbed_stats so that the record is not confused by experiments that appear to start when they are still running. This really throws off the summary stats web page!
-
- 15 Jul, 2004 1 commit
-
-
Leigh B. Stoller authored
get sent properly; need to call TBdbfork(), and add a couple more event sends in libdb.
-
- 29 Jun, 2004 1 commit
-
-
Leigh B. Stoller authored
so that the process ID is tracked in the DB and so that the user can stop a linktest in progress from the web interface, even if it's started directly from experiment swapin.
-
- 17 May, 2004 1 commit
-
-
Leigh B. Stoller authored
system as well as the summary page.
-
- 13 May, 2004 2 commits
-
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
log, and to look at the summary page when the problem is not enough free nodes!
-
- 29 Apr, 2004 1 commit
-
-
Leigh B. Stoller authored
currently available to only people with stud=1 status in the DB.
* www/tbauth.php3: Add a STUDLY() function to check that bit.
* www/linktest.php3: New page to run linktest on the fly. The level defaults to the current level in the experiments table, but you can override that via the form on the page.
* www/showexp.php3: Add link to aforementioned page. STUDLY() only.
* www/beginexp_form.php3: Add an option (selection) to set the linktest level for create/swapin. Defaults to 0 (no linktest). STUDLY() only.
* www/editexp.php3: Add an option to edit the default linktest level for an experiment. STUDLY() only.
* tbsetup/batchexp.in and tbsetup/swapexp.in: Add code to optionally run the linktest, sending email if it fails (exits with non-zero status). Failure does not affect the swapin.
-
- 07 Apr, 2004 1 commit
-
-
Leigh B. Stoller authored
down when invoked from the RPC server.
-
- 15 Mar, 2004 1 commit
-
-
Leigh B. Stoller authored
these scripts!
-
- 12 Feb, 2004 1 commit
-
-
Leigh B. Stoller authored
no reason for the separation for a long time, and it made maintenance more difficult cause of duplication between batchexp and startexp (batch was the sole user of startexp). Cleaner solution.
* Check argument processing for batchexp, swapexp, endexp to make sure the taint checks are correct. All three of these scripts will now be available from ops. I especially watch the filename processing, which was pretty loose before and could allow someone to grab a file on boss by trying to use it as an NS file (scripts all run as the user of course). The web interface generates filenames that are hard to guess, so rather than wrapping these scripts when invoked from ops, just allow the usual paths (/proj, /groups, /users) but also the /tmp/$uid-XXXXXX.nsfile pattern, which should be hard enough to guess that users will not be able to get anything they are not supposed to.
* Add -w (waitmode) options to all three scripts. In waitmode, the backend detaches, but the parent remains waiting for the child to finish so it can exit with the appropriate status (for scripting). The user can interrupt (^C), but it has no effect on the backend; it just kills the parent side that is waiting (backend is in a new session ID). Log output still goes to the file (available from web page) and is emailed.
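The filename check described above (the usual project paths plus the /tmp/$uid-XXXXXX.nsfile pattern) could be sketched along these lines. This is an illustrative Python version, not the Perl taint-checking code in the scripts; the function name `nsfile_allowed`, the six-character alphanumeric suffix, and the rejection of ".." components are all assumptions made for the example.

```python
import re

# Directories named in the commit message as the "usual paths".
ALLOWED_PREFIXES = ("/proj/", "/groups/", "/users/")


def nsfile_allowed(path: str, uid: str) -> bool:
    """Return True if path looks like an acceptable NS file location."""
    # Assumption: refuse NUL bytes and ".." components outright so an
    # allowed prefix cannot be used to climb out of its directory.
    if "\0" in path or ".." in path.split("/"):
        return False
    if path.startswith(ALLOWED_PREFIXES):
        return True
    # Assumption: XXXXXX stands for six alphanumeric characters.
    tmp_pattern = r"/tmp/" + re.escape(uid) + r"-[A-Za-z0-9]{6}\.nsfile"
    return re.fullmatch(tmp_pattern, path) is not None
```

A string check like this says nothing about symlinks or file permissions; since the scripts run as the user, ordinary filesystem permissions still apply on top of it.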
-
- 05 Feb, 2004 1 commit
-
-
Leigh B. Stoller authored
accessible.
-
- 08 Jan, 2004 1 commit
-
-
Shashi Guruprasad authored
added -eventsys_restart option to tbswap. These options are allowed only with swapexp -s modify and correspondingly for tbswap with update. Tested with a 1 node experiment and things seem to work fine.
-
- 18 Nov, 2003 4 commits
-
-
Leigh B. Stoller authored
* Make the NS file an optional argument to swapexp modify; when not given the prerun phase is skipped. Instead, go directly to tbswap (run assign, etc).
* Add NSESWAP event so that Shashi can fire off the above modify using tevc from an experimental node.
    tevc -e pid/eid now ns nseswap
* Change event scheduler to react to above event, and fire off:
    nseswap pid eid
  as the user. The script should do its thing, and *exec* swapexp with the proper args as quickly as possible (so that the event scheduler is not hung up for too long). The script is invoked as the user, since the event scheduler is running as the user.
-
Leigh B. Stoller authored
of virt_tables so that it is saved and restored like the rest of the virtual state.
-
Leigh B. Stoller authored
it's going to get replaced at some point by a busy state. The swap scripts properly set the next state before unlocking the experiments table, which possibly leaves some small races as experiments transition through states (which happens with the table unlocked, cause I used to have this really handy variable called expt_locked, which no one really likes anymore). We either have to use more table locking, fix up expt_locked, or punt and say it won't happen more than once in a few thousand operations!
-
Leigh B. Stoller authored
instead of testbed-ops. Either way, Mike gets to see it.
-
- 17 Nov, 2003 1 commit
-
-
Leigh B. Stoller authored
state machine (state). All of the stuff that was previously handled by using batchstate is now embedded into the one state machine. Of course, these mostly overlapped, so it's not that much of a change, except that we also redid the machine, adding more states (for example, modify phases are now explicit). To get a picture of the actual state machine, on boss:
    stategraph -o newstates EXPTSTATE
    gv newstates.ps
Things to note:
* The "batchstate" slot of the experiments table is now used solely to provide a lock for the batch daemon. A secondary change will be to change the slot name to something more appropriate, but it can happen anytime after this new stuff is installed.
* I have left expt_locked for now, but another later change will be to remove expt_locked, and change it to active_busy or some such new state name in the state machine. I have removed most uses of expt_locked, except those that were necessary until there is a new state to replace it.
* These new changes are an implementation of the new state machine, but I have not done anything fancy. Most of the code is the same as it was before.
* I suspect that there are races with the batch daemon now, but they are going to be rare, and the end result is probably that a cancelation is delayed a little bit.
-
- 29 Oct, 2003 1 commit
-
-
Leigh B. Stoller authored
-
- 16 Oct, 2003 1 commit
-
-
Leigh B. Stoller authored
swapped out (non-recoverable) by tbswap. swapexp was leaving the experiment in the running state instead of paused. We need to check this after tbswap since we do not get reasonable error codes back. Also some cleanup with respect to how aborted modifies are handled. I think I understand what Chad did ... A general comment; we need to be better about returning meaningful error codes!
-