- 18 Nov, 2003 2 commits
Leigh B. Stoller authored
it's going to get replaced at some point by a busy state. The swap scripts properly set the next state before unlocking the experiments table, which may leave some small races as experiments transition through states (the transitions happen with the table unlocked, because I used to have this really handy variable called expt_locked, which no one really likes anymore). We either have to use more table locking, fix up expt_locked, or punt and say it won't happen more than once in a few thousand operations!
Leigh B. Stoller authored
instead of testbed-ops. Either way, Mike gets to see it.
- 17 Nov, 2003 1 commit
Leigh B. Stoller authored
state machine (state). All of the stuff that was previously handled by using batchstate is now embedded into the one state machine. Of course, these mostly overlapped, so it's not that much of a change, except that we also redid the machine, adding more states (for example, modify phases are now explicit). To get a picture of the actual state machine, run on boss:
  stategraph -o newstates EXPTSTATE
  gv newstates.ps
Things to note:
* The "batchstate" slot of the experiments table is now used solely to provide a lock for the batch daemon. A secondary change will be to rename the slot to something more appropriate, but that can happen anytime after this new code is installed.
* I have left expt_locked for now, but a later change will remove expt_locked and replace it with active_busy or some similar new state in the state machine. I have removed most uses of expt_locked, except those that remain necessary until there is a new state to replace it.
* These changes are an implementation of the new state machine, but I have not done anything fancy. Most of the code is the same as it was before.
* I suspect that there are races with the batch daemon now, but they will be rare, and the end result is probably that a cancelation is delayed a little bit.
- 29 Oct, 2003 1 commit
Leigh B. Stoller authored
- 16 Oct, 2003 1 commit
Leigh B. Stoller authored
swapped out (non-recoverable) by tbswap. swapexp was leaving the experiment in the running state instead of paused. We need to check this after tbswap since we do not get reasonable error codes back. Also cleaned up how aborted modifies are handled. I think I understand what Chad did... A general comment: we need to be better about returning meaningful error codes!
- 30 Sep, 2003 1 commit
Leigh B. Stoller authored
plus a lock field. The lock field was a simple "experiment locked, go away" slot that is easy to use when you do not care about the actual state that an experiment is in, just that it is in "transition" and should not be messed with. The other two state variables are "state" and "batchstate". The former (state) is the original variable that Chris added, and was used by the tb* scripts to make sure that the experiment was in the state each particular script wanted it to be in. But over time (and with the addition of so much wrapper goo around them), "state" has leaked out all over the place to determine what operations on an experiment are allowed, and if/when it should be displayed in various web pages. In addition to the usual "active", "swapped", etc., there is a set of transition states like "swapping" that make testing state a pain in the butt. I added the other state variable ("batchstate") when I did the batch system, obviously! It was intended as a wrapper state to control access to the batch queue, and to prevent batch experiments from being messed with except when it was really okay (for example, it's okay to terminate a swapped out batch experiment, but not a swapped in batch experiment, since that would confuse the batch daemon). There are fewer of these states, plus one additional state for "modifying" experiments. So what I have done is change the system to use "batchstate" for all experiments to control entry into the swap system, whether from the web interface, from the command line, or from the batch daemon. The other state variable still exists, and will be brutally pushed back under the surface until it's just a vague memory, used only by the original tb* scripts. This will happen over time, and the "batchstate" variable will be renamed once I am convinced that this was the right thing to do and that my changes actually work as intended.
Only people who have bothered to read this far will know that I also added the ability to cancel an experiment swapin in progress. For that I am using the "canceled" flag (ah, this one was named properly from the start!), and I test it at various times in assign_wrapper and tbswap. A minor downside right now is that a canceled swapin looks too much like a failed swapin, and so tbops gets email about it. I'll fix that at some point (sometime after the boss complains). I also cleaned up various bits of code, replacing direct calls to exec with calls to the recently improved SUEXEC interface. This removes some cruft from each script that calls an external script. Cleaned up modifyexp.php3 quite a bit, reformatting and indenting. Also fixed it to not run the parser directly! This was very wrong; it should call nscheck instead. Changed it to use the "nobody" group instead of group flux (made the same change in nscheck). There is a script in the sql directory called newstates.pl. It needs to be run to initialize the batchstate slot of the experiments table for all existing experiments.
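A minimal sketch of the cancel check described above, written in Python since the document has no code of its own. It assumes a swapin is a sequence of named phases and that the canceled flag is polled only at phase boundaries; `Experiment`, `run_swapin`, and the phase names are hypothetical, not the actual assign_wrapper/tbswap code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Experiment:
    """Hypothetical stand-in for an experiments-table row."""
    pid: str
    eid: str
    canceled: bool = False  # may be set by someone else mid-swapin

Phase = Tuple[str, Callable[[Experiment], None]]

def run_swapin(expt: Experiment, phases: List[Phase]) -> Tuple[str, List[str]]:
    """Run phases in order, checking the canceled flag before each one.

    Returns ("canceled", done) if a cancel is seen, else ("ok", done),
    where `done` lists the phases that actually ran.
    """
    done: List[str] = []
    for name, phase in phases:
        if expt.canceled:
            return "canceled", done
        phase(expt)
        done.append(name)
    return "ok", done

if __name__ == "__main__":
    def assign(e: Experiment) -> None:
        e.canceled = True  # simulate the user hitting cancel during assign

    phases = [("assign", assign), ("os_setup", lambda e: None)]
    print(run_swapin(Experiment("myproj", "myexp"), phases))
    # prints ('canceled', ['assign'])
```

Returning a distinct "canceled" status, rather than a generic failure, is one way a caller could keep a canceled swapin from looking like a failed one.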
- 07 Aug, 2003 1 commit
Leigh B. Stoller authored
- 06 Aug, 2003 1 commit
Leigh B. Stoller authored
created in /tmp and left behind. I've moved them to the expwork directory instead, and added a routine in the library to clear them out. Also clear out the nsfile (stored in /tmp) used in modify: the web page was creating a temp file, but never removing it. swapexp now copies the nsfile in, so that the web page can remove its temporary after the script exits. The copy is placed in the expwork directory as well, but is left behind for debugging. When swapmod fails, the nsfile is now sent along in the email message.
- 30 Jul, 2003 1 commit
Leigh B. Stoller authored
not have to wait 3 minutes for it to finish before he can watch his experiment swapin fail for some other reason. I adopted the same pid mechanism as in eventsys_control.in, which uses a slot in the experiments table. Running "prerender" puts the render into the background and stores the pid. Running "prerender -r" kills a running prerender and removes the existing info from the DB. Fixed the problem with swapmod not restoring the old vis; swapmod now kills any running prerender, and restarts one if the swapmod fails (the prerun of the new NS file starts up another prerender in the background). Added a setpriority() call in prerender to nice it and its children to 15.
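The pid mechanism above can be sketched roughly as follows, assuming a dictionary in place of the experiments-table slot and an arbitrary command in place of the real renderer; `start_prerender`, `stop_prerender`, and `prerender_slot` are hypothetical names. On POSIX systems the child raises its own niceness to 15 with `os.setpriority` before exec, so anything it spawns inherits that value.

```python
import os
import signal
import subprocess
import sys
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (pid, eid)

# Hypothetical stand-in for the experiments-table slot holding the render pid.
prerender_slot: Dict[Key, int] = {}
_procs: Dict[int, subprocess.Popen] = {}  # kept only so we can reap children

def _nice_child() -> None:
    # Runs in the child before exec; descendants inherit the niceness.
    # Only raise it: lowering niceness needs privileges.
    if os.getpriority(os.PRIO_PROCESS, 0) < 15:
        os.setpriority(os.PRIO_PROCESS, 0, 15)

def start_prerender(key: Key, cmd: List[str]) -> int:
    """Like "prerender": run the render in the background, record its pid."""
    stop_prerender(key)  # at most one render per experiment
    preexec = _nice_child if hasattr(os, "setpriority") else None
    proc = subprocess.Popen(cmd, preexec_fn=preexec)
    prerender_slot[key] = proc.pid
    _procs[proc.pid] = proc
    return proc.pid

def stop_prerender(key: Key) -> bool:
    """Like "prerender -r": kill a running render and clear the slot."""
    pid: Optional[int] = prerender_slot.pop(key, None)
    if pid is None:
        return False
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # it already exited; clearing the slot is enough
    proc = _procs.pop(pid, None)
    if proc is not None:
        proc.wait()
    return True

if __name__ == "__main__":
    key = ("myproj", "myexp")
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    print("started", start_prerender(key, sleeper))
    print("stopped", stop_prerender(key))
```

Having `start_prerender` kill any previous render first mirrors swapmod killing a running prerender before the prerun starts a new one.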
- 29 Jul, 2003 1 commit
Leigh B. Stoller authored
showexp page that it's a batch experiment, via the menu options. Same deal in the swapexp output, plus some other minor cleanup. The only bug I found while trying to figure out the batchmode problem reported this morning by the FileMover people is that the cancelflag is not cleared after swapping a running batch experiment out, so even after it is reinjected into the queue, it will not run. Still, that does seem to be what the FileMover people reported.
- 27 Jul, 2003 1 commit
Leigh B. Stoller authored
final time, so that we can see how long things take. As per Jay's request.
- 17 Jul, 2003 1 commit
Mac Newbold authored
Fix up email message text, hide swappable bit, and ignore being in transition on a forced swap. When I say force, I mean it, dang it!
- 11 Jun, 2003 1 commit
Mac Newbold authored
- 09 Jun, 2003 1 commit
Mac Newbold authored
- 05 Jun, 2003 2 commits
Leigh B. Stoller authored
Mac Newbold authored
- 04 Jun, 2003 1 commit
Mac Newbold authored
- 03 Jun, 2003 1 commit
Mac Newbold authored
- 28 May, 2003 1 commit
Mac Newbold authored
modify, etc.). As an FYI, notify the user on the web page, just before they hit confirm, if we'll be updating it. (Only tell them if it matters, i.e., if the idleswap bit is set.)
- 25 May, 2003 1 commit
Mac Newbold authored
- 24 May, 2003 1 commit
Mac Newbold authored
back end scripts now support 3 different kinds of forced swaps:
1. Idle-Swap: the same one we had before. The email message says it was swapped "because it was idle for too long".
2. Auto-Swap: a new one, typically for user-requested timed swapouts. The email says it was swapped "because it was swapped in too long".
3. Force swap: a generic one, for "none of the above" cases. The email just says the experiment "has been forcibly swapped out by Testbed Operations."
The force swap option on the web now lets you choose which of these three you want. Only "Idle-Swap" counts as an idleswap in the stats. Soon idleswap and autoswap will be used by idlemail when it does automatic swapping.
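One way to express the three flavors and the stats rule above, as a sketch: the reason strings are quoted from this commit, while `SwapKind`, `swap_email_line`, `counts_as_idleswap`, and the "Experiment pid/eid" wording around the quotes are hypothetical.

```python
from enum import Enum

class SwapKind(Enum):
    # Hypothetical representation of the three forced-swap flavors.
    IDLE = "Idle-Swap"
    AUTO = "Auto-Swap"
    FORCE = "Force swap"

# Reason text quoted from the commit message.
_REASONS = {
    SwapKind.IDLE: "because it was idle for too long",
    SwapKind.AUTO: "because it was swapped in too long",
}

def swap_email_line(pid: str, eid: str, kind: SwapKind) -> str:
    """Build the explanatory line for the swap notification email."""
    if kind is SwapKind.FORCE:
        return (f"Experiment {pid}/{eid} has been forcibly swapped out "
                "by Testbed Operations.")
    return f"Experiment {pid}/{eid} was swapped out {_REASONS[kind]}."

def counts_as_idleswap(kind: SwapKind) -> bool:
    # Only Idle-Swap bumps the idleswap count in the stats.
    return kind is SwapKind.IDLE

if __name__ == "__main__":
    for kind in SwapKind:
        print(kind.value, counts_as_idleswap(kind), swap_email_line("p", "e", kind))
```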
- 22 May, 2003 1 commit
Leigh B. Stoller authored
experiments look more like regular experiments. Batch mode experiments can now be preloaded and swapped. When preloaded, they go into a "Pause" state. Swapping a batch mode experiment in puts it into the "posted" state so the batch daemon will see it. Swapping out a batch mode experiment does the expected; it puts it back into the Pause state. Terminating a batch mode experiment does the expected; it's gone. When a batch mode experiment finishes normally, it goes back into the Pause state, which allows batches to be reinjected as many times as Eric likes.
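The batch lifecycle above fits a small transition table. This is a sketch, not the batch daemon's code: the state and event names are hypothetical, "running" stands for "picked up by the batch daemon" (the commit does not name that state), and termination is allowed only from the paused state, following the later note that terminating a swapped-in batch experiment would confuse the daemon.

```python
# (state, event) -> next state; anything not listed is rejected.
TRANSITIONS = {
    ("new", "preload"): "paused",
    ("paused", "swapin"): "posted",      # now visible to the batch daemon
    ("posted", "daemon_start"): "running",
    ("posted", "swapout"): "paused",
    ("running", "swapout"): "paused",
    ("running", "finish"): "paused",     # normal end: ready to be reinjected
    ("paused", "terminate"): "terminated",
}

def step(state: str, event: str) -> str:
    """Return the next state, or raise ValueError for a disallowed event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} not allowed in state {state!r}") from None

if __name__ == "__main__":
    state = "new"
    for event in ["preload", "swapin", "daemon_start", "finish", "swapin"]:
        state = step(state, event)
    print(state)  # prints posted (reinjected after a normal finish)
```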
- 21 May, 2003 1 commit
Leigh B. Stoller authored
each portion of the experiment as it is modified. Also added expt_swap_uid so that we know who did the last operation, and so we can charge/credit the right person. So, if joe swaps in the experiment and jane swaps it out, joe gets charged. If jane swaps in the experiment and joe modifies it, jane gets credit for the first portion, and joe will later get charged for the second portion. Took longer to explain than to implement... Lbs
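The joe/jane example amounts to one rule: each swapped-in portion is charged to whoever started it. A minimal sketch, assuming a list of (time, op, uid) events; `charge_by_portion` and the op names are hypothetical, and a modify of a swapped-out experiment is assumed to start no charge.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Event = Tuple[float, str, str]  # (time, op, uid); op: swapin, modify, swapout

def charge_by_portion(events: List[Event]) -> Dict[str, float]:
    """Charge each swapped-in portion to the uid that started it."""
    totals: Dict[str, float] = defaultdict(float)
    owner: Optional[str] = None  # plays the role of expt_swap_uid
    start = 0.0
    for t, op, uid in sorted(events):
        if owner is not None:
            totals[owner] += t - start  # close the open portion
        if op == "swapin" or (op == "modify" and owner is not None):
            owner, start = uid, t       # a new portion starts here
        elif op == "swapout":
            owner = None
        elif op != "modify":
            raise ValueError(f"unknown op {op!r}")
    return dict(totals)

if __name__ == "__main__":
    # joe swaps in, jane swaps out: joe pays for all 10 units.
    print(charge_by_portion([(0, "swapin", "joe"), (10, "swapout", "jane")]))
    # prints {'joe': 10.0}
    # jane swaps in, joe modifies at t=4: jane gets 4 units, joe 6.
    print(charge_by_portion([(0, "swapin", "jane"), (4, "modify", "joe"),
                             (10, "swapout", "joe")]))
    # prints {'jane': 4.0, 'joe': 6.0}
```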
- 15 May, 2003 1 commit
Leigh B. Stoller authored
per-experiment instantiation with aggregate data like the number of swapins, the dates, and the like. The other part is the per swapin/modify stats: the number of pnodes, links, lans, etc. Long term, I think we want more precise swapin stats, and with experiment modify in the mix, we need to have multiple stat records per experiment, but do not need to duplicate all the stuff in the other table just mentioned. To reduce the table size, we cross-reference the tables by index only instead of with pid, eid and the like. We use exptidx to link experiments, experiment_stats, and the new experiment_resources table. experiment_resources and stats are linked by another index in the resources table, which identifies the current resource row. On a modify, a new resource record is created, and the stats record is updated to point to the new (latest) resource record. Web changes: Improved showstats and showexpstats. Made them user accessible so that mere users can see stats for themselves and for their projects. Mere users (PIs) cannot look at another person's stats. Generally, these two pages need more work, but now they are more useful. I added Show Stats to the user info and project info pages to display per-user/project stats. Added more info to the showstats display, but the showexpstats display is still not pretty printed; just the raw tables. Renamed a few fields, added some indexes, and otherwise made some minor changes that are sure to annoy everyone.
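The index-only linkage described above can be sketched with Python's built-in sqlite3 module. The table names come from the commit message; the column names (`rsrcidx`, `idx`, `swapins`) and the helper functions are hypothetical, and only a few of the stats mentioned above are modeled.

```python
import sqlite3
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE experiments (exptidx INTEGER PRIMARY KEY, pid TEXT, eid TEXT);
CREATE TABLE experiment_stats (
    exptidx INTEGER PRIMARY KEY REFERENCES experiments(exptidx),
    swapins INTEGER NOT NULL DEFAULT 0,   -- aggregate, one row per experiment
    rsrcidx INTEGER                       -- current experiment_resources row
);
CREATE TABLE experiment_resources (
    idx INTEGER PRIMARY KEY,              -- one row per swapin/modify
    exptidx INTEGER REFERENCES experiments(exptidx),
    pnodes INTEGER, links INTEGER
);
"""

def create_experiment(db: sqlite3.Connection, pid: str, eid: str) -> int:
    cur = db.execute("INSERT INTO experiments (pid, eid) VALUES (?, ?)", (pid, eid))
    db.execute("INSERT INTO experiment_stats (exptidx) VALUES (?)", (cur.lastrowid,))
    return cur.lastrowid

def record_resources(db: sqlite3.Connection, exptidx: int,
                     pnodes: int, links: int) -> int:
    """On swapin/modify: add a resource row and repoint the stats row at it."""
    cur = db.execute(
        "INSERT INTO experiment_resources (exptidx, pnodes, links) VALUES (?, ?, ?)",
        (exptidx, pnodes, links))
    db.execute("UPDATE experiment_stats SET rsrcidx = ? WHERE exptidx = ?",
               (cur.lastrowid, exptidx))
    return cur.lastrowid

def current_resources(db: sqlite3.Connection,
                      exptidx: int) -> Optional[Tuple[int, int]]:
    return db.execute(
        "SELECT r.pnodes, r.links FROM experiment_stats AS s "
        "JOIN experiment_resources AS r ON r.idx = s.rsrcidx "
        "WHERE s.exptidx = ?", (exptidx,)).fetchone()

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    x = create_experiment(db, "myproj", "myexp")
    record_resources(db, x, pnodes=4, links=3)   # swapin
    record_resources(db, x, pnodes=6, links=5)   # modify
    print(current_resources(db, x))  # prints (6, 5)
```

Older resource rows stay in place, so the history of each modify is kept while the stats row always points at the latest one.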
- 05 May, 2003 1 commit
Leigh B. Stoller authored
idleswaps are not showing up in the testbed stats.
- 01 May, 2003 1 commit
Mac Newbold authored
but later changes to where/how the email is sent took it out.)
- 30 Apr, 2003 1 commit
Leigh B. Stoller authored
tb tools! I've changed the batch system to "preload" the experiment in foreground mode (the results of the parse are spit back to the user directly). The batch daemon now uses swapexp instead of startexp. Upon failure, the experiment goes back to the "swapped" state; previously its virt state was blasted, and re-entered on the next try. This is nice because you can actually look at the batch experiment (vis, virt tables, etc.) while it is posted and not running. Not sure if all the Ts are crossed. Will find out...
- 29 Apr, 2003 1 commit
Chad Barb authored
Various other changes to get Expt Modify ready for prime time.
- If assign fails on a modify, the experiment will be restored to its old state, *not* swapped out.
- The reboot option has been improved to reboot all nodes as part of os_setup, not in a separate step.
- Different assign error codes result in different retry behavior for assign_wrapper (follows Rob's change to assign to make it pass back a special code for non-retriable faults).
- The '64' bit in the assign_wrapper exit code indicates to tbswap that db/phys state hadn't been mucked with before the exit occurred (ergo, '65' and '1' are the common return codes, though the old 4, 8, 16, 32 are still there for assign failing).
- tbswap still returns codes from assign_wrapper.
- Added a 5 sec pause between assign attempts.
- Cleaned up tbswap code.
- Physical state backup/restore removed from tbprerun, put into swapexp.
- Interfaces table now getting cleaned up correctly (Mike noticed the problem).
- Changed the menu display in showexp to show the "modify" menu option for swapped out experiments (like it used to).
- A couple of other changes.
Note: Still admin-only, but I plan to change that soon.
To do:
- Erase expt backups in /tmp after using them.
- Re-viz failed experiments.
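The exit-code convention in the list above can be sketched as follows: bit 64 means the db/phys state was left alone, and masking it off leaves the underlying code, so 65 decodes to "untouched, code 1". The retry loop mirrors the per-code retry behavior and the 5 second pause; which codes are retriable is not stated here, so it is passed in as a predicate. All names, and the default of 3 tries, are hypothetical.

```python
import time
from typing import Callable, NamedTuple

UNTOUCHED_BIT = 64  # db/phys state hadn't been mucked with before the exit

class WrapperStatus(NamedTuple):
    code: int
    state_untouched: bool
    base: int  # the exit code with the 64 bit masked off

def decode(code: int) -> WrapperStatus:
    return WrapperStatus(code, bool(code & UNTOUCHED_BIT), code & ~UNTOUCHED_BIT)

def run_with_retries(attempt: Callable[[], int],
                     retriable: Callable[[int], bool],
                     max_tries: int = 3,
                     pause: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call attempt() until it returns 0, a non-retriable code, or tries run out."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    code = 0
    for n in range(max_tries):
        if n:
            sleep(pause)  # pause between attempts, not before the first
        code = attempt()
        if code == 0 or not retriable(code):
            break
    return code

if __name__ == "__main__":
    print(decode(65))  # prints WrapperStatus(code=65, state_untouched=True, base=1)
    results = iter([1, 1, 0])
    print(run_with_retries(lambda: next(results), retriable=lambda c: True,
                           sleep=lambda s: None))  # prints 0
```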
- 28 Apr, 2003 2 commits
Leigh B. Stoller authored
swap_exitcode (last error), idle_swaps (a count), batch (a flag to indicate a batch experiment). Added an operational log. Okay, it's not actually a log, but a table that will grow forever until it consumes the earth. It's a small table though, so it will take a few years. It's cross-indexed with the experiment_stats table, so by massaging this table along with the stats table, we can get a good picture of what was running on the testbed when, and how many resources it was using. Sorry, not a log file, but we can easily generate a log file from the table if the Boss really wants one. A table entry averages 28 bytes. Moved stats to their own main menu item (admin mode only). Removed them from the showexp_list page since that was bogus.
Leigh B. Stoller authored
The first three are aggregate tables, while the experiment stats table gets a record for each new experiment, and is updated when an experiment is swapped in/out/modified or terminated. Look at the table to see what is tracked. Once the experiment_stats record is updated, the aggregate tables are updated as necessary. There are a bunch of ugly changes to assign_wrapper to get the stats. Note that pnodes is not incremented until an experiment successfully swaps in. This is in lieu of getting status codes; I'm not tracking failed operations yet, nor creating the log file that Jay wants. I'll do that in the next round of changes when we see how useful these numbers are. Most of the changes are to create/delete table entries where appropriate, and to display the records. Display is only under admin mode, and the display is raw; just a dump of the assoc tables in php. The last 100 experiment stats records are available via the Experiment List page, using the "Stats" show option at the top. Bad place, but it will do for now.
- 17 Apr, 2003 1 commit
Chad Barb authored
For the benefit of our users, added 'reboot nodes in experiment' checkbox, on by default, with a stern warning.
- 16 Apr, 2003 1 commit
Leigh B. Stoller authored
experiment, rather than as an administrator, which presents group permission problems when the experiment is in a subgroup (it requires two additional groups, whereas suexec adds only one). That aside, the correct approach is to run the swap as the creator. To do that, we must flip to the user (from the admin person) in the backend using the new idleswap script, and then run the normal swapexp. Added a new option to swapexp (-i) that changes the email slightly to make it clear that the experiment was idleswapped, and so that the From: is tbops, not the user (again, to make it more clear).
- 03 Apr, 2003 1 commit
Chad Barb authored
Added new feature 'Experiment Modify'. Now available (to admins only for now) from the showexp page. Warning! Doing a modify which alters the topology will probably require a "reboot all nodes" afterwards. (There will be a checkbox soon in the modify experiment page.) Adding/removing delay nodes seems to work fine without reboots, though. Warning! If the new version of the experiment cannot be mapped (not enough nodes available, for instance) the experiment will be swapped out! This will get fixed later. Prerun backs up the experiment topology, so using a bad NS file doesn't result in experiment termination. As part of this, added library functions to libdb to delete, back up, and restore both virtual and physical experiment state.
- 27 Mar, 2003 1 commit
Leigh B. Stoller authored
make one in /usr/testbed/expwork/$pid. Too much cruft was getting left behind, and it was causing even more log copy errors! Besides, typically it's just tbops people who need to look at that stuff.
- 11 Mar, 2003 1 commit
Chad Barb authored
New version of unified tbswap in/out. startexp/endexp/swapexp have been changed to use the new script. tbswapin and tbswapout have been replaced with a script that spits out a warning message, then calls tbswap appropriately. The README has also been modified.
- 18 Dec, 2002 1 commit
Leigh B. Stoller authored
Attempts to replay an experiment by rebooting all the nodes, clearing the various startup bits (ready, startstatus, bootstatus, portstats), and then restarting the event system. I am dubious that this is a workable solution because of the asynchronous nature of the testbed (nodes happily cruise from TBRESET to ISUP and beyond without stopping), and so it's hard to truly replicate the initial lack of state that a freshly swapped in experiment has. Still, people requested it and I cheerfully provided it, because that's what I do; service with a smile and not a whit of complaint. Is anyone reading this?
- 16 Sep, 2002 1 commit
Leigh B. Stoller authored
experiment. Here is the mail to tbops:
* Moved the working directory for experiment setup/swap/end to a new directory located on boss instead of over NFS to /proj/$pid/$eid. This new location is /usr/testbed/expwork/$pid/$eid.
* Changed the name of the directories we create in /usr/testbed/expinfo to $pid-$eid.$index where $index is a new autoincrement field in the DB table. I really hated the names that were created before.
* Changed where logs are written from /tmp to the new location in /usr/testbed/expwork/$pid/$eid.
Okay, why?
* We no longer operate on NFS mounted directories that might hang. It's easier to catch the situation where copying the log file over at the end of experiment creation fails because of an NFS problem.
* We no longer have user writable files that are inputs to other parts of the system (like top and ptop files). Not that a user would be bad, but it closes a hole.
* We no longer copy user writable files from /proj to boss where we might fill up an important filesystem because the user put a .ndz file in the working directory. Not that a user would be bad, but it closes a hole.
* It's easier to save all the log files this way, for each swap in and out.
* Removing a directory over NFS is a royal irritant when someone is cd'ed into that directory or looking at a file on the other side (the astute observer will peg this as the reason I went down this idiotic path in the first place!).
* About 6 other reasons that I can no longer remember. Seriously, I really had more reasons I can no longer remember! :-)
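The two naming schemes in the mail to tbops are easy to express as path helpers. A sketch only: `work_dir` and `info_dir` are hypothetical names, and rejecting pid/eid values that contain "/" or are "." or ".." is an added assumption in the spirit of closing holes, not something the mail describes.

```python
from pathlib import PurePosixPath

EXPWORK = PurePosixPath("/usr/testbed/expwork")
EXPINFO = PurePosixPath("/usr/testbed/expinfo")

def _check(name: str) -> str:
    # Assumed safety check: keep a pid/eid from escaping its parent directory.
    if not name or "/" in name or name in (".", ".."):
        raise ValueError(f"bad name {name!r}")
    return name

def work_dir(pid: str, eid: str) -> PurePosixPath:
    """Per-experiment working directory on boss: expwork/$pid/$eid."""
    return EXPWORK / _check(pid) / _check(eid)

def info_dir(pid: str, eid: str, index: int) -> PurePosixPath:
    """Saved-info directory: expinfo/$pid-$eid.$index (index from the DB)."""
    return EXPINFO / f"{_check(pid)}-{_check(eid)}.{index}"

if __name__ == "__main__":
    print(work_dir("myproj", "myexp"))      # prints /usr/testbed/expwork/myproj/myexp
    print(info_dir("myproj", "myexp", 42))  # prints /usr/testbed/expinfo/myproj-myexp.42
```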
- 11 Jul, 2002 1 commit
Leigh B. Stoller authored
directory so that they can be viewed later after the operation is complete. I've also cleaned up the mechanism for determining when a log file is active (for the web spew) by using another slot in the experiments table, and added some libdb routines to manage that slot. At present just the last (or latest) log can be viewed after the fact, but we can change that later if we think it's really necessary. At the same time, made it possible for admin types to view the log files for other people's experiments; spew is setuid, but flips back after opening the file (and does the usual checks too). I've also incorporated the log changes into the batch daemon, so you can view the last batch log too, although I have not tested that yet!
- 07 Jul, 2002 1 commit
Leigh B. Stoller authored
- 16 Jun, 2002 1 commit
Leigh B. Stoller authored
transition error when you click too fast after creating it. Instead of looking at experiment state, use the logfile slot of the experiments table, and make sure it's cleared/set properly in the start/swap experiment scripts. Also added a spew option to the swap page so you can watch experiments swap in/out.