Leigh B. Stoller authored
plus a lock field. The lock field was a simple "experiment locked, go away" slot that is easy to use when you do not care about the actual state an experiment is in, just that it is in "transition" and should not be messed with.

The other two state variables are "state" and "batchstate". The former ("state") is the original variable that Chris added, and was used by the tb* scripts to make sure that the experiment was in the state each particular script wanted it to be in. But over time (and with the addition of so much wrapper goo around them), "state" has leaked out all over the place to determine what operations on an experiment are allowed, and if/when it should be displayed in various web pages. There is a set of transition states (like "swapping") in addition to the usual "active", "swapped", etc. that makes testing state a pain in the butt.

I added the other state variable ("batchstate") when I did the batch system, obviously! It was intended as a wrapper state to control access to the batch queue, and to prevent batch experiments from being messed with except when it was really okay (for example, it's okay to terminate a swapped out batch experiment, but not a swapped in batch experiment, since that would confuse the batch daemon). There are fewer of these states, plus one additional state for "modifying" experiments.

So what I have done is change the system to use "batchstate" for all experiments to control entry into the swap system: from the web interface, from the command line, and from the batch daemon. The other state variable still exists, and will be brutally pushed back under the surface until it's just a vague memory, used only by the original tb* scripts. This will happen over time, and the "batchstate" variable will be renamed once I am convinced that this was the right thing to do and that my changes actually work as intended.
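As a rough illustration of gating entry into the swap system on a single state variable, here is a minimal Python sketch. It is not the actual code: the state names, operation names, and the function may_enter_swap_system are all hypothetical, since this message does not list the real "batchstate" values. The only rule taken from the text above is that a swapped in batch experiment must not be terminated.

```python
# Hypothetical batchstate values and operation names, for illustration only.
ALLOWED_OPS = {
    "swapped": {"swapin", "modify", "terminate"},
    "active": {"swapout", "modify", "terminate"},
    # Transition states: nothing may enter until the transition finishes.
    "swapping": set(),
    "modifying": set(),
}


def may_enter_swap_system(batchstate, operation, is_batch=False):
    """Return True if `operation` may start on an experiment in `batchstate`."""
    # Rule from the message: terminating a swapped in batch experiment
    # would confuse the batch daemon, so refuse it.
    if is_batch and batchstate == "active" and operation == "terminate":
        return False
    # Unknown states allow nothing, which is the safe default.
    return operation in ALLOWED_OPS.get(batchstate, set())
```

The point of the single table is that the web interface, the command line, and the batch daemon could all ask the same question instead of each testing the lower-level "state" variable in its own way.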
Only people who have bothered to read this far will know that I also added the ability to cancel an experiment swapin in progress. For that I am using the "canceled" flag (ah, this one was named properly from the start!), and I test it at various points in assign_wrapper and tbswap. A minor downside right now is that a canceled swapin looks too much like a failed swapin, and so tbops gets email about it. I'll fix that at some point (sometime after the boss complains).

I also cleaned up various bits of code, replacing direct calls to exec with calls to the recently improved SUEXEC interface. This removes some cruft from each script that calls an external script.

Cleaned up modifyexp.php3 quite a bit, reformatting and indenting. Also fixed it to not run the parser directly! That was very wrong; it should call nscheck instead. Changed it to use the "nobody" group instead of group flux (made the same change in nscheck).

There is a script in the sql directory called newstates.pl. It needs to be run to initialize the batchstate slot of the experiments table for all existing experiments.
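The idea of testing the "canceled" flag at various points during swapin can be sketched as below. This is a minimal Python illustration, not how assign_wrapper or tbswap are written: run_swapin, SwapinCanceled, and the stage list are hypothetical names. Raising a distinct exception for cancellation is one possible way to keep a canceled swapin from looking like a failed one.

```python
class SwapinCanceled(Exception):
    """Raised when the canceled flag is seen; distinct from a real failure."""


def run_swapin(stages, is_canceled):
    """Run (name, callable) stages in order, checking the flag before each.

    `is_canceled` is a hypothetical callable that re-reads the experiment's
    canceled flag each time it is called.
    """
    completed = []
    for name, stage in stages:
        if is_canceled():
            raise SwapinCanceled(f"canceled before stage {name!r}")
        stage()
        completed.append(name)
    return completed
```

A caller could catch SwapinCanceled separately from other exceptions and skip the failure email in that case.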
4269dad1