    Leigh B. Stoller authored (commit 4269dad1)

    Up to now we have had two state variables associated with an experiment,
    plus a lock field. The lock field was a simple "experiment locked, go away"
    slot that is easy to use when you do not care about the actual state that
    an experiment is in, just that it is in "transition" and should not be
    messed with.

    The other two state variables are "state" and "batchstate". The former
    (state) is the original variable that Chris added, and was used by the tb*
    scripts to make sure that the experiment was in the state each particular
    script wanted it to be in. But over time (and with the addition of so
    much wrapper goo around them), "state" has leaked out all over the place to
    determine what operations on an experiment are allowed, and if/when it
    should be displayed in various web pages. In addition to the usual states
    like "active" and "swapped", there is a set of transition states like
    "swapping" that make testing state a pain in the butt.

    I added the other state variable ("batchstate") when I did the batch
    system, obviously! It was intended as a wrapper state to control access to
    the batch queue, and to prevent batch experiments from being messed with
    except when it was really okay (for example, it's okay to terminate a
    swapped out batch experiment, but not a swapped in batch experiment since
    that would confuse the batch daemon). There are fewer of these states, plus
    one additional state for "modifying" experiments.

    So what I have done is change the system to use "batchstate" for all
    experiments to control entry into the swap system, from the web interface,
    from the command line, and from the batch daemon. The other state variable
    still exists, and will be brutally pushed back under the surface until it's
    just a vague memory, used only by the original tb* scripts. This will
    happen over time, and the "batchstate" variable will be renamed once I am
    convinced that this was the right thing to do and that my changes actually
    work as intended.
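    One way to make batchstate a single gate shared by the web interface, the
    command line, and the batch daemon is an atomic compare-and-set in the
    database: the UPDATE only changes the row if it is still in the expected
    state, so two callers cannot both enter the swap system. A minimal sketch
    using an in-memory SQLite table; the table layout, the columns other than
    batchstate, and the state strings are assumptions for illustration.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Build a throwaway experiments table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE experiments (pid TEXT, eid TEXT, batchstate TEXT)")
    conn.execute("INSERT INTO experiments VALUES ('myproj', 'myexp', 'swapped_out')")
    conn.commit()
    return conn


def try_transition(conn: sqlite3.Connection, pid: str, eid: str,
                   expected: str, new: str) -> bool:
    """Atomically move batchstate from `expected` to `new`.

    The WHERE clause makes this a compare-and-set: the row changes only if it
    is still in `expected`, so exactly one competing caller can win.
    """
    cur = conn.execute(
        "UPDATE experiments SET batchstate = ? "
        "WHERE pid = ? AND eid = ? AND batchstate = ?",
        (new, pid, eid, expected),
    )
    conn.commit()
    return cur.rowcount == 1


conn = open_demo_db()
print(try_transition(conn, "myproj", "myexp", "swapped_out", "swapping_in"))  # True
print(try_transition(conn, "myproj", "myexp", "swapped_out", "swapping_in"))  # False
```

    The first caller gets True and proceeds; a later caller asking for the same
    transition gets False and backs off. In a real deployment each entry point
    would use its own database connection, but the check is the same.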

    Only people who have bothered to read this far will know that I also added
    the ability to cancel an experiment swapin that is in progress. For that I
    am using the "canceled" flag (ah, this one was named properly from the
    start!), and I test it at various points in assign_wrapper and tbswap. A
    minor downside right now is that a canceled swapin looks too much like a
    failed swapin, and so tbops gets email about it. I'll fix that at some
    point (sometime after the boss complains).
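    The cancel mechanism above amounts to polling a flag between the steps of a
    swapin. A minimal Python sketch of that pattern, which also returns a
    distinct "canceled" outcome so that mail to tbops could be limited to real
    failures. The phase names and functions are hypothetical; this is not the
    code in assign_wrapper or tbswap.

```python
from typing import Callable, List, Tuple

# A phase is a (name, action) pair; the action returns True on success.
Phase = Tuple[str, Callable[[], bool]]


def run_swapin(phases: List[Phase], is_canceled: Callable[[], bool]) -> str:
    """Run swapin phases in order, polling the cancel flag between them.

    Returns "success", "canceled", or "failed" so the caller can tell a
    canceled swapin apart from a genuine failure.
    """
    for _name, action in phases:
        if is_canceled():
            return "canceled"
        if not action():
            return "failed"
    return "success"


def should_mail_tbops(outcome: str) -> bool:
    """Only real failures warrant mail; cancellations do not."""
    return outcome == "failed"


flags = {"canceled": False}


def assign() -> bool:
    flags["canceled"] = True   # simulate the user hitting "cancel" mid-swapin
    return True


outcome = run_swapin([("assign", assign), ("setup", lambda: True)],
                     lambda: flags["canceled"])
print(outcome, should_mail_tbops(outcome))  # canceled False
```

    Because the flag is only checked between phases, a phase already running
    finishes before the cancel takes effect.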

    I also cleaned up various bits of code, replacing direct calls to exec
    with calls to the recently improved SUEXEC interface. This removes
    some cruft from each script that calls an external script.

    Cleaned up modifyexp.php3 quite a bit, reformatting and indenting it.
    Also fixed it to not run the parser directly! That was very wrong; it
    should call nscheck instead. Changed it to use the "nobody" group instead
    of the flux group (made the same change in nscheck).
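    Routing every external command through one helper, as the SUEXEC change
    does, keeps the user/group switch and the error handling in one place. A
    minimal Python sketch under stated assumptions: the wrapper path, its
    USER GROUP CMD calling convention, and the "exp.ns" argument are
    hypothetical, and this is not the signature of the actual SUEXEC function.

```python
import subprocess
from typing import List, Sequence, Tuple

# Hypothetical location of a setuid wrapper invoked as: suexec USER GROUP CMD [ARGS...]
SUEXEC_PATH = "/usr/testbed/libexec/suexec"


def build_suexec_argv(user: str, group: str, command: Sequence[str]) -> List[str]:
    """Assemble the wrapper's argument vector; no shell is involved."""
    if not command:
        raise ValueError("command must not be empty")
    return [SUEXEC_PATH, user, group, *command]


def suexec(user: str, group: str, command: Sequence[str]) -> Tuple[int, str]:
    """Run `command` as user:group via the wrapper; return (exit status, output)."""
    result = subprocess.run(build_suexec_argv(user, group, command),
                            capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


# e.g. checking an NS file with nscheck under the "nobody" group
print(build_suexec_argv("alice", "nobody", ["nscheck", "exp.ns"]))
```

    Passing an argument list instead of a shell string means file names with
    spaces or shell metacharacters reach the script unchanged.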

    There is a script in the sql directory called newstates.pl. It needs
    to be run to initialize the batchstate slot of the experiments table
    for all existing experiments.
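    A one-time backfill like the one described above might look like the
    following. This is a minimal Python/SQLite sketch, not newstates.pl; the
    column names other than batchstate and the mapping from "active" and
    "swapped" to initial batchstate values are assumptions.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical mapping from the legacy "state" column to an initial batchstate.
STATE_TO_BATCHSTATE = {
    "active": "swapped_in",
    "swapped": "swapped_out",
}


def init_batchstate(conn: sqlite3.Connection) -> Tuple[int, List[Tuple[str, str, str]]]:
    """Fill in batchstate for every existing experiment from its legacy state.

    Experiments in any other state (e.g. a transition state like "swapping")
    are left untouched and returned so an operator can review them.
    """
    updated = 0
    skipped = []
    rows = conn.execute("SELECT pid, eid, state FROM experiments").fetchall()
    for pid, eid, state in rows:
        new = STATE_TO_BATCHSTATE.get(state)
        if new is None:
            skipped.append((pid, eid, state))
            continue
        conn.execute(
            "UPDATE experiments SET batchstate = ? WHERE pid = ? AND eid = ?",
            (new, pid, eid),
        )
        updated += 1
    conn.commit()
    return updated, skipped


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiments (pid TEXT, eid TEXT, state TEXT, batchstate TEXT)")
conn.executemany(
    "INSERT INTO experiments (pid, eid, state) VALUES (?, ?, ?)",
    [("myproj", "a", "active"), ("myproj", "b", "swapped"), ("myproj", "c", "swapping")],
)
print(init_batchstate(conn))  # (2, [('myproj', 'c', 'swapping')])
```

    Reporting skipped experiments instead of guessing a value avoids silently
    assigning a wrong state to anything caught mid-transition.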