-
Leigh B. Stoller authored
* The first involves swapmod. When a swapmod on an active experiment fails, tbswap will reswap the experiment back to the original configuration. The problem is that it is reswapping it with the *new* virtual state of the experiment in the DB. It is not until later when control returns to swapexp that the virtual state is restored. This is plainly wrong, and in fact was causing the event scheduler grief cause it was starting up, reading the the virtual topo, which was different, wrong, and about to be blown away. I reorganized the modify section of swapexp so that virtual state is restored only when its a swapmod on a swapped experiment. On an active experiment, I moved that code down into tbswap, which will now does all of the virtual and physical state retore before it does the reswap back to the original experiment. Just for kicks, its also done if tbswap decides to swap the experiment cause of a fatal error. Cleanups: I changed $NoRecover to $CanRecover. My feeble brain cannot deal with !$NoRecover. I know, two knots make a wright for most people. Renderer: I was annoyed by the fact that we rerun the renderer on a failed swapmod. The original reason is that the renderer runs in the background and so vis_nodes cannot be saved with the rest of the virtual state tables cause the renderer might still be running when the user fires off the swapmod. Well, the hell with that. We lock the vis_nodes table anyway in the renderer during update, so we are certain to get a consistent snapshot. We store the renderer pid in the experiments table, so if the renderer was running, just fire off another one; mostly this is not going to happen. In addition, tbprerun no longer starts a new renderer when doing the swapmod; I start the new renderer later after swapmod succeeds. I might end up tweaking this a bit depending on what people notice as being different. * Termination changes to batchexp and swapexp: I've rearranged the termination code using an END block so that any uncontrolled exit from either batchexp or swapexp will go through the cleanup code, and hopefully insert a stats record, as well as not leave the experiment in some inbetween state. I've set the max DB retry count to zero in both cases, which means infinite retry. I've also added SIGTERM handlers to both so that again, we can kill a hung batch/swap and have it clean up things more or less. Note that END blocks are not caught when a signal causes the program to die; you have to catch it and then die() so that the END block is executed. Eventually, we need to clean up the various libraries so that we do not use DBQueryFatal(), but rather use DBQueryWarn(), and look for failure. Ditto for event system interface.
9f4edbba