1. 16 Dec, 2004 1 commit
    • The panic button ... · 87dd2e60
      Leigh B. Stoller authored
      * tbsetup/panic.in: New backend script to implement the panic button
        feature. When used, it will sever the connection to the
        firewall node by using snmpit to disable the port. Sets the panic
        bit (and date) in the experiments table, and changes the state of
        the experiment from "active" to "paniced" to ensure that the
        experiment cannot be messed with (swapped out or modified). Sends
        email to tbops when the panic button is pressed.
      
        Used with -r option, reverses the above. State is set back to
        active, the panic bit is cleared, and the port is re-enabled with
        snmpit.
      
      * tbsetup/tbswap.in: During swapout, a firewalled experiment that has
        been paniced will get a cleaning: the nodes are powered off, then
        the osids for all the nodes are reset (with os_select) so that they
        will boot the MFS, and then the nodes are powered on. Then the
        control network is turned back on, and then I wait for the nodes to
        reboot. (This is simply cause we do not record in the DB that a node
        is turned off, and if I do not wait, the reload daemon will end up
        hitting the power button again if they do not reboot in time. We can
        fix this later.)
      
        I am not planning to apply this to general firewalled experiments
        yet as the power cycling is going to be hard on the nodes, so would
        rather that we at least have a 1/2 baked plan before we do that.
      
      * www/showexp.php3: If experiment is firewalled, show the Panic
        Button, linked to the panic button web script. If the experiment has
        already had the panic button pressed, show a big warning message and
        explain that user must talk to tbops to swap the experiment out.
        Also fiddle with menu options so that the terminate link is gone,
        and the swap link is visible only in admin mode. In other words, only
        an admin person can swap an experiment once it is paniced. And of
        course, an admin person can run the backend panic script above with
        the -r option, but that's not something to be done lightly.
      
      * db/libdb.pm.in: Add "paniced" as an experiment state (EXPTSTATE_PANICED).
        Add utility functions: TBExptSetPanicBit(), TBExptGetPanicBit(), and
        TBExptClearPanicBit().
      
      * tbsetup/swapexp.in: Minor state fiddling so that an experiment can
        be swapped while in paniced state, but only when in admin mode. Also
        clear the panic bit when experiment is swapped out.
      
      * www/dbdefs.php3.in: Add "paniced" as an experiment state. Add a
        utility function TBExptFirewall() to see if experiment is firewalled.
      
      * www/panicbutton.php3: New web script to invoke the backend panic
        script mentioned above, after the usual confirm song and dance.
      
      * www/panicbutton.gif: New gif of a red panic button that I stole off
        the net. If anyone sees/has a better one, feel free to replace
        this one.
      
      * utils/node_statewait.in: Add -s option so that I can pass in the
        state I want to wait for (used from tbswap above to wait for nodes
        to reach ISUP after power on).
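
The panic/un-panic bookkeeping described above can be sketched roughly as
follows. This is a minimal model, not the panic.in script itself: the
Experiment class, its fields, and the panic() and can_swap() helpers are
hypothetical names, and the port toggle merely stands in for the snmpit call.
Only the "active" and "paniced" state names come from the commit message.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# State names taken from the commit message; everything else is hypothetical.
EXPTSTATE_ACTIVE = "active"
EXPTSTATE_PANICED = "paniced"


@dataclass
class Experiment:
    state: str = EXPTSTATE_ACTIVE
    panic_bit: bool = False
    panic_date: Optional[datetime] = None
    firewall_port_enabled: bool = True


def panic(expt: Experiment, reverse: bool = False) -> None:
    """Press the panic button, or undo it when reverse is True (like -r)."""
    if reverse:
        if expt.state != EXPTSTATE_PANICED:
            raise ValueError("experiment is not paniced")
        expt.firewall_port_enabled = True   # stands in for snmpit enabling the port
        expt.panic_bit = False
        expt.state = EXPTSTATE_ACTIVE
    else:
        if expt.state != EXPTSTATE_ACTIVE:
            raise ValueError("experiment is not active")
        expt.firewall_port_enabled = False  # stands in for snmpit disabling the port
        expt.panic_bit = True
        expt.panic_date = datetime.now()
        expt.state = EXPTSTATE_PANICED


def can_swap(expt: Experiment, is_admin: bool) -> bool:
    """A paniced experiment may be swapped only in admin mode."""
    return expt.state != EXPTSTATE_PANICED or is_admin
```

Keeping the state check in one place means the web page and the swap scripts
could both ask the same question before offering or performing a swap.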
  2. 15 Nov, 2004 1 commit
    • A bunch of ElabInELab changes. · 10b116e0
      Leigh B. Stoller authored
      * snmpit: When ElabInELab is true, use the routines in the new
        snmpit_remote.pm library for setting up and tearing down vlans for an
        experiment. At present, only these two operations are proxied out to
        the outer emulab.
      
      * snmpit_remote.pm: A new little library that uses the XMLRPC server on
        the outer emulab to setup and destroy vlans for an inner experiment.
        This code is used from snmpit (see above).
      
      * snmpit_lib.pm: A couple of minor changes for the server side of the
        proxy operation.
      
      * snmpit.proxy.in: A new perl module that is invoked from the RPC
        server.  This proxy sets up and tears down vlans for an inner elab.
        The basic model is that the container experiment will have lots of
        vlans for various individual experiments running on the inner emulab.
      
      * swapexp: A couple of minor elabinelab hacks.
      
      * tbswap: For elabinelab experiments, reconfig/restart dhcpd when
        tearing down the experiment, and call out to new elabinelab script
        when setting up an elabinelab experiment. There is no provision for
        swapmod at this time.
      
      * elabinelab: A new script to create the inner emulab. Does all kinds of
        gross DB stuff then more gross stuff on the inner ops and boss.
  3. 30 Aug, 2004 1 commit
  4. 29 Jul, 2004 1 commit
    • Two unrelated bug fixes (with some related cleanups and tweaks) · 9f4edbba
      Leigh B. Stoller authored
      * The first involves swapmod. When a swapmod on an active experiment fails,
        tbswap will reswap the experiment back to the original configuration. The
        problem is that it is reswapping it with the *new* virtual state of the
        experiment in the DB. It is not until later when control returns to
        swapexp that the virtual state is restored. This is plainly wrong, and in
        fact was causing the event scheduler grief cause it was starting up,
        reading the virtual topo, which was different, wrong, and about to be
        blown away.
      
        I reorganized the modify section of swapexp so that virtual state is
        restored only when it's a swapmod on a swapped experiment. On an active
        experiment, I moved that code down into tbswap, which now does all of
        the virtual and physical state restore before it does the reswap back
        to the original experiment. Just for kicks, it's also done if tbswap
        decides to swap the experiment cause of a fatal error.
      
        Cleanups: I changed $NoRecover to $CanRecover. My feeble brain cannot
        deal with !$NoRecover. I know, two knots make a wright for most people.
      
        Renderer: I was annoyed by the fact that we rerun the renderer on a
        failed swapmod. The original reason is that the renderer runs in the
        background and so vis_nodes cannot be saved with the rest of the virtual
        state tables cause the renderer might still be running when the user
        fires off the swapmod. Well, the hell with that. We lock the vis_nodes
        table anyway in the renderer during update, so we are certain to get a
        consistent snapshot. We store the renderer pid in the experiments table,
        so if the renderer was running, just fire off another one; mostly this is
        not going to happen. In addition, tbprerun no longer starts a new
        renderer when doing the swapmod; I start the new renderer later after
        swapmod succeeds. I might end up tweaking this a bit depending on what
        people notice as being different.
      
      * Termination changes to batchexp and swapexp: I've rearranged the
        termination code using an END block so that any uncontrolled exit from
        either batchexp or swapexp will go through the cleanup code, and
        hopefully insert a stats record, as well as not leave the experiment in
        some inbetween state. I've set the max DB retry count to zero in both
        cases, which means infinite retry. I've also added SIGTERM handlers to
        both so that again, we can kill a hung batch/swap and have it clean up
        things more or less. Note that END blocks are not run when an uncaught
        signal kills the program; you have to catch the signal and then die()
        so that the END block is executed.
      
        Eventually, we need to clean up the various libraries so that we do not
        use DBQueryFatal(), but rather use DBQueryWarn(), and look for failure.
        Ditto for event system interface.
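
The same pattern exists outside Perl. As a rough Python analogue (a sketch
only; cleanup(), handle_sigterm() and install_handlers() are hypothetical
names, and the real scripts use a Perl END block), atexit handlers are likewise
skipped when an unhandled signal kills the process, so SIGTERM is caught and
turned into a normal exit:

```python
import atexit
import signal
import sys


def cleanup() -> None:
    # Stand-in for the real cleanup work (stats record, sane experiment state).
    print("cleanup ran", flush=True)


def handle_sigterm(signum, frame) -> None:
    # Turn the signal into a normal interpreter exit so that atexit handlers
    # run; this mirrors catching the signal and calling die() in Perl.
    sys.exit(1)


def install_handlers() -> None:
    atexit.register(cleanup)
    signal.signal(signal.SIGTERM, handle_sigterm)
```

Calling install_handlers() once at startup is enough; after that, both a
normal exit and a SIGTERM go through cleanup().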
  5. 26 Jul, 2004 1 commit
    • Okay, let's clear up some confusion when swapmod fails and 1) the · fdac8b89
      Leigh B. Stoller authored
      experiment is swapped or 2) the experiment is completely terminated.
      In these cases, let's put explicit swapout/destroy events into
      testbed_stats so that the record is not confused by experiments that
      appear to start when they are still running. This really throws off
      the summary stats web page!
  6. 15 Jul, 2004 1 commit
  7. 29 Jun, 2004 1 commit
  8. 17 May, 2004 1 commit
  9. 13 May, 2004 2 commits
  10. 29 Apr, 2004 1 commit
    • Add prelim support for using linktest. Because of problems, this is · 6cdccbd2
      Leigh B. Stoller authored
      currently available only to people with stud=1 status in the DB.
      
      * www/tbauth.php3: Add a STUDLY() function to check that bit.
      
      * www/linktest.php3: New page to run linktest on the fly. The level
        defaults to the current level in the experiments table, but you can
        override that via the form on the page.
      
      * www/showexp.php3: Add link to aforementioned page. STUDLY() only.
      
      * www/beginexp_form.php3: Add an option (selection) to set the linktest
        level for create/swapin. Defaults to 0 (no linktest). STUDLY() only.
      
      * www/editexp.php3: Add an option to edit the default linktest level
        for an experiment. STUDLY() only.
      
      * tbsetup/batchexp.in and tbsetup/swapexp.in: Add code to optionally run
        the linktest, sending email if it fails (exits with non-zero status).
        Failure does not affect the swapin.
  11. 07 Apr, 2004 1 commit
  12. 15 Mar, 2004 1 commit
  13. 12 Feb, 2004 1 commit
    • * Removed startexp, and merged its contents into batchexp. There has been · aef08532
      Leigh B. Stoller authored
        no reason for the separation for a long time, and it made maintenance more
        difficult cause of duplication between batchexp and startexp (batch was
        the sole user of startexp). Cleaner solution.
      
      * Check argument processing for batchexp, swapexp, endexp to make sure the
        taint checks are correct. All three of these scripts will now be
        available from ops. I especially watch the filename processing, which was
        pretty loose before and could allow someone to grab a file on boss by
        trying to use it as an NS file (the scripts all run as the user, of
        course). The web interface generates filenames that are hard to guess,
        so rather than wrapping these scripts when invoked from ops, just allow
        the usual paths (/proj, /groups, /users) but also the
        /tmp/$uid-XXXXXX.nsfile pattern, which should be hard enough to guess
        that users will not be able to get anything they are not supposed to.
      
      * Add -w (waitmode) options to all three scripts. In waitmode, the backend
        detaches, but the parent remains waiting for the child to finish so it
        can exit with the appropriate status (for scripting). The user can
        interrupt (^C), but it has no effect on the backend; it just kills the
        parent side that is waiting (backend is in a new session ID). Log output
        still goes to the file (available from web page) and is emailed.
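
A minimal sketch of the waitmode idea, assuming a POSIX system: the backend
runs in its own session, so ^C at the terminal reaches only the waiting
parent. run_backend() is a hypothetical helper, not the code in
batchexp/swapexp/endexp, and the 130 return value is just the conventional
shell status for an interrupt, chosen here for illustration.

```python
import subprocess
import sys
from typing import List


def run_backend(cmd: List[str], wait: bool) -> int:
    # start_new_session=True makes the child call setsid(), so an interrupt
    # typed at the parent's terminal is not delivered to the backend.
    child = subprocess.Popen(cmd, start_new_session=True)
    if not wait:
        return 0  # detached: the caller does not learn the backend's status
    try:
        # Waitmode: block until the backend finishes and pass its status on.
        return child.wait()
    except KeyboardInterrupt:
        # Only the waiting parent is interrupted; the backend keeps running.
        return 130
```

For example, run_backend([sys.executable, "-c", "import sys; sys.exit(3)"],
wait=True) passes the child's exit status back to the caller, which is what
makes the scripts usable from other scripts.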
  14. 05 Feb, 2004 1 commit
  15. 08 Jan, 2004 1 commit
  16. 18 Nov, 2003 4 commits
    • Minor additions for Shashi: · def28c32
      Leigh B. Stoller authored
      * Make the NS file an optional argument to swapexp modify; when not
        given the prerun phase is skipped. Instead, go directly to tbswap
        (run assign, etc).
      
      * Add NSESWAP event so that Shashi can fire off the above modify using
        tevc from an experimental node.
      
      	tevc -e pid/eid now ns nseswap
      
      * Change event scheduler to react to above event, and fire off:
      
      	nseswap pid eid
      
        as the user. The script should do its thing, and *exec* swapexp with
        the proper args as quickly as possible (so that the event scheduler
        is not hung up for too long). The script is invoked as the user,
        since the event scheduler is running as the user.
    • Remove some special handling for the nsfiles table; make it part · 942cdc07
      Leigh B. Stoller authored
      of virt_tables so that it is saved and restored like the rest of
      the virtual state.
    • Ah, just get rid of the expt_locked check. Not worth the trouble and · c78e9c71
      Leigh B. Stoller authored
      it's going to get replaced at some point by a busy state. The swap
      scripts properly set the next state before unlocking the experiments
      table, which possibly leaves some small races as experiments
      transition through states (which happens with the table unlocked,
      cause I used to have this really handy variable called expt_locked,
      which no one really likes anymore).
      
      We either have to use more table locking, fix up expt_locked, or punt
      and say it won't happen more than once in a few thousand operations!
    • Change die() to ExitWithStatus(1) so that the user sees the message · 4aae6ce5
      Leigh B. Stoller authored
      instead of testbed-ops. Either way, Mike gets to see it.
  17. 17 Nov, 2003 1 commit
    • Merge the two state machines (batchstate and state) into a single · 2025e0bd
      Leigh B. Stoller authored
      state machine (state). All of the stuff that was previously handled by
      using batchstate is now embedded into the one state machine. Of
      course, these mostly overlapped, so it's not that much of a change,
      except that we also redid the machine, adding more states (for
      example, modify phases are now explicit). To get a picture of the
      actual state machine, on boss:
      
      		stategraph -o newstates EXPTSTATE
      		gv newstates.ps
      
      Things to note:
      
      * The "batchstate" slot of the experiments table is now used solely to
        provide a lock for batch daemon. A secondary change will be to
        change the slot name to something more appropriate, but it can
        happen anytime after this new stuff is installed.
      
      * I have left expt_locked for now, but another later change will be to remove
        expt_locked, and change it to active_busy or some such new state name in
        the state machine. I have removed most uses of expt_locked, except those
        that were necessary until there is a new state to replace it.
      
      * These new changes are an implementation of the new state machine,
        but I have not done anything fancy. Most of the code is the same as
        it was before.
      
      * I suspect that there are races with the batch daemon now, but they
        are going to be rare, and the end result is probably that a
        cancelation is delayed a little bit.
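
To illustrate what a single explicit state machine buys, here is a small
sketch of a transition table with a guarded setter. The table is an invented
subset for illustration only (the authoritative graph is whatever stategraph
prints for EXPTSTATE), and TRANSITIONS and set_state() are hypothetical names.

```python
from typing import Dict, Set

# Illustrative subset only; these edges are assumptions, not the real graph.
TRANSITIONS: Dict[str, Set[str]] = {
    "swapped": {"activating"},
    "activating": {"active", "swapped"},
    "active": {"swapping", "modifying"},
    "modifying": {"active", "swapping"},
    "swapping": {"swapped"},
}


def set_state(current: str, new: str) -> str:
    """Return the new state, refusing transitions the table does not list."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```

With one table, a script no longer needs to consult two variables to decide
whether an operation is allowed; it asks whether the edge exists.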
  18. 29 Oct, 2003 1 commit
  19. 16 Oct, 2003 1 commit
    • Fix bug with respect to modified experiments that abort and get · 589e97d2
      Leigh B. Stoller authored
      swapped out (non-recoverable) by tbswap. swapexp was leaving the
      experiment in the running state instead of paused. We need to check
      this after tbswap since we do not get reasonable error codes back.
      Also some cleanup with respect to how aborted modifies are handled.
      I think I understand what Chad did ...
      
      A general comment; we need to be better about returning meaningful
      error codes!
  20. 30 Sep, 2003 1 commit
    • Up to now we have had two state variables associated with an experiment, · 4269dad1
      Leigh B. Stoller authored
      plus a lock field. The lock field was a simple "experiment locked, go away"
      slot that is easy to use when you do not care about the actual state that
      an experiment is in, just that it is in "transition" and should not be
      messed with.
      
      The other two state variables are "state" and "batchstate". The former
      (state) is the original variable that Chris added, and was used by the tb*
      scripts to make sure that the experiment was in the state each particular
      script wanted them to be in. But over time (and with the addition of so
      much wrapper goo around them), "state" has leaked out all over the place to
      determine what operations on an experiment are allowed, and if/when it
      should be displayed in various web pages. There is a set of transition
      states (like "swapping") in addition to the usual "active", "swapped",
      etc. that make testing state a pain in the butt.
      
      I added the other state variable ("batchstate") when I did the batch
      system, obviously! It was intended as a wrapper state to control access to
      the batch queue, and to prevent batch experiments from being messed with
      except when it was really okay (for example, it's okay to terminate a
      swapped out batch experiment, but not a swapped in batch experiment since
      that would confuse the batch daemon). There are fewer of these states, plus
      one additional state for "modifying" experiments.
      
      So what I have done is change the system to use "batchstate" for all
      experiments to control entry into the swap system, from the web interface,
      from the command line, and from the batch daemon. The other state variable
      still exists, and will be brutally pushed back under the surface until it's
      just a vague memory, used only by the original tb* scripts. This will
      happen over time, and the "batchstate" variable will be renamed once I am
      convinced that this was the right thing to do and that my changes actually
      work as intended.
      
      Only people who have bothered to read this far will know that I also added
      the ability to cancel experiment swapin in progress. For that I am using
      the "canceled" flag (ah, this one was named properly from the start!), and
      I test that at various times in assign_wrapper and tbswap. A minor downside
      right now is that a canceled swapin looks too much like a failed swapin,
      and so tbops gets email about it. I'll fix that at some point (sometime
      after the boss complains).
      
      I also cleaned up various bits of code, replacing direct calls to exec
      with calls to the recently improved SUEXEC interface. This removes
      some cruft from each script that calls an external script.
      
      Cleaned up modifyexp.php3 quite a bit, reformatting and indenting.
      Also fixed to not run the parser directly! This was very wrong; should
      call nscheck instead. Changed to use "nobody" group instead of group
      flux (made the same change in nscheck).
      
      There is a script in the sql directory called newstates.pl. It needs
      to be run to initialize the batchstate slot of the experiments table
      for all existing experiments.
  21. 07 Aug, 2003 1 commit
  22. 06 Aug, 2003 1 commit
    • Clean up temporary files used in modify. The temp dirs were being · 05bd80ff
      Leigh B. Stoller authored
      created in /tmp and left behind. I've moved them to the expwork
      directory instead, and added a routine in the library to clear them
      out.
      
      Clear out the nsfile (stored in /tmp) used in modify. The web page was
      creating a temp file, but never removing it. swapexp now copies the
      nsfile in so that the web page can remove the temporary after the
      script exits. The temp is placed in the expwork directory as well, but
      left behind for debugging.
      
      When swapmod fails, send along the nsfile in the email message.
  23. 30 Jul, 2003 1 commit
    • Change the prerender code to run in the background so that Mike does · 11d792e3
      Leigh B. Stoller authored
      not have to wait 3 minutes for it to finish before he can watch his
      experiment swapin fail for some other reason.
      
      I adopted the same pid mechanism as in eventsys_control.in, which uses
      a slot in the experiments table.
      
      Running "prerender" puts the render into the background and stores
      the pid. Running "prerender -r" kills a running prerender and removes
      the existing info from the DB.
      
      Fixed the problem with swapmod not restoring the old vis; swapmod now
      kills any running prerender, and restarts one if the swapmod fails
      (the prerun of the new NS file starts up another prerender in the
      background).
      
      Add setpriority() call in prerender to nice it and children to 15.
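
The pid mechanism can be sketched as below. This is an assumption-laden
model: render_pids is a dictionary standing in for the pid slot in the
experiments table, start_render() and stop_render() are hypothetical names for
"prerender" and "prerender -r", and the priority change is left out.

```python
import os
import signal
import subprocess
import sys
from typing import Dict, List

# Hypothetical stand-in for the pid slot in the experiments table.
render_pids: Dict[str, int] = {}


def start_render(expt: str, cmd: List[str]) -> subprocess.Popen:
    """Start the renderer in the background and record its pid."""
    proc = subprocess.Popen(cmd)
    render_pids[expt] = proc.pid
    return proc


def stop_render(expt: str) -> bool:
    """Kill a recorded renderer and clear its slot; False if none was recorded."""
    pid = render_pids.pop(expt, None)
    if pid is None:
        return False
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # the renderer had already exited
    return True
```

A failed swapmod would then amount to stop_render() followed by a fresh
start_render() for the restored topology.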
  24. 29 Jul, 2003 1 commit
    • Some cleanup on the batch mode stuff. Make it more explicit in the · 29b820b1
      Leigh B. Stoller authored
      showexp page that it's a batch experiment, by the menu options. Same
      deal in the swapexp output, plus some other minor cleanup. The only
      bug I found while trying to figure out the batchmode problem reported
      this morning by the FileMover people, is that the cancelflag is not
      cleared after swapping a running batch experiment out, so even after
      reinjecting it into the queue, it will not run. Still, that does seem
      to be what the FileMover people reported.
  25. 27 Jul, 2003 1 commit
  26. 17 Jul, 2003 1 commit
  27. 11 Jun, 2003 1 commit
  28. 09 Jun, 2003 1 commit
  29. 05 Jun, 2003 2 commits
  30. 04 Jun, 2003 1 commit
  31. 03 Jun, 2003 1 commit
  32. 28 May, 2003 1 commit
  33. 25 May, 2003 1 commit
  34. 24 May, 2003 1 commit
    • Round of changes related to idleswapping and autoswapping. The web and · 02aaf8e4
      Mac Newbold authored
      back end scripts now support 3 different kinds of forced swaps:
      
      1. Idle-Swap : this is the same one we had before. Email message to them
      says it was swapped "because it was idle for too long"
      
      2. Auto-Swap : A new one, typically for user-requested timed swapouts.
      Email says it was swapped "because it was swapped in too long"
      
      3. Force swap: Generic one, for "none of the above" cases. Just says
      Experiment "has been forcibly swapped out by Testbed Operations."
      
      The force swap option on the web now lets you choose which of these three
      you want. Only "Idle-Swap" counts as an idleswap in the stats. Soon
      idleswap and autoswap will be used by idlemail when it does automatic
      swapping.
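
The three kinds of forced swap and their email wording can be captured in a
small lookup, sketched here. The quoted phrases come from the commit message;
the ForcedSwap enum, the REASON table and counts_as_idleswap() are hypothetical
names, not the identifiers used in the scripts.

```python
from enum import Enum


class ForcedSwap(Enum):
    IDLE = "idle-swap"
    AUTO = "auto-swap"
    FORCE = "force-swap"


# Wording quoted from the commit message, keyed by swap kind.
REASON = {
    ForcedSwap.IDLE: "because it was idle for too long",
    ForcedSwap.AUTO: "because it was swapped in too long",
    ForcedSwap.FORCE: "has been forcibly swapped out by Testbed Operations.",
}


def counts_as_idleswap(kind: ForcedSwap) -> bool:
    # Only Idle-Swap is counted as an idleswap in the stats.
    return kind is ForcedSwap.IDLE
```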
  35. 22 May, 2003 1 commit
    • Reorg the batch system slightly as per Eric's request that batch mode · da97ba35
      Leigh B. Stoller authored
      experiments look more like regular experiments. Batch mode experiments
      can now be preloaded and swapped. When preloaded, they go into a
      "Pause" state. Swapping a batch mode experiment in puts them into the
      "posted" state so the batch daemon will see them. Swapping out a
      batchmode experiment does the expected; it puts them back into the
      Pause state. Terminating a batch mode experiment does the expected;
      it's gone. When a batch mode experiment finishes normally, it goes back
      into the pause state, which allows batches to be reinjected as many
      times as Eric likes.