1. 15 Nov, 2004 1 commit
    • A bunch of ElabInELab changes. · 10b116e0
      Leigh B. Stoller authored
      * snmpit: When ElabInELab is true, use the routines in the new
        snmpit_remote.pm library for setting up and tearing down vlans for an
        experiment. At present, only these two operations are proxied out to
        the outer emulab.
      
      * snmpit_remote.pm: A new little library that uses the XMLRPC server on
        the outer emulab to set up and destroy vlans for an inner experiment.
        This code is used from snmpit (see above).
      
      * snmpit_lib.pm: A couple of minor changes for the server side of the
        proxy operation.
      
      * snmpit.proxy.in: A new perl module that is invoked from the RPC
        server.  This proxy sets up and tears down vlans for an inner elab.
        The basic model is that the container experiment will have lots of
        vlans for various individual experiments running on the inner emulab.
      
      * swapexp: A couple of minor elabinelab hacks.
      
      * tbswap: For elabinelab experiments, reconfig/restart dhcpd when
        tearing down the experiment, and call out to new elabinelab script
        when setting up an elabinelab experiment. There is no provision for
        swapmod at this time.
      
      * elabinelab: A new script to create the inner emulab. Does all kinds of
        gross DB stuff then more gross stuff on the inner ops and boss.
  2. 04 Oct, 2004 1 commit
  3. 09 Aug, 2004 1 commit
    • Some cleanups and performance improvements: · f604dc33
      Leigh B. Stoller authored
      * Be more selective about what lists are regenerated; we were generating
        way too many lists on each call. When calling from tbswap, use new
        -t option to generate just the active lists. When called from setgroups,
        use -p option to generate lists just for the project. Add update option
        for when user changes email address (and all lists really do need to be
        regenerated).
      
      * Add "diff" processing. Instead of blindly firing each new list over to
        ops with ssh, store a copy of all of the lists in
        /usr/testbed/lists. After we generate the new list, diff it against the
        stored copy. If the same, skip it. Otherwise stash new copy and fire it
        over. This should reduce the wait times by quite a bit since the lists
        rarely change (except for the activity lists of course).
      
      * Add -n (impotent) option for debugging; skips the ssh over to ops.
      
      * Reorg a lot of stuff; it was getting hard to follow.
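The "diff" processing described above can be sketched roughly as follows. This is a minimal Python illustration, not the script's actual code: the function name `sync_list`, the `send` callback (standing in for the ssh transfer to ops), and the directory layout are all assumptions made for the example.

```python
from pathlib import Path


def sync_list(name, new_contents, store_dir, send):
    """Send a freshly generated list only if it differs from the stored copy.

    `send` is a hypothetical callable standing in for the ssh transfer.
    Returns True when the list was sent, False when it was skipped.
    """
    stored = Path(store_dir) / name
    if stored.exists() and stored.read_text() == new_contents:
        return False                      # same as the stored copy: skip it
    stored.parent.mkdir(parents=True, exist_ok=True)
    stored.write_text(new_contents)       # stash the new copy
    send(name, new_contents)              # then fire it over
    return True
```

Since most lists rarely change, most calls would take the early return and never touch the network.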
  4. 29 Jul, 2004 1 commit
    • Two unrelated bug fixes (with some related cleanups and tweaks) · 9f4edbba
      Leigh B. Stoller authored
      * The first involves swapmod. When a swapmod on an active experiment fails,
        tbswap will reswap the experiment back to the original configuration. The
        problem is that it is reswapping it with the *new* virtual state of the
        experiment in the DB. It is not until later when control returns to
        swapexp that the virtual state is restored. This is plainly wrong, and in
        fact was causing the event scheduler grief cause it was starting up,
        reading the virtual topo, which was different, wrong, and about to be
        blown away.
      
        I reorganized the modify section of swapexp so that virtual state is
        restored only when it's a swapmod on a swapped experiment. On an active
        experiment, I moved that code down into tbswap, which now does all
        of the virtual and physical state restore before it does the reswap back
        to the original experiment. Just for kicks, it's also done if tbswap
        decides to swap the experiment cause of a fatal error.
      
        Cleanups: I changed $NoRecover to $CanRecover. My feeble brain cannot
        deal with !$NoRecover. I know, two knots make a wright for most people.
      
        Renderer: I was annoyed by the fact that we rerun the renderer on a
        failed swapmod. The original reason is that the renderer runs in the
        background and so vis_nodes cannot be saved with the rest of the virtual
        state tables cause the renderer might still be running when the user
        fires off the swapmod. Well, the hell with that. We lock the vis_nodes
        table anyway in the renderer during update, so we are certain to get a
        consistent snapshot. We store the renderer pid in the experiments table,
        so if the renderer was running, just fire off another one; mostly this is
        not going to happen. In addition, tbprerun no longer starts a new
        renderer when doing the swapmod; I start the new renderer later after
        swapmod succeeds. I might end up tweaking this a bit depending on what
        people notice as being different.
      
      * Termination changes to batchexp and swapexp: I've rearranged the
        termination code using an END block so that any uncontrolled exit from
        either batchexp or swapexp will go through the cleanup code, and
        hopefully insert a stats record, as well as not leave the experiment in
        some inbetween state. I've set the max DB retry count to zero in both
        cases, which means infinite retry. I've also added SIGTERM handlers to
        both so that again, we can kill a hung batch/swap and have it clean up
        things more or less. Note that END blocks are not run when a signal
        kills the program; you have to catch the signal and then die() so that
        the END block is executed.
      
        Eventually, we need to clean up the various libraries so that we do not
        use DBQueryFatal(), but rather use DBQueryWarn(), and look for failure.
        Ditto for event system interface.
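The same pitfall exists outside Perl, so here is a rough Python analog of the END-block-plus-SIGTERM pattern described above. It is only an illustration: `cleanup`, `on_sigterm`, `install_handlers`, and `cleanup_log` are names invented for this sketch, and the real scripts use Perl's END blocks and die(). Python's `atexit` handlers, like END blocks, are not run when an unhandled signal kills the process, so the handler turns the signal into an ordinary exit.

```python
import atexit
import signal
import sys

# Stands in for "insert a stats record and leave the experiment in a sane state".
cleanup_log = []


def cleanup():
    """Exit-time hook; runs on normal interpreter exit, including sys.exit()."""
    cleanup_log.append("cleanup ran")


def on_sigterm(signum, frame):
    # Convert the signal into a regular exit so exit-time hooks still run.
    sys.exit(1)


def install_handlers():
    """Register the exit hook and the SIGTERM handler (call from the main thread)."""
    atexit.register(cleanup)
    signal.signal(signal.SIGTERM, on_sigterm)
```

A script would call `install_handlers()` once at startup; after that, killing it with SIGTERM goes through `cleanup()` instead of dying silently.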
  5. 28 May, 2004 1 commit
  6. 26 Apr, 2004 1 commit
    • Changes to exit status stuff to reflect recent changes by Rob to how · 1c4a613c
      Leigh B. Stoller authored
      assign exits (exit codes).
      
      * in assign_wrapper, no longer return any status from assign to the
        caller. This was pointless. Instead, return 0 on success, 1 on
        controlled error, and -1 on uncontrolled error (die() called
        someplace). Add in CANRECOVER bit whenever the wrapper exits, even
        if uncontrolled, by putting in an END block to catch the die. This
        should prevent certain cases where a swapmod error would be flagged
        as not recoverable.
      
      * Remove most of the assign output processing since we no longer
        return its codes. Still print a portion of it to the log though.
      
      * Change call to fatal() in assign_wrapper; do not pass an exitcode
        since in every case it was the same damn thing!
      
      * Change tbswap to no longer carry assign_wrapper exit code to its
        exit.
      
      * Change the batch daemon to treat all errors as continuable (keep
        batch queued) unless exit code is -1. We will need to revisit this a
        bit perhaps, when Rob adds precheck code.
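A small sketch of the exit-status convention above, seen from the batch daemon's side. The function names and the returned labels are made up for illustration, and the CANRECOVER bit is deliberately left out. One detail that is certain: a Unix exit status is 8 bits wide, so an exit of -1 is observed by the parent as 255.

```python
def to_signed(status):
    """Map an 8-bit Unix exit status back to a signed value (255 -> -1)."""
    status &= 0xFF
    return status - 256 if status > 127 else status


def batch_action(status):
    """Decide what a batch daemon might do with a child's exit status.

    The labels returned here are hypothetical, chosen for this sketch.
    """
    code = to_signed(status)
    if code == 0:
        return "done"
    if code == -1:
        return "dequeue"        # uncontrolled error: do not keep retrying
    return "keep-queued"        # every other error is treated as continuable
```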
  7. 16 Feb, 2004 1 commit
  8. 10 Feb, 2004 1 commit
  9. 09 Jan, 2004 1 commit
    • Changes to do auto re-swap of expts with simnodes when an nse on a simhost · 375f87c1
      Shashi Guruprasad authored
      (or more than one simhost) is unable to keep up with real-time. It includes
      changes to assign_wrapper to handle swap modify for simnodes, the simple
      algorithm in nseswap that bumps up the nodeweight of simnodes being hosted
      on a simhost that reports "can't keep up with real-time" (aka nse violation),
      and changes to ptopgen and sim.tcl to prefer nodes that already have the FBSD-NSE image.
      Also, changes to other files to send out NSESWAP event.
      
      One unrelated change: We now have per-swapin .top files and assign.log
      files along with .ptop files. This helps in debugging across multiple
      swapins since files remain in the form of
      <pid>-<eid>-<process_id>.{top,ptop} and assign-<pid>-<eid>-<process_id>.log
      Also useful for archiving.
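The naming scheme above can be spelled out as a short helper. This is a sketch only: the function name and the dictionary keys are invented for the example, and the sample values in the usage note are placeholders.

```python
def swapin_filenames(pid, eid, process_id):
    """Build the per-swapin file names described above.

    pid and eid are the experiment's identifiers; process_id is the
    id of the process doing the swapin, which keeps each run's files apart.
    """
    base = f"{pid}-{eid}-{process_id}"
    return {
        "top": f"{base}.top",
        "ptop": f"{base}.ptop",
        "log": f"assign-{base}.log",
    }
```

For example, `swapin_filenames("testbed", "myexp", 1234)["log"]` gives `assign-testbed-myexp-1234.log`.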
  10. 08 Jan, 2004 1 commit
  11. 13 Dec, 2003 1 commit
  12. 01 Dec, 2003 1 commit
    • New scripts: tarfiles_setup, fetchtar.proxy, and webtarfiles_setup. · c0c6547c
      Robert Ricci authored
      The idea is to give us hooks for grabbing experimenters' tarballs (and
      RPMs) from locations other than files on ops. Mainly, to remove
      another dependence on users having shells on ops.
      
      tarfiles_setup supports fetching files from http and ftp URLs right
      now, through wget. It places them into the experiment directory, so
      that they'll go away when the experiment is terminated, and the rest
      of the chain (ie. downloading to clients and os_setup's checks)
      remains unchanged.  It is now tarfiles_setup's job to copy tarballs and
      RPMs from the virt_nodes table to the nodes table for allocated nodes.
      This way, it can translate URLs into the local filenames it
      constructs. It gets invoked from tbswap.
      
      Does the actual fetching over on ops, running as the user, with
      fetchtar.proxy.
      
      Should be idempotent, so we should be able to give the user a button
      to run webtarfiles_setup (none exists yet) to 'freshen' their
      tarballs. (We'd also have to somehow let the experiment's nodes know
      they need to re-fetch their tarballs.)
      
      One funny side effect of this is that the separator in
      virt_nodes.tarfiles is now ';' instead of ':' like nodes.tarballs,
      since we can now put URLs in the former. Making these consistent is a
      project for another day.
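To see why ':' stops working as a separator once URLs are allowed, here is a small Python sketch. The helper names and the sample field value are invented for illustration; the exact layout of each entry inside virt_nodes.tarfiles is not shown in this log, so entries are treated as opaque strings.

```python
from urllib.parse import urlparse


def split_tarfiles(field):
    """Split a ';'-separated tarfiles field into its non-empty entries."""
    return [entry.strip() for entry in field.split(";") if entry.strip()]


def is_remote(entry):
    """True if the entry is an http or ftp URL rather than a local path."""
    return urlparse(entry).scheme in ("http", "ftp")
```

Splitting `http://example.org/a.tar.gz` on ':' would cut it in two at the scheme, which is the problem the ';' separator avoids.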
  13. 17 Nov, 2003 1 commit
    • Merge the two state machines (batchstate and state) into a single · 2025e0bd
      Leigh B. Stoller authored
      state machine (state). All of the stuff that was previously handled by
      using batchstate is now embedded into the one state machine. Of
      course, these mostly overlapped, so it's not that much of a change,
      except that we also redid the machine, adding more states (for
      example, modify phases are now explicit). To get a picture of the
      actual state machine, on boss:
      
      		stategraph -o newstates EXPTSTATE
      		gv newstates.ps
      
      Things to note:
      
      * The "batchstate" slot of the experiments table is now used solely to
        provide a lock for batch daemon. A secondary change will be to
        change the slot name to something more appropriate, but it can
        happen anytime after this new stuff is installed.
      
      * I have left expt_locked for now, but another later change will be to remove
        expt_locked, and change it to active_busy or some such new state name in
        the state machine. I have removed most uses of expt_locked, except those
        that were necessary until there is a new state to replace it.
      
      * These new changes are an implementation of the new state machine,
        but I have not done anything fancy. Most of the code is the same as
        it was before.
      
      * I suspect that there are races with the batch daemon now, but they
        are going to be rare, and the end result is probably that a
        cancelation is delayed a little bit.
  14. 30 Sep, 2003 1 commit
    • Up to now we have had two state variables associated with an experiment, · 4269dad1
      Leigh B. Stoller authored
      plus a lock field. The lock field was a simple "experiment locked, go away"
      slot that is easy to use when you do not care about the actual state that
      an experiment is in, just that it is in "transition" and should not be
      messed with.
      
      The other two state variables are "state" and "batchstate". The former
      (state) is the original variable that Chris added, and was used by the tb*
      scripts to make sure that the experiment was in the state each particular
      script wanted it to be in. But over time (and with the addition of so
      much wrapper goo around them), "state" has leaked out all over the place to
      determine what operations on an experiment are allowed, and if/when it
      should be displayed in various web pages. There is a set of transition
      states (like "swapping") in addition to the usual "active", "swapped", etc.
      that makes testing state a pain in the butt.
      
      I added the other state variable ("batchstate") when I did the batch
      system, obviously! It was intended as a wrapper state to control access to
      the batch queue, and to prevent batch experiments from being messed with
      except when it was really okay (for example, it's okay to terminate a
      swapped out batch experiment, but not a swapped in batch experiment since
      that would confuse the batch daemon). There are fewer of these states, plus
      one additional state for "modifying" experiments.
      
      So what I have done is change the system to use "batchstate" for all
      experiments to control entry into the swap system, from the web interface,
      from the command line, and from the batch daemon. The other state variable
      still exists, and will be brutally pushed back under the surface until it's
      just a vague memory, used only by the original tb* scripts. This will
      happen over time, and the "batchstate" variable will be renamed once I am
      convinced that this was the right thing to do and that my changes actually
      work as intended.
      
      Only people who have bothered to read this far will know that I also added
      the ability to cancel experiment swapin in progress. For that I am using
      the "canceled" flag (ah, this one was named properly from the start!), and
      I test that at various times in assign_wrapper and tbswap. A minor downside
      right now is that a canceled swapin looks too much like a failed swapin,
      and so tbops gets email about it. I'll fix that at some point (sometime
      after the boss complains).
      
      I also cleaned up various bits of code, replacing direct calls to exec
      with calls to the recently improved SUEXEC interface. This removes
      some cruft from each script that calls an external script.
      
      Cleaned up modifyexp.php3 quite a bit, reformatting and indenting.
      Also fixed it to not run the parser directly! This was very wrong; should
      call nscheck instead. Changed to use "nobody" group instead of group
      flux (made the same change in nscheck).
      
      There is a script in the sql directory called newstates.pl. It needs
      to be run to initialize the batchstate slot of the experiments table
      for all existing experiments.
  15. 18 Sep, 2003 2 commits
  16. 27 Aug, 2003 1 commit
  17. 22 Aug, 2003 1 commit
  18. 06 Aug, 2003 1 commit
    • Clean up temporary files used in modify. The temp dirs were being · 05bd80ff
      Leigh B. Stoller authored
      created in /tmp and left behind. I've moved them to the expwork
      directory instead, and added a routine in the library to clear them
      out.
      
      Clear out the nsfile (stored in /tmp) used in modify. The web page was
      creating a temp file, but never removing it. swapexp now copies the
      nsfile in so that the web page can remove the temporary after the
      script exits. The temp is placed in the expwork directory as well, but
      left behind for debugging.
      
      When swapmod fails, send along the nsfile in the email message.
  19. 05 Aug, 2003 1 commit
  20. 30 Jul, 2003 1 commit
  21. 29 Jul, 2003 1 commit
  22. 25 Jul, 2003 1 commit
    • Commit my version of assign_wrapper as assign_wrapper-new, and change · 62e38deb
      Leigh B. Stoller authored
      tbswap to use this version inside the testbed project only! All other
      projects will see the old version for now; there are just too many
      things to test, and the testsuite gets just a fraction of them. Some
      highlights (which I will expand on later when I commit this version to
      the main version):
      
      * New -t option to create the TOP file, and then exit. The only other
        side effect of this is to update the min/max nodes for the
        experiment in the DB, unless new option -n (impotent mode) is given.
      
      * New -n option to operate in impotent mode; do not allocate nodes and
        do not modify the DB. Okay, so this option is not as great as it
        sounds. I eventually hit the point of diminishing returns, with
        trying to make things work right without DB modification. At some
        point I just throw in the towel and exit. This currently happens after
        interpolating the link results of assign. But, I have found it very
        useful, and could get better with time. Being able to run assign on
        the main DB without sucking up the nodes is nice for debugging.
      
      * Lots of data structure organization, mostly on the virtual topology
        side of assign (you can think of assign as two sections, the part
        that interprets the DB tables and creates the TOP file, and the part
        that reads the results of assign and sets up all the physical stuff
        in the DB). I removed numerous global hashes, and combined them into
        aggregate data structures, such as they are in Perl. My approach for
        this was to read the tables from the DB, and keep them handy,
        extending them as needed with stuff that assign_wrapper generates as
        it proceeds. This has the side effect of cutting down on the number
        of queries as well.
      
        The next task is to do the physical side reorg, but not up for that
        yet.
  23. 17 Jul, 2003 1 commit
  24. 15 Jul, 2003 1 commit
    • A set of changes to make swapmod work on jailed nodes (note, swapmod · 92ff875a
      Leigh B. Stoller authored
      does not yet work with remote virtual nodes; that will take even more
      work).
      
      Added a new allocstate called RES_TEARDOWN. assign_wrapper no longer
      deallocates unused nodes, but rather moves them into the new state for
      the wrapper (tbswap) to deal with. That's because deleted vnodes need to
      be torn down, since it's possible that the node on which they were
      living will not be deallocated (say, if there are other vnodes on
      it). We do not want to be doing that from assign_wrapper, so tbswap
      looks for those nodes.
      
      Made vnode_setup allocstate aware in the same way that os_setup is;
      do not reboot vnodes or try to set up vnodes when they are already in
      the RES_READY state, as they will be when doing a swapmod. In
      addition, if os_setup is going to reboot the underlying physnode, move
      the vnodes on that node into RES_READY too, since there they will
      set up automatically. Might need an interim state here, for correctness.
  25. 11 Jun, 2003 1 commit
  26. 29 Apr, 2003 4 commits
    • · 40bc8389
      Chad Barb authored
      (hopefully) fixed bug in startexp
      affecting batch system by switching two cleanup checks.
      (I wasn't giving it an opportunity to bail on a batchexp failing
       soon enough.)
      
      Also, tidied up return code for tbswap.
    • · 59d5e48f
      Chad Barb authored
      Don't reboot nodes when recovering from a failed assign during update.
    • · a3a818af
      Chad Barb authored
      Mask off '64' bit from return from assign_wrapper, so
      batch_exp doesn't choke on it.
    • Robust Experiment Modify -and- · 7308f458
      Chad Barb authored
      Various Other changes to get Expt Modify ready for prime time.
      
       - If assign fails on a modify, experiment will
         be restored to old state, *not* swapped out.
      
       - Reboot option has been improved to reboot all
         nodes as part of os_setup, not in separate
         step.
      
       - Different assign error codes result in different
         retry behavior for assign_wrapper
         (Follows Rob's change to assign to make it
          pass back special code for non-retriable faults)
      
       - '64' bit in assign_wrapper exit code indicates to tbswap
         that db/phys state hadn't been mucked with before
         the exit occurred
         (ergo, '65' and '1' are the common return codes,
          though the old 4,8,16,32 are still there for assign failing.)
      
       - (tbswap still returns codes from assign wrapper)
      
       - Added 5 sec pause between assign attempts.
      
       - Cleaned up tbswap code.
      
       - Physical state backup/restore removed from tbprerun,
         put into swapexp.
      
       - Interfaces table now getting cleaned up correctly
         (Mike noticed problem)
      
       - Changed menu display in showexp to show
         the "modify" menu option for swapped out experiments
         (like it used to.)
      
       - A couple other changes.
      
      Note:
       Still admin-only, but I plan to change that soon.
      
      To do:
       - Erase expt backups in /tmp after using them.
       - Re-viz failed experiments.
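The '64' bit convention in the list above can be illustrated with a few lines of Python. The constant and function names are invented for this sketch; only the bit value and the example codes (65, 1, and the older 4/8/16/32) come from the notes.

```python
UNTOUCHED_BIT = 64   # hypothetical name for the '64' bit described above


def decode_wrapper_status(status):
    """Split an exit code into (error_code, state_untouched).

    state_untouched is True when the '64' bit is set, meaning DB/physical
    state had not been modified before the exit occurred.
    """
    return status & ~UNTOUCHED_BIT, bool(status & UNTOUCHED_BIT)
```

With this, 65 decodes to error code 1 with state untouched, while a plain 1 decodes to error code 1 with state already modified. Masking the bit off, as the first element of the tuple does, is what a caller that does not know about the bit would need.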
  27. 07 Apr, 2003 1 commit
    • · 99a867cb
      Chad Barb authored
      Modify os_setup return codes to enable "intelligent" retry;
      
      Now os_setup returns:
        0 on success
        1 on one or more retry-friendly errors
       -1 on no-retry errors
      
      tbswap.in checks os_setup's return code,
      and will only retry on 1.
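A minimal sketch of the retry rule above, in Python rather than the Perl of tbswap.in. The function name, the `os_setup` callable, and the attempt limit are assumptions for the example; only the meaning of 0, 1, and -1 comes from the notes.

```python
def run_with_retry(os_setup, max_attempts=3):
    """Call os_setup() until it succeeds, hits a no-retry error, or runs out of attempts.

    os_setup is a hypothetical callable returning 0 (success),
    1 (retry-friendly error) or -1 (no-retry error).
    """
    status = -1
    for _ in range(max_attempts):
        status = os_setup()
        if status != 1:      # success or a no-retry error: stop either way
            break
    return status
```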
  28. 03 Apr, 2003 1 commit
    • · 765de560
      Chad Barb authored
      Added new feature 'Experiment Modify'.
      Now available (to admins only for now) from the showexp page.
      
      Warning! doing a modify which alters the topology will probably
      require a "reboot all nodes" afterwards.
      (There will be a checkbox soon in the modify experiment page.)
      
      Adding/removing delay nodes seems to work fine without reboots, though.
      
      Warning! If the new version of the experiment cannot be mapped
       (not enough nodes available, for instance) the experiment will be
       swapped out! This will get fixed later.
      
      Prerun backs up the experiment topology, so using a bad NS
      file doesn't result in experiment termination.
      
      As part of this, added library functions to libdb to
      delete, backup, and restore both virtual and physical experiment state.
  29. 02 Apr, 2003 1 commit
  30. 27 Mar, 2003 1 commit
    • · a495bcfb
      Chad Barb authored
      New tbswap mode 'update'. (a.k.a. 'reswap')
      Re-assigns experiment, fixing already assigned nodes in place;
      tries not to reboot nodes. Doesn't clear port counters,
      restart event system, etc.
      
      A few more things remain to be considered for 'general' use
      (adding new nodes to experiments and modifying topologies),
      but for replacing failed nodes in experiments
      or removing virt_nodes from experiments, it should work fine.
  31. 24 Mar, 2003 1 commit
    • · dc12ae50
      Chad Barb authored
      Fixed bug where after running out of retry attempts,
      swapout was only doing the 'retry' version of cleanup,
      when it should have been doing a full cleanup.
  32. 20 Mar, 2003 2 commits
    • · 6a449d22
      Chad Barb authored
      tbswap: re-enabled retry (Undid Leigh's last change)
      
      assign_wrapper.in:
         was left-joining reserved to nodes to get reserved list;
         This didn't get delays.
      
         now am doing separate query on reserved, and putting those
         into the %fixed and %alreadyAllocated hashes.
    • · c731d5e0
      Leigh B. Stoller authored
  33. 18 Mar, 2003 1 commit
    • · e928fbe9
      Chad Barb authored
      Here it is; reswap.
      
      nfree
         - modified to put node in FREE_DIRTY when it is freed
      
      assign_wrapper
         - '-u' update switch added.
      
      os_setup
         - doesn't reboot node which is already in RES_READY
      
      tbswap
         - calls all this stuff appropriately
  34. 07 Mar, 2003 2 commits
    • · 5de15b7c
      Chad Barb authored
      Still not fully tested, but seems to work.
      
      Fixed minor error.
      (When $TESTMODE == 1,
       swapped in experiment was incorrectly being put into ACTIVE
       after it was put into TESTING.)
    • · 2fd95aee
      Chad Barb authored
      NOT TESTED; NOT READY FOR PRIME TIME.
      Archived here for backup/review purposes.
      
      Initial version; unifies tbswapout and tbswapin into one
      script.
      
      Should be just like tbswapin/out except:
      tbswapin  foo bar => tbswap in  foo bar
      tbswapout foo bar => tbswap out foo bar
      
      The main win here is that doSwapin() is a function, as is doSwapout().
      If doSwapin() fails, actual doSwapout() code can be called.
      
      Also includes retry framework
      (functionalizing doSwapin and doSwapout makes retrying much cleaner.)
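The benefit of having swapin and swapout as functions can be sketched as follows. This is a Python illustration of the idea, not the script's code: `swap_in_with_retry`, its parameters, and the retry count are assumptions, and `do_swapin` / `do_swapout` simply stand in for doSwapin() and doSwapout().

```python
def swap_in_with_retry(do_swapin, do_swapout, retries=2):
    """Try to swap in; after each failure, swap out to clean up, then retry.

    do_swapin returns True on success. Both callables are hypothetical
    stand-ins for the doSwapin() and doSwapout() functions described above.
    """
    for _ in range(retries + 1):
        if do_swapin():
            return True
        do_swapout()         # undo whatever the failed swapin left behind
    return False
```

Because the cleanup is an ordinary function call, the retry loop needs no separate script invocation between attempts.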