  1. 23 May, 2006 1 commit
  2. 05 May, 2006 1 commit
  3. 28 Mar, 2006 1 commit
  4. 24 Mar, 2006 1 commit
    • Kevin Atkinson authored · bcbd18aa
      Have swapexp/batchexp dump the error when the "-w" option is
      specified.  The error will look something like:
      
        ERROR:: <cause desc>
      
        <text of the error>
      
        Cause: <cause>
        Confidence: <confidence>
      
      This will be the last thing printed.  The "::" is there to make the
      error easy for scripts to recognize, since they can just look for
      "ERROR::".
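
      A minimal Python sketch (not from the commit) of how a calling script might pull this block out of captured output. It assumes the layout shown above: one "ERROR::" line with a short description, the error text, then "Cause:" and "Confidence:" lines. The helper name parse_tb_error is hypothetical.

```python
from typing import Optional


def parse_tb_error(output: str) -> Optional[dict]:
    """Return the fields of the trailing ERROR:: block, or None if absent.

    Hypothetical helper; assumes the block layout shown in the commit message.
    """
    lines = output.splitlines()
    # The block is documented as the last thing printed, so scan backwards.
    for start in range(len(lines) - 1, -1, -1):
        if lines[start].lstrip().startswith("ERROR::"):
            break
    else:
        return None  # no marker found: the command did not report an error

    desc = lines[start].lstrip()[len("ERROR::"):].strip()
    fields = {}
    body = []
    for line in lines[start + 1:]:
        text = line.strip()
        key, sep, value = text.partition(":")
        if sep and key in ("Cause", "Confidence"):
            fields[key.lower()] = value.strip()
        else:
            body.append(text)  # part of the free-form error text
    return {
        "desc": desc,
        "text": "\n".join(body).strip(),
        "cause": fields.get("cause"),
        "confidence": fields.get("confidence"),
    }
```

      Scanning from the end relies on the stated guarantee that the block is printed last, so earlier log noise cannot be mistaken for it.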
  5. 21 Mar, 2006 1 commit
    • Kevin Atkinson authored · d258dde6
      Changed format of email sent to user on errors.  The error will now
      appear instead of the generic message when I am confident it is
      accurate.  The subject line will also change to reflect the cause of
      an error.
      
      In some cases, avoid sending mail to testbed-ops during failed
      swap-related events.  It will instead be sent to a new mailing list,
      testbed-errors.
      
      Added a new row in the experiment info table "Last Error:" which
      states the cause of the error, and links to a new page displaying the
      error.
      
      Made some assign/assign_wrapper errors more informative.
      
      The error (as determined by tblog) is now stored in the database in a
      more structured fashion.  This includes adding a column for the session
      (in the log table) to testbed_stats to link each swap event with the
      logs and possibly the error.
      
      For other changes to the database, see sql/database-migrate.txt.
  6. 16 Feb, 2006 1 commit
  7. 15 Feb, 2006 1 commit
    • David Johnson authored · 4982b9cd
        * Makeconf.in, configure, configure.in, defs-default, defs-johnsond-emulab:
          - added a new defs var, TBROBOCOPSEMAIL
      
        * tbsetup/power_mail.pm.in:
          - add some new info to robot powerup mails
      
        * db/libdb.pm.in:
          - add a new function to determine if an experiment contains nodes of a
            given class/type
      
        * tbsetup/swapexp.in:
          - check if exp is a robot exp; that is, if it has robots or motes; if
            so, cc error msgs to TBROBOCOPSEMAIL in addition to TBOPS
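
      A Python sketch of the check described for swapexp: decide whether an experiment is a "robot exp" and widen the error-mail recipients accordingly. The Node record, the class names "robot" and "mote", and both function names are assumptions for illustration; the actual check is a libdb function backed by the database.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Node:
    node_id: str
    node_type: str
    node_class: str


def exp_has_class(nodes: Iterable[Node], wanted: Iterable[str]) -> bool:
    """True if any node matches one of the wanted classes or types."""
    wanted_set = set(wanted)
    return any(n.node_class in wanted_set or n.node_type in wanted_set
               for n in nodes)


def error_mail_recipients(nodes: Iterable[Node], tbops: str,
                          robocops: str) -> List[str]:
    """Always mail TBOPS; also cc TBROBOCOPSEMAIL for robot experiments."""
    recipients = [tbops]
    # Hypothetical class names; the commit only says "robots or motes".
    if exp_has_class(nodes, ("robot", "mote")):
        recipients.append(robocops)
    return recipients
```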
  8. 09 Feb, 2006 1 commit
  9. 23 Jan, 2006 1 commit
  10. 28 Dec, 2005 1 commit
    • Leigh B. Stoller authored · 564d958d
      A fair number of changes.
      * The rest of the backend support for simplistic experiment duplication,
        either from an existing (current) experiment, or from a specific
        archive revision of a current or terminated experiment.
      
      * Rework the swapmod code so that the archive is committed at the exact
        end of the swapout phase. This required adding (and moving) some code
        from swapexp to tbswap, since that is where the actual swapout/swapin
        happens during a swapmod.
      
      * Add a special directory called "archive" to the experiment
        directory, which is a place where users can store stuff they want
        saved away. This will eventually be a user defined set of
        directories, but this was good for getting the basic mechanism in
        place. Note that when the contents of this directory are copied
        out for placement into the archive, it is an exact copy made with
        rsync.
      
      * No longer "clean" the contents of the temporary store between
        commits of the archive. This was creating a lot of headaches, and
        was also causing the revision history to get messed up. The downside
        of this is that we have to be more careful to explicitly delete
        files that the user no longer uses. I have not solved all these
        issues yet, so in the meantime files will get left in the archive
        even if the user no longer references them.
  11. 19 Dec, 2005 1 commit
    • Kevin Atkinson authored · 45f997fd
      Updates to the Error Logging API code.
      
      You should start seeing much better error messages coming from my
      system.  Errors coming from parse.proxy and assign (the two most
      frequent sources of errors) should now be concise and to the point.
      Errors coming from libosload/libreboot (the next most frequent source
      of errors) should now also be much better, but not perfect.  Getting
      perfect errors will likely require a rework of how errors are handled in
      libosload/libreboot; just adding tberror/tbwarn/tbnotice calls is not
      enough.  I can do this at a later date if necessary.
      
      A few minor database changes.
      
      Some changes to the API.  A few bug fixes. Lots of tberror/tbwarn/tbnotice
      calls added to scripts.
      
      Since assign is a C program, and at this time my API is Perl-only, I wrote a
      second wrapper around assign, assign_wrapper2.  When assign fails, errors are
      now parsed in assign_wrapper2, sent to stderr and logged.  This means that
      RunAssign() just returns when assign fails rather than echoing some of the
      assign.log output and then quitting.  The output to the activity log remains
      unchanged.
      
      Since "parse.proxy" is run from ops, I couldn't use my API in it, even though
      it is a Perl program.  Instead I parse the errors coming from it in
      parse-ns.
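
      A rough Python analogue of the assign_wrapper2 idea, offered as a sketch: run an external program and, only when it fails, pull the error lines out of its output, echo them to stderr, and hand them to a logger. The "***" error-line prefix used in the usage below, the logger name, and run_and_report are assumptions for illustration, not assign's documented output format or the tblog API.

```python
import logging
import subprocess
import sys
from typing import Callable, List, Sequence, Tuple

log = logging.getLogger("assign_wrapper2")  # hypothetical logger name


def run_and_report(cmd: Sequence[str],
                   is_error_line: Callable[[str], bool]) -> Tuple[int, List[str]]:
    """Run cmd; on failure, report the lines that look like errors.

    Returns (exit status, extracted error lines), so the caller can simply
    return on failure instead of echoing the raw log and quitting.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    errors: List[str] = []
    if proc.returncode != 0:
        errors = [line for line in proc.stdout.splitlines() if is_error_line(line)]
        for line in errors:
            print(line, file=sys.stderr)  # concise error for the caller
            log.error(line)               # and a log entry
    return proc.returncode, errors
```

      For example, run_and_report(cmd, lambda l: l.startswith("***")) would surface only the lines with that (assumed) prefix from a failing run.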
  12. 15 Dec, 2005 2 commits
  13. 12 Dec, 2005 1 commit
    • Leigh B. Stoller authored · deaae248
      Fix a bug that Mike reported wrt the swap uid being set too late,
      causing the program agent to run as the last swapper.
      
      This was a little tricky because of a poor decision to share the
      usage of the last_swap_uid in the stats gathering code, which wants
      to set the last swapper late so that the previous swapper gets
      charged appropriately. Rather than mess with the stats code too much,
      I moved things around a bit, setting the swapper earlier and adding
      code in libdb to capture the original swapper at the beginning of a
      swapmod for accounting, and then adding code in swapexp to reset the
      swapper if a swapmod fails.
      
      Should fix the stats code at some point to have its own idea of
      swapper.
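
      A small Python sketch of the ordering described here: set the swapper early, keep the original for accounting, and put it back if the swapmod fails. The Experiment record and the swapmod/modify names are hypothetical; the real logic is split between libdb and swapexp.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    swapper: str


def swapmod(exp: Experiment, new_swapper: str,
            modify: Callable[[Experiment], None]) -> str:
    """Run a swapmod as new_swapper; return the original swapper for stats."""
    original = exp.swapper      # captured up front so stats can charge it
    exp.swapper = new_swapper   # set early, so the program agent runs as the new user
    try:
        modify(exp)
    except Exception:
        exp.swapper = original  # swapmod failed: restore the previous swapper
        raise
    return original
```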
  14. 04 Nov, 2005 1 commit
    • Kevin Atkinson authored · a2aba279
      Added error logging API.  See tbsetup/libtblog.pm.in and tbsetup/libtblog.sql.
  15. 19 Oct, 2005 1 commit
  16. 19 Jul, 2005 1 commit
  17. 13 Jul, 2005 1 commit
  18. 31 May, 2005 1 commit
  19. 20 May, 2005 1 commit
    • Leigh B. Stoller authored · c5b79546
      Change to stats gathering code. Generate a new experiment_resources
      record for each swapin (previously, it was just at swapmod). The
      reason for this is that as the testbed gets more fragmented in terms
      of hardware, it is less and less likely that consecutive swapins of
      the same experiment will use the same number of physical resources.
      
      We end up with some duplication of data inside the table, but no big
      deal. I suspect we will revisit this per-experiment state as the
      workbench stuff proceeds.
  20. 12 May, 2005 2 commits
    • Leigh B. Stoller authored · 1adb2069
      Minor change to last revision; let anyone swapmod a swapped firewalled
      or elabinelab experiment, but continue to allow only admins to do it
      if the experiment is active. Just while I continue to debug.
    • Leigh B. Stoller authored · 6eff9de6
      Checkpoint the rest of my changes to support swapmod of both ElabInElab and
      Firewalled experiments (see tbsetup/elabinelab.in for the other stuff).
      
      * To support firewalled experiments, needed to add a new virt_firewalls
        table to split the existing firewalls table up, which included both
        virtual and physical stuff. There are the usual frontend changes and a
        few other things scattered around, including tmcd.c.
      
      * The firewall code in tbswap got some beefing up to support adding and
        deleting nodes from its special control net vlan. Note that I have
        not made any progress on containment of deleted nodes, just as we do not
        do anything now for teardown (unless it's paniced, in which case the
        experiment cannot be modified anyway).
      
      * ptopgen and assign_wrapper got some interesting modifications: Unlike
        regular swapmod, we cannot just tear down all the vlans since that would
        interrupt everything inside the inner elab. Instead, leave the vlans as
        is. The problem is that when assign runs, it can just as easily pick
        different interfaces on the same nodes, which would be a royal pain in
        the ass to deal with! So, ptopgen got a new option (-u) that assign_wrapper
        uses to tell ptopgen that it should prune out unused interfaces
        from nodes that are already allocated to the experiment. This is, at
        best, a pathetically gross hack, but it makes sure that all the
        interfaces stay the same across swapmods.
      
      * The unrelated revision of elabinelab has a bunch of new code for adding
        and deleting nodes from the inner elab. Mostly it deals with dhcpd (inner
        and outer, waiting for nodes to reboot, etc). It also deals with updating
        the vlans table in the DB, pruning out any nodes (ports) that are deleted
        but for which there are still interfaces in existing vlans. Said ports
        are then moved back to the default vlan with calls to snmpit. Also under
        another revision a couple of weeks ago are the web interface changes to
        support the newnode MFS inside an inner Emulab.
      
      * swapexp and endexp got some more checks for firewalled and paniced
        experiments, which were missing.
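
      A Python sketch of what the ptopgen -u pruning amounts to: for nodes already allocated to the experiment, offer only the interfaces the experiment is already using, so assign cannot move links onto different ports. The dictionary layout and prune_unused_interfaces are assumptions; ptopgen itself emits a ptop file for assign rather than a dict.

```python
from typing import Dict, List, Set


def prune_unused_interfaces(ptop: Dict[str, List[str]],
                            allocated: Set[str],
                            in_use: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return a copy of ptop with unused interfaces removed from allocated nodes.

    ptop maps node -> interfaces on offer; in_use maps node -> interfaces
    already carrying the experiment's links.  Free nodes are left untouched.
    """
    pruned = {}
    for node, ifaces in ptop.items():
        if node in allocated:
            keep = in_use.get(node, set())
            pruned[node] = [i for i in ifaces if i in keep]
        else:
            pruned[node] = list(ifaces)
    return pruned
```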
  21. 03 May, 2005 1 commit
  22. 27 Apr, 2005 1 commit
  23. 12 Jan, 2005 1 commit
  24. 16 Dec, 2004 1 commit
    • Leigh B. Stoller authored · 87dd2e60
      The panic button ...
      * tbsetup/panic.in: New backend script to implement the panic button
        feature. When used, it will sever the connection to the
        firewall node by using snmpit to disable the port. Sets the panic
        bit (and date) in the experiments table, and changes the state of
        the experiment from "active" to "paniced" to ensure that the
        experiment cannot be messed with (swapped out or modified). Sends
        email to tbops when the panic button is pressed.
      
        Used with -r option, reverses the above. State is set back to
        active, the panic bit is cleared, and the port is re-enabled with
        snmpit.
      
      * tbsetup/tbswap.in: During swapout, a firewalled experiment that has
        been paniced will get a cleaning: the nodes are powered off, then
        the osids for all the nodes are reset (with os_select) so that they
        will boot the MFS, and then the nodes are powered on. Then the
        control network is turned back on, and then I wait for the nodes to
        reboot. (This is simply because we do not record in the DB that a node
        is turned off, and if I do not wait, the reload daemon will end up
        hitting the power button again if they do not reboot in time. We can
        fix this later.)
      
        I am not planning to apply this to general firewalled experiments
        yet, as the power cycling is going to be hard on the nodes; I would
        rather that we at least have a 1/2 baked plan before we do that.
      
      * www/showexp.php3: If experiment is firewalled, show the Panic
        Button, linked to the panic button web script. If the experiment has
        already had the panic button pressed, show a big warning message and
        explain that user must talk to tbops to swap the experiment out.
        Also fiddle with menu options so that the terminate link is gone,
        and the swap link is visible only in admin mode. In other words, only
        an admin person can swap an experiment once it is paniced. And of
        course, an admin person can run the backend panic script above with the
        -r option, but that's not something to be done lightly.
      
      * db/libdb.pm.in: Add "paniced" as an experiment state (EXPTSTATE_PANICED).
        Add utility functions: TBExptSetPanicBit(), TBExptGetPanicBit(), and
        TBExptClearPanicBit().
      
      * tbsetup/swapexp.in: Minor state fiddling so that an experiment can
        be swapped while in paniced state, but only when in admin mode. Also
        clear the panic bit when experiment is swapped out.
      
      * www/dbdefs.php3.in: Add "paniced" as an experiment state. Add a
        utility function TBExptFirewall() to see if experiment is firewalled.
      
      * www/panicbutton.php3: New web script to invoke the backend panic
        script mentioned above, after the usual confirm song and dance.
      
      * www/panicbutton.gif: New gif of a red panic button that I stole off
        the net. If anyone sees/has a better one, feel free to replace
        this one.
      
      * utils/node_statewait.in: Add -s option so that I can pass in the
        state I want to wait for (used from tbswap above to wait for nodes
        to reach ISUP after power on).
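
      A Python sketch of the panic-button state handling described above, reduced to plain data: pressing the button disables the firewall port, sets the panic bit and date, and moves the experiment from "active" to "paniced"; the -r path reverses it; and swapping a paniced experiment requires admin mode. The Experiment fields and function names are hypothetical stand-ins for the experiments table and the TBExpt*PanicBit() helpers, the port callbacks stand in for snmpit, and the email to tbops is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class Experiment:
    state: str = "active"
    panic: bool = False
    panic_date: Optional[datetime] = None


def press_panic(exp: Experiment, disable_port: Callable[[], None]) -> None:
    if exp.state != "active":
        raise ValueError(f"cannot panic an experiment in state {exp.state!r}")
    disable_port()                  # cut the firewall node's connection first
    exp.panic = True
    exp.panic_date = datetime.now()
    exp.state = "paniced"


def release_panic(exp: Experiment, enable_port: Callable[[], None]) -> None:
    """Equivalent of running the panic script with -r."""
    if exp.state != "paniced":
        raise ValueError("experiment is not paniced")
    enable_port()
    exp.panic = False
    exp.state = "active"


def may_swap(exp: Experiment, admin_mode: bool) -> bool:
    """Covers only the panic rule: a paniced experiment needs admin mode."""
    return exp.state != "paniced" or admin_mode
```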
  25. 15 Nov, 2004 1 commit
    • Leigh B. Stoller authored · 10b116e0
      A bunch of ElabInELab changes.
      * snmpit: When ElabInELab is true, use the routines in the new
        snmpit_remote.pm library for setting up and tearing down vlans for an
        experiment. At present, only these two operations are proxied out to
        the outer emulab.
      
      * snmpit_remote.pm: A new little library that uses the XMLRPC server on
        the outer emulab to setup and destroy vlans for an inner experiment.
        This code is used from snmpit (see above).
      
      * snmpit_lib.pm: A couple of minor changes for the server side of the
        proxy operation.
      
      * snmpit.proxy.in: A new perl module that is invoked from the RPC
        server.  This proxy sets up and tears down vlans for an inner elab.
        The basic model is that the container experiment will have lots of
        vlans for various individual experiments running on the inner emulab.
      
      * swapexp: A couple of minor elabinelab hacks.
      
      * tbswap: For elabinelab experiments, reconfig/restart dhcpd when
        tearing down the experiment, and call out to new elabinelab script
        when setting up an elabinelab experiment. There is no provision for
        swapmod at this time.
      
      * elabinelab: A new script to create the inner emulab. Does all kinds of
        gross DB stuff then more gross stuff on the inner ops and boss.
  26. 30 Aug, 2004 1 commit
  27. 29 Jul, 2004 1 commit
    • Leigh B. Stoller authored · 9f4edbba
      Two unrelated bug fixes (with some related cleanups and tweaks)
      * The first involves swapmod. When a swapmod on an active experiment fails,
        tbswap will reswap the experiment back to the original configuration. The
        problem is that it is reswapping it with the *new* virtual state of the
        experiment in the DB. It is not until later when control returns to
        swapexp that the virtual state is restored. This is plainly wrong, and in
        fact was causing the event scheduler grief because it was starting up,
        reading the virtual topo, which was different, wrong, and about to be
        blown away.
      
        I reorganized the modify section of swapexp so that virtual state is
        restored only when it's a swapmod on a swapped experiment. On an active
        experiment, I moved that code down into tbswap, which now does all
        of the virtual and physical state restore before it does the reswap back
        to the original experiment. Just for kicks, it's also done if tbswap
        decides to swap the experiment because of a fatal error.
      
        Cleanups: I changed $NoRecover to $CanRecover. My feeble brain cannot
        deal with !$NoRecover. I know, two knots make a wright for most people.
      
        Renderer: I was annoyed by the fact that we rerun the renderer on a
        failed swapmod. The original reason is that the renderer runs in the
        background and so vis_nodes cannot be saved with the rest of the virtual
        state tables because the renderer might still be running when the user
        fires off the swapmod. Well, the hell with that. We lock the vis_nodes
        table anyway in the renderer during update, so we are certain to get a
        consistent snapshot. We store the renderer pid in the experiments table,
        so if the renderer was running, just fire off another one; mostly this is
        not going to happen. In addition, tbprerun no longer starts a new
        renderer when doing the swapmod; I start the new renderer later after
        swapmod succeeds. I might end up tweaking this a bit depending on what
        people notice as being different.
      
      * Termination changes to batchexp and swapexp: I've rearranged the
        termination code using an END block so that any uncontrolled exit from
        either batchexp or swapexp will go through the cleanup code, and
        hopefully insert a stats record, as well as not leave the experiment in
        some in-between state. I've set the max DB retry count to zero in both
        cases, which means infinite retry. I've also added SIGTERM handlers to
        both so that again, we can kill a hung batch/swap and have it clean up
        things more or less. Note that END blocks are not run when a signal
        causes the program to die; you have to catch it and then die() so that
        the END block is executed.
      
        Eventually, we need to clean up the various libraries so that we do not
        use DBQueryFatal(), but rather use DBQueryWarn(), and look for failure.
        Ditto for event system interface.
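
      The scripts themselves are Perl, where an END block plays this role. A rough Python analogue of the same point, offered as a sketch: a finally clause (like END) does not run when the default SIGTERM action kills the process, so the handler turns the signal into an ordinary exit that unwinds through the cleanup. The cleanup body and the run() function are hypothetical placeholders; signal.raise_signal needs Python 3.8 or later.

```python
import signal
import sys

cleanup_log = []


def cleanup() -> None:
    """Placeholder for the real work: insert a stats record, reset state."""
    cleanup_log.append("cleanup ran")


def on_sigterm(signum, frame) -> None:
    # Without a handler, SIGTERM terminates the process immediately and the
    # finally: clause below never runs.  Raising SystemExit instead makes the
    # exit unwind normally, much like calling die() from a Perl handler.
    sys.exit(128 + signum)


def run(simulate_kill: bool = False) -> None:
    signal.signal(signal.SIGTERM, on_sigterm)
    try:
        if simulate_kill:
            signal.raise_signal(signal.SIGTERM)  # stand-in for "kill <pid>"
        # ... the actual batch/swap work would go here ...
    finally:
        cleanup()
```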
  28. 26 Jul, 2004 1 commit
    • Leigh B. Stoller's avatar
      Okay, lets clear up some confusion when swapmod fails and 1) the · fdac8b89
      Leigh B. Stoller authored
      experiment is swapped or 2) the experiment is completely terminated.
      In these cases, let's put explicit swapout/destroy events into
      testbed_stats so that the record is not confused by experiments that
      appear to start when they are still running. This really throws off
      the summary stats web page!
  29. 15 Jul, 2004 1 commit
  30. 29 Jun, 2004 1 commit
  31. 17 May, 2004 1 commit
  32. 13 May, 2004 2 commits
  33. 29 Apr, 2004 1 commit
    • Leigh B. Stoller authored · 6cdccbd2
      Add prelim support for using linktest. Because of problems, this is
      currently available to only people with stud=1 status in the DB.
      
      * www/tbauth.php3: Add a STUDLY() function to check that bit.
      
      * www/linktest.php3: New page to run linktest on the fly. The level
        defaults to the current level in the experiments table, but you can
        override that via the form on the page.
      
      * www/showexp.php3: Add link to aforementioned page. STUDLY() only.
      
      * www/beginexp_form.php3: Add an option (selection) to set the linktest
        level for create/swapin. Defaults to 0 (no linktest). STUDLY() only.
      
      * www/editexp.php3: Add an option to edit the default linktest level
        for an experiment. STUDLY() only.
      
      * tbsetup/batchexp.in and tbsetup/swapexp.in: Add code to optionally run
        the linktest, sending email if it fails (exits with non-zero status).
        Failure does not affect the swapin.
  34. 07 Apr, 2004 1 commit
  35. 15 Mar, 2004 1 commit
  36. 12 Feb, 2004 1 commit
    • Leigh B. Stoller authored · aef08532
      * Removed startexp, and merged its contents into batchexp. There has been
        no reason for the separation for a long time, and it made maintenance more
        difficult because of duplication between batchexp and startexp (batch was
        the sole user of startexp). Cleaner solution.
      
      * Check argument processing for batchexp, swapexp, endexp to make sure the
        taint checks are correct. All three of these scripts will now be
        available from ops. I especially watch the filename processing, which was
        pretty loose before and could allow someone to grab a file on boss by trying
        to use it as an NS file (the scripts all run as the user, of course). The web
        interface generates filenames that are hard to guess, so rather than
        wrapping these scripts when invoked from ops, just allow the usual paths
        (/proj, /groups, /users) but also /tmp/$uid-XXXXXX.nsfile pattern, which
        should be hard enough to guess that users will not be able to get
        anything they are not supposed to.
      
      * Add a -w (waitmode) option to all three scripts. In waitmode, the backend
        detaches, but the parent remains waiting for the child to finish so it
        can exit with the appropriate status (for scripting). The user can
        interrupt (^C), but it has no effect on the backend; it just kills the
        parent side that is waiting (backend is in a new session ID). Log output
        still goes to the file (available from web page) and is emailed.
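
      A Python sketch of the kind of filename whitelist described above: accept NS files under /proj, /groups, or /users, or a name matching /tmp/$uid-XXXXXX.nsfile. It is a lexical check only; the allowed character set, the six-character alphanumeric suffix, and the function name nsfile_path_ok are assumptions, and a real implementation would also have to consider symlinks.

```python
import re

# Hypothetical whitelist mirroring the commit message.
_SAFE_CHARS = re.compile(r"[\w./+-]+")
_ALLOWED_TREES = ("/proj/", "/groups/", "/users/")


def nsfile_path_ok(path: str, uid: str) -> bool:
    """Return True if path looks like an acceptable NS file location."""
    if not _SAFE_CHARS.fullmatch(path):
        return False              # odd characters (;, spaces, ...): refuse
    if ".." in path.split("/"):
        return False              # no climbing out of an allowed tree
    if path.startswith(_ALLOWED_TREES):
        return True
    # /tmp/$uid-XXXXXX.nsfile, assuming a 6-character random suffix.
    tmp = re.compile(r"/tmp/" + re.escape(uid) + r"-[A-Za-z0-9]{6}\.nsfile")
    return tmp.fullmatch(path) is not None
```

      Tying the /tmp pattern to the caller's own uid means one user cannot point the scripts at another user's web-generated file, on top of the name being hard to guess.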
  37. 05 Feb, 2004 1 commit